Posted to dev@arrow.apache.org by Sutou Kouhei <ko...@clear-code.com> on 2021/04/20 01:10:15 UTC

Re: Please Review: Application for a Media Type

Hi,

Sorry for not responding to this...

Weston, thanks for writing up the draft!
https://docs.google.com/document/d/1PmZFoSifV_TX4vXnv775WiOtqCgz5zLF5ryFRWio3HQ/edit?usp=sharing

Here are items we need to discuss before we apply to IANA
for a media type:

1. Interoperability Considerations

Draft:

> The Apache arrow format is intended to be a language
> independent columnar memory format for flat and
> hierarchical data.  It has been shown to work in a variety
> of languages and applications.  Arrow files can be
> provided in two different formats, a streaming format
> (vnd.apache.arrow.stream) and a random access format
> (vnd.apache.arrow.file).  Applications should be aware of
> which format they are processing as the two are not
> interchangeable.

Note in draft:

> Should we mention something like "applications should
> make sure to check the 'version' field to ensure they
> can process the file"?

How about referring to our format document for further
information instead of mentioning the 'version' field?
https://arrow.apache.org/docs/format/Columnar.html

XML Media Types also refers to the XML specification for
further information:

https://tools.ietf.org/html/rfc7303#section-9.1

> For further information, see Section 2.9 "Standalone
> Document Declaration" and Section 5 "Conformance" of [XML].
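
To illustrate why the two formats are not interchangeable, here
is a minimal sketch (not part of the draft) that tells them apart
by their leading bytes. It relies on the format document: the
file format starts and ends with the magic string "ARROW1", and
a current stream starts with the 0xFFFFFFFF continuation marker
followed by a little-endian int32 metadata length. The function
name sniff_arrow_format is hypothetical, and streams written
before the continuation marker was introduced are not detected.

```python
import struct

FILE_MAGIC = b"ARROW1"              # file format: magic at both ends
CONTINUATION = b"\xff\xff\xff\xff"  # stream format: continuation marker


def sniff_arrow_format(data: bytes) -> str:
    """Guess whether a buffer holds the Arrow IPC file or stream format.

    Hypothetical helper for illustration; it inspects only a few bytes
    and does not validate the payload.
    """
    # File format: "ARROW1" at the start (padded to 8 bytes) and at the end.
    if len(data) > 8 and data.startswith(FILE_MAGIC) and data.endswith(FILE_MAGIC):
        return "file"
    # Stream format: continuation marker, then an int32 metadata length.
    if len(data) >= 8 and data[:4] == CONTINUATION:
        (metadata_size,) = struct.unpack("<i", data[4:8])
        if metadata_size >= 0:
            return "stream"
    return "unknown"
```

A reader for one format should reject input that sniffs as the
other instead of trying to parse it.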


2. File extension(s)

Draft:

> N/A

Note in draft:

> Again, there are no formal extensions that have been
> recommended before.  Do we want to introduce any?  I'm
> pretty sure this is in no way binding (and it's unlikely
> anyone will ever see it).

I would like us to recommend extensions, to avoid a
proliferation of different extensions for Apache Arrow formats.

How about the following?

  * vnd.apache.arrow.file: .arrow
  * vnd.apache.arrow.stream: NA
    (Generally, this format isn't saved as a file. It is used
    for pipes, sending/receiving over sockets, and so on.)
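
As a rough illustration of the mapping proposed above (a sketch
only), this is how an application could register ".arrow" with
Python's standard mimetypes module. The "application/" prefix on
the type names is an assumption, since only the
"vnd.apache.arrow.*" names are spelled out here, and
guess_arrow_media_type is a hypothetical helper.

```python
import mimetypes

# Assumption: the full media type names carry an "application/" prefix.
ARROW_FILE_TYPE = "application/vnd.apache.arrow.file"
ARROW_STREAM_TYPE = "application/vnd.apache.arrow.stream"

# Use a private registry so the process-wide mimetypes tables stay untouched.
registry = mimetypes.MimeTypes()
# Proposed mapping: ".arrow" for the file format, no extension for the stream.
registry.add_type(ARROW_FILE_TYPE, ".arrow")


def guess_arrow_media_type(path: str):
    """Return the Arrow file media type for a ".arrow" path, else None."""
    media_type, _encoding = registry.guess_type(path)
    return media_type if media_type == ARROW_FILE_TYPE else None
```

Stream data arriving over a pipe or socket has no path to
inspect, so its type would have to be conveyed out of band, for
example in a Content-Type header.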

FYI: Here is a list of the extensions used in our code
base.

Our integration tests use the following extensions:

  * vnd.apache.arrow.file: .arrow_file
  * vnd.apache.arrow.stream: .stream

https://github.com/apache/arrow/blob/master/dev/archery/archery/integration/runner.py#L250-L257

    log('-- Validating file')
    producer_file_path = os.path.join(
        gold_dir, "generated_" + test_case.name + ".arrow_file")
    consumer.validate(json_path, producer_file_path)

    log('-- Validating stream')
    consumer_stream_path = os.path.join(
        gold_dir, "generated_" + test_case.name + ".stream")

Our C++ tests use the following extensions:

  * vnd.apache.arrow.file: Not used (in-memory buffer is used)
  * vnd.apache.arrow.stream: Not used (in-memory buffer is used)

Our C++ examples use the following extensions:

  * vnd.apache.arrow.file: .arrow
  * vnd.apache.arrow.stream: NA

https://github.com/apache/arrow/blob/master/cpp/examples/minimal_build/example.cc#L34

    const char* arrow_filename = "test.arrow";

Our Python documentation uses the following extensions:

  * vnd.apache.arrow.file: .arrow
  * vnd.apache.arrow.stream: Not used (in-memory buffer is used)

https://github.com/apache/arrow/blob/master/docs/source/python/filesystems.rst

   with local.open_output_stream("test.arrow") as file:

Our Go tests use the following extensions:

  * vnd.apache.arrow.file: Not used (no extension)
  * vnd.apache.arrow.stream: Not used (no extension)

Our Java tests use the following extensions:

  * vnd.apache.arrow.file: .arrow
  * vnd.apache.arrow.stream: .arrow, but most tests use an in-memory buffer

https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowFile.java#L51

    File file = new File("target/mytest_write.arrow");

https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestRoundTrip.java#L176

    final File temp = File.createTempFile("arrow-test-" + name + "-", ".arrow");

Our JavaScript tests use the following extensions:

  * vnd.apache.arrow.file: Not used (in-memory buffer is used)
  * vnd.apache.arrow.stream: Not used (in-memory buffer is used)

Our Julia tests use the following extensions:

  * vnd.apache.arrow.file: Not used (in-memory buffer is used)
  * vnd.apache.arrow.stream: Not used (in-memory buffer is used)

Our Rust tests use the following extensions:

  * vnd.apache.arrow.file: .arrow_file
  * vnd.apache.arrow.stream: .stream

Note that they use data from our integration tests.


Thanks,
--
kou

In <CA...@mail.gmail.com>
  "Re: Please Review: Application for a Media Type" on Fri, 22 Jan 2021 14:37:35 -0600,
  Wes McKinney <we...@gmail.com> wrote:

> Thank you for taking the lead on this. I gave a brief read through and
> I think it makes sense using Thrift or Protocol Buffers as a
> guideline. Would be good for some others to review who might be
> familiar with IANA media formats.
> 
> On Wed, Jan 20, 2021 at 6:17 PM Weston Pace <we...@gmail.com> wrote:
>>
>> Per a previous discussion
>> (https://lists.apache.org/thread.html/b15726d0c0da2223ba1b45a226ef86263f688b20532a30535cd5e267%40%3Cdev.arrow.apache.org%3E)
>> and the resulting JIRA issue ARROW-7396
>> (https://issues.apache.org/jira/browse/ARROW-7396) there is a desire
>> to register the arrow format with the IANA as a formal media type
>> (actually two media types, one for the streaming format and one for
>> the file format).
>>
>> The form for applying is here: https://www.iana.org/form/media-types
>>
>> I have created a draft registration document (link below).
>>
>> The only fields with any real flexibility are "Security
>> Considerations", "Interoperability Considerations", and "Application
>> Usage".  I reviewed the applications for XML, JSON, and Thrift and
>> I've made a best attempt at these fields as well as posted examples
>> from the other languages.  Please review and feel free to suggest
>> changes.
>>
>> https://docs.google.com/document/d/1PmZFoSifV_TX4vXnv775WiOtqCgz5zLF5ryFRWio3HQ/edit?usp=sharing
>>
>> Once we align on the content we should probably have a PMC member
>> actually make the submission and be listed as contact person.
>>
>> Thanks,
>>
>> Weston Pace
>> Ursa Computing

Re: Please Review: Application for a Media Type

Posted by Jorge Cardoso Leitão <jo...@gmail.com>.
Sorry about the delay,

I agree that .arrows is both more specific and informative.

Best,
Jorge



Re: Please Review: Application for a Media Type

Posted by Sutou Kouhei <ko...@clear-code.com>.
Hi,

It seems that there are no more opinions.

Weston, could you clean up the draft? Then we can start a vote.


Thanks,
--
kou

In <CA...@mail.gmail.com>
  "Re: Please Review: Application for a Media Type" on Wed, 28 Apr 2021 14:30:47 -1000,
  Weston Pace <we...@gmail.com> wrote:

> +1 for .arrows from me.  I agree that .stream is too generic.
> 
> 
> On Thu, Apr 22, 2021 at 7:42 PM Sutou Kouhei <ko...@clear-code.com> wrote:
>>
>> Hi,
>>
>> I feel that '.stream' is too generic. How about '.arrows'?
>> JSON Lines uses an 'l' suffix for its extension: '.jsonl'
>>
>> https://jsonlines.org/#conventions
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In <CA...@mail.gmail.com>
>>   "Re: Please Review: Application for a Media Type" on Thu, 22 Apr 2021 06:44:51 +0200,
>>   Jorge Cardoso Leitão <jo...@gmail.com> wrote:
>>
>> > Thanks for driving this, exciting stuff!
>> >
>> > I went through it, left minor comments, it looks good to me.
>> >
>> > wrt the extension: imo they should be different as the formats are not
>> > interchangeable.
>> >
>> > AFAIK `.stream` is not taken: it was used by Adobe Shockwave but it was
>> > discontinued [1].
>> > So, .arrow and .stream may be sufficient.
>> >
>> > [1] https://helpx.adobe.com/shockwave/shockwave-end-of-life-faq.html
>> >
>> > Best,
>> > Jorge
>> >
>> >
>> > On Thu, Apr 22, 2021 at 3:35 AM Sutou Kouhei <ko...@clear-code.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> Thanks for updating the draft.
>> >>
>> >> I want to wait for at least a week before we start a vote.
>> >> Does anyone have an opinion about the file extension for Apache
>> >> Arrow format data? What do you think about ".arrow"?
>> >>
>> >>
>> >> Thanks,
>> >> --
>> >> kou
>> >>
>> >> In <CA...@mail.gmail.com>
>> >>   "Re: Please Review: Application for a Media Type" on Wed, 21 Apr 2021 08:17:40 -1000,
>> >>   Weston Pace <we...@gmail.com> wrote:
>> >>
>> >> > Thank you for reviewing.  I have added your suggestions to the draft.
>> >> > Are we ready for a vote?  If so I will clean up the comments and send
>> >> > out a clean version of the draft.

Re: Please Review: Application for a Media Type

Posted by Weston Pace <we...@gmail.com>.
+1 for .arrows from me.  I agree that .stream is too generic.


On Thu, Apr 22, 2021 at 7:42 PM Sutou Kouhei <ko...@clear-code.com> wrote:
>
> Hi,
>
> I feel that '.stream' is too generic. How about '.arrows'?
> JSON Lines uses 'l' suffix for extension: '.jsonl'
>
> https://jsonlines.org/#conventions
>
>
> Thanks,
> --
> kou
>
> In <CA...@mail.gmail.com>
>   "Re: Please Review: Application for a Media Type" on Thu, 22 Apr 2021 06:44:51 +0200,
>   Jorge Cardoso Leitão <jo...@gmail.com> wrote:
>
> > Thanks for driving this, exciting stuff!
> >
> > I went through it, left minor comments, it looks good to me.
> >
> > wrt to the extension: imo they should be different as the formats are not
> > interchangeable.
> >
> > AFAIK `.stream` is not taken: it was used by Adobe shockwave but it was
> > discontinued [1].
> > So, .arrow and .stream may be sufficient.
> >
> > [1] https://helpx.adobe.com/shockwave/shockwave-end-of-life-faq.html
> >
> > Best,
> > Jorge
> >
> >
> > On Thu, Apr 22, 2021 at 3:35 AM Sutou Kouhei <ko...@clear-code.com> wrote:
> >
> >> Hi,
> >>
> >> Thanks for updating the draft.
> >>
> >> I want to wait for at least a weak before we start a vote.
> >> Does anyone have an opinion about file extension of Apache
> >> Arrow format data? What do you think about ".arrow"?
> >>
> >>
> >> Thanks,
> >> --
> >> kou
> >>
> >> In <CA...@mail.gmail.com>
> >>   "Re: Please Review: Application for a Media Type" on Wed, 21 Apr 2021
> >> 08:17:40 -1000,
> >>   Weston Pace <we...@gmail.com> wrote:
> >>
> >> > Thank you for reviewing.  I have added your suggestions to the draft.
> >> > Are we ready for a vote?  If so I will clean up the comments and send
> >> > out a clean version of the draft.
> >> >
> >> > On Mon, Apr 19, 2021 at 3:10 PM Sutou Kouhei <ko...@clear-code.com> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> Sorry for not responding this...
> >> >>
> >> >> Weston, thanks for writing up the draft!
> >> >>
> >> https://docs.google.com/document/d/1PmZFoSifV_TX4vXnv775WiOtqCgz5zLF5ryFRWio3HQ/edit?usp=sharing
> >> >>
> >> >> Here are items we need to discuss before we apply a media
> >> >> type to IANA:
> >> >>
> >> >> 1. Interoperability Considerations
> >> >>
> >> >> Draft:
> >> >>
> >> >> > The Apache arrow format is intended to be a language
> >> >> > independent columnar memory format for flat and
> >> >> > hierarchical data.  It has been shown to work in a variety
> >> >> > of languages and applications.  Arrow files can be
> >> >> > provided in two different formats, a streaming format
> >> >> > (vnd.apache.arrow.stream) and a random access format
> >> >> > (vnd.apache.arrow.file).  Applications should be aware of
> >> >> > which format they are processing as the two are not
> >> >> > interchangeable.
> >> >>
> >> >> Note in draft:
> >> >>
> >> >> > Should we mention something like "applications should
> >> >> > make sure to check the 'version' field to ensure they
> >> >> > can process the file"?
> >> >>
> >> >> How about referring our format document for further
> >> >> information instead of mention the 'version' field?
> >> >> https://arrow.apache.org/docs/format/Columnar.html
> >> >>
> >> >> XML Media Types also refers the XML specification for
> >> >> further information:
> >> >>
> >> >> https://tools.ietf.org/html/rfc7303#section-9.1
> >> >>
> >> >> > For further information, see Section 2.9 "Standalone
> >> >> > Document Declaration" and Section 5 "Conformance" of [XML].
> >> >>
> >> >>
> >> >> 2. File extension(s)
> >> >>
> >> >> Draft:
> >> >>
> >> >> > N/A
> >> >>
> >> >> Note in draft:
> >> >>
> >> >> > Again, there are no formal extensions that have been
> >> >> > recommended before.  Do we want to introduce any?  I'm
> >> >> > pretty sure this is in no way binding (and it's unlikely
> >> >> > anyone will ever see it).
> >> >>
> >> >> I want recommended extensions to avoid spreading various
> >> >> extensions for Apache Arrow formats.
> >> >>
> >> >> How about the followings?
> >> >>
> >> >>   * vnd.apache.arrow.file: .arrow
> >> >>   * vnd.apache.arrow.stream: NA
> >> >>     (Generally, this format isn't saved as file. This format
> >> >>     is used for pipe, sending/receiving via socket and so on.)
> >> >>
> >> >> FYI: Here is a list that shows used extensions in our code
> >> >> base.
> >> >>
> >> >> Our integration test uses the following extensions:
> >> >>
> >> >>   * vnd.apache.arrow.file: .arrow_file
> >> >>   * vnd.apache.arrow.stream: .stream
> >> >>
> >> >> https://github.com/apache/arrow/blob/master/dev/archery/archery/integration/runner.py#L250-L257
> >> >>
> >> >>     log('-- Validating file')
> >> >>     producer_file_path = os.path.join(
> >> >>         gold_dir, "generated_" + test_case.name + ".arrow_file")
> >> >>     consumer.validate(json_path, producer_file_path)
> >> >>
> >> >>     log('-- Validating stream')
> >> >>     consumer_stream_path = os.path.join(
> >> >>         gold_dir, "generated_" + test_case.name + ".stream")
> >> >>
> >> >> Our C++ tests use the following extensions:
> >> >>
> >> >>   * vnd.apache.arrow.file: Not used (in-memory buffer is used)
> >> >>   * vnd.apache.arrow.stream: Not used (in-memory buffer is used)
> >> >>
> >> >> Our C++ examples use the following extensions:
> >> >>
> >> >>   * vnd.apache.arrow.file: .arrow
> >> >>   * vnd.apache.arrow.stream: NA
> >> >>
> >> >> https://github.com/apache/arrow/blob/master/cpp/examples/minimal_build/example.cc#L34
> >> >>
> >> >>     const char* arrow_filename = "test.arrow";
> >> >>
> >> >> Our Python documentation uses the following extensions:
> >> >>
> >> >>   * vnd.apache.arrow.file: .arrow
> >> >>   * vnd.apache.arrow.stream: Not used (in-memory buffer is used)
> >> >>
> >> >> https://github.com/apache/arrow/blob/master/docs/source/python/filesystems.rst
> >> >>
> >> >>    with local.open_output_stream("test.arrow") as file:
> >> >>
> >> >> Our Go tests use the following extensions:
> >> >>
> >> >>   * vnd.apache.arrow.file: Not used (no extension)
> >> >>   * vnd.apache.arrow.stream: Not used (no extension)
> >> >>
> >> >> Our Java tests use the following extensions:
> >> >>
> >> >>   * vnd.apache.arrow.file: .arrow
> >> >>   * vnd.apache.arrow.stream: .arrow but most of tests use in-memory buffer
> >> >>
> >> >> https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowFile.java#L51
> >> >>
> >> >>     File file = new File("target/mytest_write.arrow");
> >> >>
> >> >> https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestRoundTrip.java#L176
> >> >>
> >> >>     final File temp = File.createTempFile("arrow-test-" + name + "-", ".arrow");
> >> >>
> >> >> Our JavaScript tests use the following extensions:
> >> >>
> >> >>   * vnd.apache.arrow.file: Not used (in-memory buffer is used)
> >> >>   * vnd.apache.arrow.stream: Not used (in-memory buffer is used)
> >> >>
> >> >> Our Julia tests use the following extensions:
> >> >>
> >> >>   * vnd.apache.arrow.file: Not used (in-memory buffer is used)
> >> >>   * vnd.apache.arrow.stream: Not used (in-memory buffer is used)
> >> >>
> >> >> Our Rust tests use the following extensions:
> >> >>
> >> >>   * vnd.apache.arrow.file: .arrow_file
> >> >>   * vnd.apache.arrow.stream: .stream
> >> >>
> >> >> Note that they use data in our integration test.
> >> >>
> >> >>
> >> >> Thanks,
> >> >> --
> >> >> kou
> >> >>
> >> >> In <CA...@mail.gmail.com>
> >> >>   "Re: Please Review: Application for a Media Type" on Fri, 22 Jan 2021 14:37:35 -0600,
> >> >>   Wes McKinney <we...@gmail.com> wrote:
> >> >>
> >> >> > Thank you for taking the lead on this. I gave a brief read through and
> >> >> > I think it makes sense using Thrift or Protocol Buffers as a
> >> >> > guideline. Would be good for some others to review who might be
> >> >> > familiar with IANA media formats
> >> >> >
> >> >> > On Wed, Jan 20, 2021 at 6:17 PM Weston Pace <we...@gmail.com> wrote:
> >> >> >>
> >> >> >> Per a previous discussion
> >> >> >> (https://lists.apache.org/thread.html/b15726d0c0da2223ba1b45a226ef86263f688b20532a30535cd5e267%40%3Cdev.arrow.apache.org%3E)
> >> >> >> and the resulting JIRA issue ARROW-7396
> >> >> >> (https://issues.apache.org/jira/browse/ARROW-7396) there is a desire
> >> >> >> to register the arrow format with the IANA as a formal media type
> >> >> >> (actually two media types, one for the streaming format and one for
> >> >> >> the file format).
> >> >> >>
> >> >> >> The form for applying is here: https://www.iana.org/form/media-types
> >> >> >>
> >> >> >> I have created a draft registration document (link below).
> >> >> >>
> >> >> >> The only fields with any real flexibility are "Security
> >> >> >> Considerations", "Interoperability Considerations", and "Application
> >> >> >> Usage".  I reviewed the applications for XML, JSON, and Thrift and
> >> >> >> I've made a best attempt at these fields as well as posted examples
> >> >> >> from the other languages.  Please review and feel free to suggest
> >> >> >> changes.
> >> >> >>
> >> >> >> https://docs.google.com/document/d/1PmZFoSifV_TX4vXnv775WiOtqCgz5zLF5ryFRWio3HQ/edit?usp=sharing
> >> >> >>
> >> >> >> One we align on the content we should probably have a PMC member
> >> >> >> actually make the submission and be listed as contact person.
> >> >> >>
> >> >> >> Thanks,
> >> >> >>
> >> >> >> Weston Pace
> >> >> >> Ursa Computing
> >>

Re: Please Review: Application for a Media Type

Posted by Sutou Kouhei <ko...@clear-code.com>.
Hi,

I feel that '.stream' is too generic. How about '.arrows'?
JSON Lines uses an 'l' suffix for its extension: '.jsonl'

https://jsonlines.org/#conventions
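
As a rough illustration of what two distinct extensions would allow, here
is a minimal Python sketch that maps each extension to its media type with
the standard-library mimetypes module. The '.arrows' extension is only the
proposal above, and the "application/" prefix and the helper name
media_type_for are assumptions made for this example, not anything agreed
in this thread.

```python
import mimetypes
from typing import Optional

# Private registry so the process-wide mimetypes tables are left untouched.
_registry = mimetypes.MimeTypes()

# Assumed mapping: ".arrow" for the file format and ".arrows" (the proposal
# above) for the streaming format. The "application/" prefix is an assumption.
_registry.add_type("application/vnd.apache.arrow.file", ".arrow")
_registry.add_type("application/vnd.apache.arrow.stream", ".arrows")


def media_type_for(path: str) -> Optional[str]:
    """Return the media type guessed from the extension of `path`, or None."""
    media_type, _encoding = _registry.guess_type(path)
    return media_type
```

With a shared extension such as '.arrow' for both formats, a lookup like
this could not tell the two media types apart.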


Thanks,
--
kou

In <CA...@mail.gmail.com>
  "Re: Please Review: Application for a Media Type" on Thu, 22 Apr 2021 06:44:51 +0200,
  Jorge Cardoso Leitão <jo...@gmail.com> wrote:

> Thanks for driving this, exciting stuff!
> 
> I went through it and left minor comments; it looks good to me.
> 
> wrt the extension: imo they should be different as the formats are not
> interchangeable.
> 
> AFAIK `.stream` is not taken: it was used by Adobe Shockwave, but that was
> discontinued [1].
> So, .arrow and .stream may be sufficient.
> 
> [1] https://helpx.adobe.com/shockwave/shockwave-end-of-life-faq.html
> 
> Best,
> Jorge

Re: Please Review: Application for a Media Type

Posted by Jorge Cardoso Leitão <jo...@gmail.com>.
Thanks for driving this, exciting stuff!

I went through it and left minor comments; it looks good to me.

wrt the extension: imo they should be different as the formats are not
interchangeable.
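
One way to see why the two are not interchangeable is to look at the first
bytes: the file format begins with the magic string ARROW1, while a stream
written by recent versions begins with the 0xFFFFFFFF continuation marker
of its first message. Below is a minimal Python sketch that tells them
apart on that basis. The function name, the "application/" prefix and the
sample inputs are assumptions for illustration only, and streams written
before format version 0.15 would not match this check.

```python
from typing import Optional

# Leading magic bytes of the random access (file) format.
ARROW_FILE_MAGIC = b"ARROW1"
# Continuation marker that starts each message in recent streams.
CONTINUATION_MARKER = b"\xff\xff\xff\xff"


def sniff_arrow_media_type(prefix: bytes) -> Optional[str]:
    """Guess the Arrow media type from the first bytes of the data.

    Returns None when the prefix matches neither format.
    """
    if prefix.startswith(ARROW_FILE_MAGIC):
        return "application/vnd.apache.arrow.file"
    if prefix.startswith(CONTINUATION_MARKER):
        return "application/vnd.apache.arrow.stream"
    return None
```

A reader written for one format would reject the prefix of the other,
which is the practical reason to give them different extensions.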

AFAIK `.stream` is not taken: it was used by Adobe Shockwave, but that was
discontinued [1].
So, .arrow and .stream may be sufficient.

[1] https://helpx.adobe.com/shockwave/shockwave-end-of-life-faq.html

Best,
Jorge


On Thu, Apr 22, 2021 at 3:35 AM Sutou Kouhei <ko...@clear-code.com> wrote:

> Hi,
>
> Thanks for updating the draft.
>
> I want to wait for at least a week before we start a vote.
> Does anyone have an opinion about the file extension for Apache
> Arrow format data? What do you think about ".arrow"?
>
>
> Thanks,
> --
> kou

Re: Please Review: Application for a Media Type

Posted by Sutou Kouhei <ko...@clear-code.com>.
Hi,

Thanks for updating the draft.

I want to wait for at least a week before we start a vote.
Does anyone have an opinion about the file extension for Apache
Arrow format data? What do you think about ".arrow"?


Thanks,
--
kou

In <CA...@mail.gmail.com>
  "Re: Please Review: Application for a Media Type" on Wed, 21 Apr 2021 08:17:40 -1000,
  Weston Pace <we...@gmail.com> wrote:

> Thank you for reviewing.  I have added your suggestions to the draft.
> Are we ready for a vote?  If so I will clean up the comments and send
> out a clean version of the draft.

Re: Please Review: Application for a Media Type

Posted by Weston Pace <we...@gmail.com>.
Thank you for reviewing.  I have added your suggestions to the draft.
Are we ready for a vote?  If so I will clean up the comments and send
out a clean version of the draft.

On Mon, Apr 19, 2021 at 3:10 PM Sutou Kouhei <ko...@clear-code.com> wrote:
>
> Hi,
>
> Sorry for not responding to this...
>
> Weston, thanks for writing up the draft!
> https://docs.google.com/document/d/1PmZFoSifV_TX4vXnv775WiOtqCgz5zLF5ryFRWio3HQ/edit?usp=sharing
>
> Here are the items we need to discuss before we apply to
> IANA for a media type:
>
> 1. Interoperability Considerations
>
> Draft:
>
> > The Apache arrow format is intended to be a language
> > independent columnar memory format for flat and
> > hierarchical data.  It has been shown to work in a variety
> > of languages and applications.  Arrow files can be
> > provided in two different formats, a streaming format
> > (vnd.apache.arrow.stream) and a random access format
> > (vnd.apache.arrow.file).  Applications should be aware of
> > which format they are processing as the two are not
> > interchangeable.
>
> Note in draft:
>
> > Should we mention something like "applications should
> > make sure to check the 'version' field to ensure they
> > can process the file"?
>
> How about referring to our format document for further
> information instead of mentioning the 'version' field?
> https://arrow.apache.org/docs/format/Columnar.html
>
> XML Media Types also refers to the XML specification for
> further information:
>
> https://tools.ietf.org/html/rfc7303#section-9.1
>
> > For further information, see Section 2.9 "Standalone
> > Document Declaration" and Section 5 "Conformance" of [XML].
>
>
> 2. File extension(s)
>
> Draft:
>
> > N/A
>
> Note in draft:
>
> > Again, there are no formal extensions that have been
> > recommended before.  Do we want to introduce any?  I'm
> > pretty sure this is in no way binding (and it's unlikely
> > anyone will ever see it).
>
> I would like us to recommend extensions so that a variety of
> ad hoc extensions for Apache Arrow formats does not spread.
>
> How about the following?
>
>   * vnd.apache.arrow.file: .arrow
>   * vnd.apache.arrow.stream: NA
>     (Generally, this format isn't saved as a file. It is
>     used for pipes, sending/receiving via sockets and so on.)
>
> FYI: Here is a list of the extensions used in our code
> base.
>
> Our integration test uses the following extensions:
>
>   * vnd.apache.arrow.file: .arrow_file
>   * vnd.apache.arrow.stream: .stream
>
> https://github.com/apache/arrow/blob/master/dev/archery/archery/integration/runner.py#L250-L257
>
>     log('-- Validating file')
>     producer_file_path = os.path.join(
>         gold_dir, "generated_" + test_case.name + ".arrow_file")
>     consumer.validate(json_path, producer_file_path)
>
>     log('-- Validating stream')
>     consumer_stream_path = os.path.join(
>         gold_dir, "generated_" + test_case.name + ".stream")
>
> Our C++ tests use the following extensions:
>
>   * vnd.apache.arrow.file: Not used (in-memory buffer is used)
>   * vnd.apache.arrow.stream: Not used (in-memory buffer is used)
>
> Our C++ examples use the following extensions:
>
>   * vnd.apache.arrow.file: .arrow
>   * vnd.apache.arrow.stream: NA
>
> https://github.com/apache/arrow/blob/master/cpp/examples/minimal_build/example.cc#L34
>
>     const char* arrow_filename = "test.arrow";
>
> Our Python documentation uses the following extensions:
>
>   * vnd.apache.arrow.file: .arrow
>   * vnd.apache.arrow.stream: Not used (in-memory buffer is used)
>
> https://github.com/apache/arrow/blob/master/docs/source/python/filesystems.rst
>
>     with local.open_output_stream("test.arrow") as file:
>
> Our Go tests use the following extensions:
>
>   * vnd.apache.arrow.file: Not used (no extension)
>   * vnd.apache.arrow.stream: Not used (no extension)
>
> Our Java tests use the following extensions:
>
>   * vnd.apache.arrow.file: .arrow
>   * vnd.apache.arrow.stream: .arrow, but most tests use an in-memory buffer
>
> https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowFile.java#L51
>
>     File file = new File("target/mytest_write.arrow");
>
> https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestRoundTrip.java#L176
>
>     final File temp = File.createTempFile("arrow-test-" + name + "-", ".arrow");
>
> Our JavaScript tests use the following extensions:
>
>   * vnd.apache.arrow.file: Not used (in-memory buffer is used)
>   * vnd.apache.arrow.stream: Not used (in-memory buffer is used)
>
> Our Julia tests use the following extensions:
>
>   * vnd.apache.arrow.file: Not used (in-memory buffer is used)
>   * vnd.apache.arrow.stream: Not used (in-memory buffer is used)
>
> Our Rust tests use the following extensions:
>
>   * vnd.apache.arrow.file: .arrow_file
>   * vnd.apache.arrow.stream: .stream
>
> Note that they use the data from our integration tests.
>
>
> Thanks,
> --
> kou
>
> In <CA...@mail.gmail.com>
>   "Re: Please Review: Application for a Media Type" on Fri, 22 Jan 2021 14:37:35 -0600,
>   Wes McKinney <we...@gmail.com> wrote:
>
> > Thank you for taking the lead on this. I gave a brief read through and
> > I think it makes sense using Thrift or Protocol Buffers as a
> > guideline. Would be good for some others to review who might be
> > familiar with IANA media formats.
> >
> > On Wed, Jan 20, 2021 at 6:17 PM Weston Pace <we...@gmail.com> wrote:
> >>
> >> Per a previous discussion
> >> (https://lists.apache.org/thread.html/b15726d0c0da2223ba1b45a226ef86263f688b20532a30535cd5e267%40%3Cdev.arrow.apache.org%3E)
> >> and the resulting JIRA issue ARROW-7396
> >> (https://issues.apache.org/jira/browse/ARROW-7396) there is a desire
> >> to register the arrow format with the IANA as a formal media type
> >> (actually two media types, one for the streaming format and one for
> >> the file format).
> >>
> >> The form for applying is here: https://www.iana.org/form/media-types
> >>
> >> I have created a draft registration document (link below).
> >>
> >> The only fields with any real flexibility are "Security
> >> Considerations", "Interoperability Considerations", and "Application
> >> Usage".  I reviewed the applications for XML, JSON, and Thrift and
> >> I've made a best attempt at these fields as well as posted examples
> >> from the other languages.  Please review and feel free to suggest
> >> changes.
> >>
> >> https://docs.google.com/document/d/1PmZFoSifV_TX4vXnv775WiOtqCgz5zLF5ryFRWio3HQ/edit?usp=sharing
> >>
> >> Once we align on the content we should probably have a PMC member
> >> actually make the submission and be listed as contact person.
> >>
> >> Thanks,
> >>
> >> Weston Pace
> >> Ursa Computing