Posted to general@incubator.apache.org by Steve Lawrence <st...@gmail.com> on 2017/08/01 12:09:26 UTC

Re: [DISCUSS] Daffodil Incubation Proposal

Discussions have died down, and I think the consensus from the responses
is that the issues are 1) the lack of committers and 2) the lack of a
champion and mentors. We hope to address #1 and grow the community as
part of incubation. Is anyone interested in being a champion or mentor
and helping us with #2?

Thanks,
- Steve

On 07/26/2017 04:06 PM, Chris Mattmann wrote:
> This sounds like a very interesting project. 
> 
> I don’t have the time to mentor at the moment but I will keep a close eye on it.
> 
> Cheers,
> Chris Mattmann
> 
> 
> 
> 
> On 7/25/17, 11:53 AM, "McHenry, Kenton Guadron" <mc...@illinois.edu> wrote:
> 
>     Hi Dave,
>     
>     The developers that were at NCSA have moved on to other organizations.  While we still leverage Daffodil and are very much interested in seeing it move forward, development is currently done by the Tresys team.  Agreed on the synergy with Tika.
>     
>     Kenton McHenry, Ph.D.
>     Principal Research Scientist, Adjunct Assistant Professor of Computer Science
>     Deputy Director of the Scientific Software & Applications Division
>     National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
>     
>     On Jul 24, 2017, at 1:55 PM, Dave Fisher <da...@comcast.net> wrote:
>     
>     Hi Kenton,
>     
>     Is there any reason that you and others from the NCSA are not Initial Committers? That would make this proposal stronger.
>     
>     Regarding Apache Tika - it relies on other projects including Apache POI and Apache PDFBox. They are pragmatic about what is used. If Daffodil works to expand then I think that there would be good synergy between the projects. I know as a POI PMC member that the POI community has significantly benefited from the Tika community, some of whom are from MITRE.
>     
>     To date Tika has not emphasized structured data, although they do extract content from Excel and OpenOffice.
>     
>     I am intrigued.
>     
>     Regards,
>     Dave
>     
>     On Jul 24, 2017, at 10:55 AM, McHenry, Kenton Guadron <mc...@illinois.edu> wrote:
>     
>     Yes, DFDL and its open source implementation Daffodil are more about file formats and getting access to the entirety of a file's contents in a consistent way through machine-readable specifications.  The work has implications in the area of digital preservation, allowing one to preserve these machine-readable specifications rather than all the tools needed to open/save a file in order to work with it.  Imagine someone developing graphics software to work with 3D models and not having to worry about the hundreds of formats out there for 3D meshes (whether there are tools for opening the files and whether they can get access to those tools, whether the spec is available and how complex that spec is to implement, etc.), and simply building their code around the contents (e.g. vertices, faces, etc.).  One could come up with similar scenarios for other data types (documents, images, videos, audio, depth data, numeric data).  Ideally, tools built to support DFDL could someday support any format for that type without the developer having to worry about the details of how that data is represented within a file.
>     
>     Kenton McHenry, Ph.D.
>     Principal Research Scientist, Adjunct Assistant Professor of Computer Science
>     Deputy Director of the Scientific Software & Applications Division
>     National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
>     
>     On Jul 24, 2017, at 10:30 AM, Steve Lawrence <st...@gmail.com> wrote:
>     
>     I'll preface this by saying that I don't have a ton of experience with
>     Apache Tika. But based on my understanding, Tika and Daffodil do have
>     somewhat similar goals, but reach them in different ways. For example,
>     Tika requires that one writes /code/ to perform data extraction, usually
>     relying on existing Java libraries to extract the desired metadata. The
>     downside to this is that code can be buggy, and libraries might not even
>     exist for formats of interest (especially common with legacy and
>     military data).
>     
>     Daffodil, on the other hand, does not require one to write any code.
>     Instead, one writes a DFDL Schema (similar to XML Schema, with DFDL
>     annotations) that fully describes the data, which Daffodil then uses to
>     convert the data to XML/JSON for extraction. So adding support for a new
>     format means writing a new schema rather than new code. And less code
>     generally means fewer bugs. Also, for secure systems that require
>     certification, generally speaking, it is easier to certify a schema as
>     compared to code.
>     
>     We certainly don't believe that Daffodil could replace Tika, but it does
>     have the potential to add new functionality to Tika for formats that do
>     not have existing libraries. One of our goals is to look into
>     integrating Daffodil support into tools like Tika. We'd love to hear
>     from Tika devs if this is something they'd be interested in.
>     
>     I'll also add that whereas Tika tends to focus primarily on metadata,
>     DFDL schemas usually describe an entire file format down to the byte, so
>     one can extract more than just metadata, including text and binary
>     data. Further differentiating, Daffodil has support for serializing data
>     (called unparse) from the XML/JSON representation, allowing one to
>     transform or filter data as well. We don't believe this feature is all
>     that applicable to Tika, but it may be useful to other technologies, such
>     as filtering or data fuzzing.
>     
>     - Steve
>     
>     
>     On 07/24/2017 10:59 AM, Mike Drob wrote:
>     What is the relationship between Daffodil and something like Apache Tika's
>     extraction engine?
>     
>     On Mon, Jul 24, 2017 at 9:53 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>     
>     Dear Apache Incubator Community,
>     
>     We would like to start a discussion around a proposal to bring Daffodil
>     into the Apache Incubator. Daffodil is an implementation of the DFDL
>     specification used to convert between fixed format data and XML/JSON.
>     
>     The draft proposal can be found in the wiki at the following URL:
>     
>     https://wiki.apache.org/incubator/DaffodilProposal
>     
>     We do not yet have a champion or mentors, but it was recommended that we
>     create a proposal and send it to this list to potentially find those
>     that might be interested. The text for the draft proposal is found
>     below. We look forward to your input.
>     
>     Thanks,
>     -Steve
>     
>     
>     = Daffodil Proposal =
>     
>     == Abstract ==
>     
>     Daffodil is an implementation of the Data Format Description Language
>     (DFDL) used to convert between fixed format data and XML/JSON.
>     
>     == Proposal ==
>     
>     The Data Format Description Language (DFDL) is a specification,
>     developed by the Open Grid Forum, capable of describing many data
>     formats, including both textual and binary, scientific and numeric,
>     legacy and modern, commercial record-oriented, and many industry and
>     military standards. It defines a language that is a subset of W3C XML
>     schema to describe the logical format of the data, and annotations
>     within the schema to describe the physical representation.
>     
>     Daffodil is an open source implementation of the DFDL specification that
>     uses these DFDL schemas to parse fixed format data into an infoset,
>     which is most commonly represented as either XML or JSON. This allows
>     the use of well-established XML or JSON technologies and libraries to
>     consume, inspect, and manipulate fixed format data in existing
>     solutions. Daffodil is also capable of the reverse by serializing or
>     "unparsing" an XML or JSON infoset back to the original data format.
>     
>     == Background ==
>     
>     Many different software solutions need to consume and manage data,
>     including data directed routing, databases, data analysis, data
>     cleansing, data visualizing, and more. A key aspect of such solutions is
>     the need to transform the data into an easily consumable format.
>     Usually, this means that for each unique data format, one develops a
>     tool that can read and extract the necessary information, often leading
>     to ad-hoc and data-format-specific description systems. Such systems are
>     often proprietary, not well tested, and incompatible, leading to vendor
>     lock-in, flawed software, and increased training costs. DFDL is a new
>     standard, with version 1.0 completed in October of 2016, that solves
>     these problems by defining an open standard to describe many different
>     data formats and how to parse and unparse between the data and XML/JSON.
>     
>     Two closed source implementations of DFDL currently exist. The first was
>     created by IBM and is now part of their IBM® Integration Bus product.
>     The second, called DFDL4S or "DFDL for Space", was created by the
>     European Space Agency and targets the challenges of their satellite data
>     processing.
>     
>     Around 2005, Pacific Northwest National Lab created Defuddle, built as
>     an open source implementation and proof of concept of the draft DFDL
>     specification and a test bed to feed new concepts into specification
>     development. Primary development of Defuddle was eventually taken over
>     by the National Center for Supercomputing Applications (NCSA). However,
>     due to evolution of the DFDL specification and architectural and
>     performance issues with Defuddle, around 2009, NCSA restarted the
>     project with the new name of Daffodil, with a goal of implementing the
>     complete DFDL specification. Daffodil development continued at NCSA
>     until around 2012, at which point development slowed due to budget
>     limitations. Shortly thereafter, primary development was picked up by
>     Tresys Technology where it continues today, with contributions from
>     other entities such as the Navy Research Lab, the Air Force Research
>     Lab, MITRE, and Booz Allen Hamilton. In February of 2015, Daffodil
>     version 1.0.0 was released, including support for the DFDL features
>     needed to parse many common file formats. Daffodil version 2.0.0 is
>     expected to be released in August of 2017 and will include unparse
>     support with one-to-one feature parity with parsing.
>     
>     Entities including IBM, MITRE, NATO NCI Agency, Northrop-Grumman, Quark
>     Security, Raytheon, and Tresys Technology have developed DFDL schemas
>     for many data formats from varying technology domains, including PNG,
>     GIF, BMP, PCAP, HL7, EDIFACT, NACHA, vCard, iCalendar, and MIL-STD-2045,
>     many of which are publicly available on the DFDL Schemas GitHub. There
>     are also a number of military-application data formats, the
>     specifications of which are not public, which have historically been
>     very difficult and expensive to process, and for which DFDL schemas have
>     been created or are actively in development; these include
>     MIL-STD-6040/USMTF ATO, MIL-STD-6017/VMF, and MIL-STD-6016/NATO STANAG
>     5516 (aka "Link16").
>     
>     == Rationale ==
>     
>     Numerous software solutions exist that consume, inspect, analyze, and
>     transform data, many of which can be found in the Apache Software
>     Foundation (ASF). In order for tools like these to consume new types of
>     data, custom extensions are usually required, often with high
>     development and testing costs. Daffodil fills a clear gap in many of
>     these solutions, providing a simple and low cost way to transform data
>     to XML or JSON, which many of these tools natively support already. With
>     the upcoming 2.0.0 release, the Daffodil project will have achieved a
>     level of functionality in both parse and unparse that, when integrated
>     into existing solutions, could provide for a new method to quickly
>     enable support for new data formats.
>     
>     == Initial Goals ==
>     
>     * Relicense the existing code from the University of Illinois/NCSA Open
>     Source License to the Apache License version 2.0, working with Apache
>     Legal to ensure correctness, and with Daffodil contributors to get
>     their permission.
>     * Move the existing codebase, documentation, bugs, and mailing lists to
>     the Apache hosted infrastructure
>     * Establish a formal release process and schedule, allowing for
>     dependable release cycles in a manner consistent with the Apache
>     development process.
>     * Build relationships with ASF projects to add Daffodil support where
>     appropriate
>     * Grow the community to establish a diversity of background and expertise.
>     
>     == Current Status ==
>     
>     === Meritocracy ===
>     
>     All initial committers are familiar with the principles of meritocracy.
>     The Daffodil project has followed the model of meritocracy in the past,
>     providing multiple outside entities commit access based on the quality
>     of their contributions. In order to grow the Daffodil user base and
>     development community, we are dedicated to continuing to operate
>     Daffodil as a meritocracy.
>     
>     A key ingredient in a meritocracy of developers is open group code
>     review. The Daffodil project has operated in this mode throughout its
>     existence and this provides a forum to improve the code, verify code
>     quality, and educate new developers on the code base.
>     
>     === Community ===
>     
>     Daffodil has a small community of users and developers. Although primary
>     Daffodil development is done by Tresys Technology, a handful of other
>     contributions have come from other entities including the Navy Research
>     Lab, the Air Force Research Lab, MITRE, and Booz Allen Hamilton. In
>     addition to developers, multiple users of Daffodil have created DFDL
>     schemas, including entities such as MITRE, IBM, Raytheon, Quark
>     Security, and Tresys Technology. The DFDL Schemas GitHub community has
>     been created as a place for DFDL schemas to be published. The Daffodil
>     project also makes use of mailing lists, !HipChat, and Confluence
>     Questions to build a community of users and a system for support.
>     
>     === Core Developers ===
>     
>     The core developers of Daffodil are employed by Tresys Technology. We
>     will work to grow the community among a more diverse set of developers
>     and industries.
>     
>     === Alignment ===
>     
>     Daffodil was created as an open source project with a philosophy
>     consistent with The Apache Way. A strong belief in meritocracy,
>     community involvement in decisions, openness, and ensuring a high level
>     of quality in code, documentation, and testing are some of our shared
>     core beliefs.
>     
>     Further, as mentioned in the Rationale section, Daffodil fills a gap
>     that exists in many ASF projects, including !NiFi, Spark, Storm, Hadoop,
>     Tika, and others. In order for tools like these to consume new types of
>     data, custom extensions are usually required. Rather than create such
>     extensions, Daffodil provides an easy and standards-compliant way to
>     transform data to XML or JSON, which many of these tools already
>     natively support.
>     
>     == Known Risks ==
>     
>     === Orphaned Products ===
>     
>     The current core developers are the leading contributors in the space of
>     DFDL and wish to see it flourish. Though there is some risk that the
>     initial committers all come from the same company, a goal of entering
>     into incubation is to grow the development community to minimize the
>     risk of reliance on a single company.
>     
>     === Inexperience with Open Source ===
>     
>     The Daffodil project began as an open source project and has continued
>     that model throughout development. This includes public bug tracking,
>     git revision control, automated builds and tests, and a public wiki for
>     documentation.
>     
>     Additionally, the current core developers and initial committers all
>     work for a company that relies on, believes in, promotes, and has led or
>     contributed to many open source software projects, including SELinux
>     Userspace, OpenSCAP, CLIP, refpolicy, setools, RPM, and others. As such,
>     there is low risk related to inexperience with open source software and
>     processes.
>     
>     === Homogeneous Developers ===
>     
>     The proposed initial committers come from a single entity, though we are
>     committed to growing the Daffodil development community to include a
>     broad group of additional committers from a wide array of industries.
>     
>     === Reliance on Salaried Developers ===
>     
>     The proposed initial committers are paid by their employer to contribute
>     to the Daffodil project. We expect that Daffodil development will
>     continue with salaried developers, and are committed to growing the
>     community to include non-salaried developers as well.
>     
>     === Relationship with other Apache Projects ===
>     
>     As mentioned in the Alignment section, Daffodil fills a clear gap in
>     numerous other ASF projects that consume and manage large amounts of data.
>     
>     As a specific example, Daffodil developers have created a Daffodil
>     Apache !NiFi Processor, currently in use in data transfer solutions,
>     which allows one to ingest non-native data into an Apache !NiFi pipeline
>     as XML or JSON. This processor was well received by the Apache !NiFi
>     developers, with positive comments about the concise API and how it
>     could handle non-native data. Daffodil developers have also successfully
>     prototyped integration with Apache Spark. We believe Daffodil could
>     provide a strong benefit to many other ASF projects that handle fixed
>     format data. We anticipate working closely with such ASF projects to
>     include Daffodil where applicable to increase their ability to support
>     new data formats with minimal effort.
>     
>     Daffodil also depends on existing ASF projects, including Apache Commons
>     and Apache Xerces.
>     
>     === An Excessive Fascination with the Apache Brand ===
>     
>     Although the Apache brand may certainly help to attract more
>     contributors, publicity is not the reason for this proposal. We believe
>     Daffodil could provide a great benefit to the ASF and the numerous data
>     focused projects that comprise it, as described in the Rationale and
>     Alignment sections. We hope to build a strong and vibrant community
>     built around The Apache Way, and not dependent on a single company.
>     
>     === Documentation ===
>     
>     Daffodil documentation can be found at:
>     
>     * https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
>     
>     Information about DFDL can be found at:
>     
>     * https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
>     * https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/df20060_.htm
>     
>     Public examples of DFDL Schemas can be found at:
>     
>     * https://github.com/DFDLSchemas
>     
>     == Initial Source ==
>     
>     The Daffodil git repo goes back to mid-2011 with approximately 20
>     different contributors and feedback from many users and developers. The
>     core codebase is written in Scala and includes both a Scala and Java
>     API, along with Javadocs and Scaladocs for API usage. The initial code
>     will come from the git repository currently hosted by NCSA at the
>     University of Illinois:
>     
>     https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil/
>     
>     == Source and Intellectual Property Submission ==
>     
>     The complete Daffodil code is licensed under the University of
>     Illinois/NCSA Open Source License. Much of the current codebase has been
>     developed by Tresys Technology, who is open to relicensing the code to
>     the Apache License version 2.0 and donating the source to the ASF.
>     Contacts at NCSA are also open to relicensing their contributions to
>     Apache v2. We plan to contact the other contributors and ask for
>     permission to relicense and donate their contributed code. For those
>     who decline or whom we cannot contact, their code will be removed or
>     replaced. We will work closely with Apache Legal to ensure all issues
>     related to relicensing are acceptable.
>     
>     == External Dependencies ==
>     
>     We believe all current dependencies are compatible with the ASF
>     guidelines. Our dependency licenses come from the following license
>     styles: Apache v2, BSD, MIT, and ICU. The list of current Daffodil
>     dependencies and their licenses is documented here:
>     
>     https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Dependencies+and+Licenses
>     
>     == Cryptography ==
>     
>     None
>     
>     == Required Resources ==
>     
>     === Mailing Lists ===
>     
>     * commits@daffodil.incubator.apache.org
>     * dev@daffodil.incubator.apache.org
>     * private@daffodil.incubator.apache.org
>     * user@daffodil.incubator.apache.org
>     
>     === Source Control ===
>     
>     git://git.apache.org/incubator-daffodil.git
>     
>     === Issue Tracking ===
>     
>     JIRA Daffodil (DFDL)
>     
>     === Initial Committers ===
>     
>     * Beth Finnegan <efinnegan at tresys dot com>
>     * Dave Thompson <dthompson at tresys dot com>
>     * Josh Adams <jadams at tresys dot com>
>     * Mike Beckerle <mbeckerle at tresys dot com>
>     * Steve Lawrence <slawrence at tresys dot com>
>     * Taylor Wise <twise at tresys dot com>
>     
>     === Affiliations ===
>     
>     * Beth Finnegan (Tresys Technology)
>     * Dave Thompson (Tresys Technology)
>     * Josh Adams (Tresys Technology)
>     * Mike Beckerle (Tresys Technology)
>     * Steve Lawrence (Tresys Technology)
>     * Taylor Wise (Tresys Technology)
>     
>     == Sponsors ==
>     
>     === Champion ===
>     
>     * TBD
>     
>     === Nominated Mentors ===
>     
>     * TBD
>     
>     === Sponsoring Entity ===
>     
>     We request the Apache Incubator to sponsor this project.
>     
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>     For additional commands, e-mail: general-help@incubator.apache.org

Re: [DISCUSS] Daffodil Incubation Proposal

Posted by "John D. Ament" <jo...@apache.org>.
Sorry, only responded to one part :/

You can start the vote as well.  Feel free to follow the format used at
https://lists.apache.org/thread.html/2da6f1920aa7d9f0ee9edbd2a4e6a8e0e5db9aac40e503fd87a4cdb0@%3Cgeneral.incubator.apache.org%3E

If you have any questions, respond here or privately.

John

On Wed, Aug 9, 2017 at 10:19 PM John D. Ament <jo...@apache.org> wrote:

> Steve,
>
> You could list either of us.
>
> John
>
>
> On Wed, Aug 9, 2017 at 11:55 AM Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>
>> Sounds good to me. Can I start a vote, or is that something a
>> champion/mentor would normally start? The project also does not have a
>> champion. Is one necessary, and would either of you be interested in being
>> the champion?
>>
>> Thanks,
>> - Steve
>>
>> On 08/08/2017 10:59 PM, Dave Fisher wrote:
>> > Hi -
>> >
>> > I agree. I'm willing to proceed with John and me as Mentors.
>> >
>> > Regards,
>> > Dave
>> >
>> > Sent from my iPhone
>> >
>> >> On Aug 8, 2017, at 7:10 PM, John D. Ament <jo...@apache.org> wrote:
>> >>
>> >> Steve,
>> >>
>> >> At this point, I'd recommend we wrap the discussion and call for a
>> >> vote.  While ideally we want 3 mentors, we can get started with 2 and see
>> >> how things progress.
>> >>
>> >> John
>> >>
>> >>> On Wed, Aug 2, 2017 at 3:55 PM Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>> >>> Thanks John!
>> >>>
>> >>> On 08/02/2017 03:23 PM, John D. Ament wrote:
>> >>>> You can also count me in as a mentor.
>> >>>>
>> >>>> John
>> >>>>
>> >>>> On Wed, Aug 2, 2017 at 3:14 PM Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>> >>>>
>> >>>>> Understood. Thanks for the interest!
>> >>>>>
>> >>>>> - Steve
>> >>>>>
>> >>>>> On 08/02/2017 02:57 PM, Dave Fisher wrote:
>> >>>>>> Hi Steve,
>> >>>>>>
>> >>>>>> It was not so much the lack of committers as it was the current
>> >>>>>> diversity. That is not a blocker for entry to Incubation.
>> >>>>>>
>> >>>>>> I am willing to be one of the Mentors. Once there are at least two more
>> >>>>>> we can push forward.
>> >>>>>>
>> >>>>>> Regards,
>> >>>>>> Dave
>> >>>>>>
>> >>>>>>> On Aug 1, 2017, at 5:09 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>> Discussions have died down, and I think the consensus from the responses
>> >>>>>>> is that the issues are 1) the lack of committers and 2) the lack of a
>> >>>>>>> champion and mentors. We hope to address #1 and grow the community as
>> >>>>>>> part of incubation. Is anyone interested in being a champion or mentor
>> >>>>>>> and helping us with #2?
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> - Steve
>> >>>>>>>
>> >>>>>>> On 07/26/2017 04:06 PM, Chris Mattmann wrote:
>> >>>>>>>> This sounds like a very interesting project.
>> >>>>>>>>
>> >>>>>>>> I don’t have the time to mentor at the moment but I will keep a
>> close
>> >>>>> eye on it.
>> >>>>>>>>
>> >>>>>>>> Cheers,
>> >>>>>>>> Chris Mattmann
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On 7/25/17, 11:53 AM, "McHenry, Kenton Guadron" <
>> mchenry@illinois.edu>
>> >>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>    Hi Dave,
>> >>>>>>>>
>> >>>>>>>>    The developers that were at NCSA have moved on to other
>> >>>>> organizations.  While we still leverage Daffodil and are very much
>> >>>>> interested in seeing it move forward, development is currently done
>> by the
>> >>>>> Tresys team.  Agreed on the synergy with Tika.
>> >>>>>>>>
>> >>>>>>>>    Kenton McHenry, Ph.D.
>> >>>>>>>>    Principal Research Scientist, Adjunct Assistant Professor of
>> >>>>> Computer Science
>> >>>>>>>>    Deputy Director of the Scientific Software & Applications
>> Division
>> >>>>>>>>    National Center for Supercomputing Applications, University of
>> >>>>> Illinois at Urbana-Champaign
>> >>>>>>>>
>> >>>>>>>>    On Jul 24, 2017, at 1:55 PM, Dave Fisher <
>> dave2wave@comcast.net
>> >>>>> <ma...@comcast.net>> wrote:
>> >>>>>>>>
>> >>>>>>>>    Hi Kenton,
>> >>>>>>>>
>> >>>>>>>>    Is there any reason that you and others from the NCSA are not
>> >>>>> Initial Committers? That would make this proposal stronger.
>> >>>>>>>>
>> >>>>>>>>    Regarding Apache Tika - it relies on other projects including
>> >>>>> Apache POI and Apache PDFBox. They are pragmatic about what is
>> used. If
>> >>>>> Daffodil works to expand then I think that there would be good
>> synergy
>> >>>>> between the projects. I know as a POI PMC member that the POI
>> community has
>> >>>>> significantly benefited from the Tika community some of whom are
>> from Mitre.
>> >>>>>>>>
>> >>>>>>>>    To date Tika has not emphasized structured data, although
>> they do
>> >>>>> extract content from Excel and OpenOffice.
>> >>>>>>>>
>> >>>>>>>>    I am intrigued.
>> >>>>>>>>
>> >>>>>>>>    Regards,
>> >>>>>>>>    Dave
>> >>>>>>>>
>> >>>>>>>>    On Jul 24, 2017, at 10:55 AM, McHenry, Kenton Guadron <
>> >>>>> mchenry@illinois.edu<ma...@illinois.edu>> wrote:
>> >>>>>>>>
>> >>>>>>>>    Yes, DFDL and its open source implementation Daffodil are more
>> >>>>> about file formats and getting access to the entirety of a file's
>> contents
>> >>>>> in a consistent way through machine readable specifications.  The
>> work has
>> >>>>> implications in the area of digital preservation allowing one to
>> preserve
>> >>>>> these machine readable specifications rather than all the tools
>> needed to
>> >>>>> open/save a file in order to work with it.  Imagine someone
>> developing
>> >>>>> graphics software to work with 3D models and not having to worry
>> about the
>> >>>>> hundreds of formats out there for 3D meshes (whether there are
>> tools for
>> >>>>> opening the files and whether they can get access to those tools,
>> whether
>> >>>>> the spec is available and worrying about how complex that spec is to
>> >>>>> implement, etc.), and simply building their code around the
>> contents (e.g.
>> >>>>> vertices, faces, etc.).  One could come up with similar scenarios
>> for other
>> >>>>> data types (documents, images, videos, audio, depth data, numeric
>> data).
>> >>>>> Ideally tools built supporting DFDL, could someday, support any
>> format for
>> >>>>> that type without the developer having to worry about the details
>> of how
>> >>>>> that data is represented within a file.
>> >>>>>>>>
>> >>>>>>>>    Kenton McHenry, Ph.D.
>> >>>>>>>>    Principal Research Scientist, Adjunct Assistant Professor of
>> >>>>> Computer Science
>> >>>>>>>>    Deputy Director of the Scientific Software & Applications
>> Division
>> >>>>>>>>    National Center for Supercomputing Applications, University of
>> >>>>> Illinois at Urbana-Champaign
>> >>>>>>>>
>> >>>>>>>>    On Jul 24, 2017, at 10:30 AM, Steve Lawrence <
>> >>>>> stephen.d.lawrence@gmail.com<mailto:stephen.d.lawrence@gmail.com
>> ><mailto:
>> >>>>> stephen.d.lawrence@gmail.com>> wrote:
>> >>>>>>>>
>> >>>>>>>>    I'll preface this saying that I don't have a ton of
>> experience with
>> >>>>>>>>    Apache Tika. But based on my understanding, Tika and Daffodil
>> do
>> >>>>> have
>> >>>>>>>>    somewhat similar goals, but reach them in different ways. For
>> >>>>> example,
>> >>>>>>>>    Tika requires that one writes /code/ to perform data
>> extraction,
>> >>>>> usually
>> >>>>>>>>    relying on existing Java libraries to extract the desired
>> metadata.
>> >>>>> The
>> >>>>>>>>    downside to this is that code can be buggy, and libraries
>> might not
>> >>>>> even
>> >>>>>>>>    exist for formats of interest (especially common with legacy
>> and
>> >>>>>>>>    military data).
>> >>>>>>>>
>> >>>>>>>>    Daffodil, on the other hand, does not require one to write
>> any code.
>> >>>>>>>>    Instead, one writes a DFDL Schema (similar to XML Schema,
>> with DFDL
>> >>>>>>>>    annotations) that fully describes the data, which Daffodil
>> then
>> >>>>> uses to
>> >>>>>>>>    convert the data to XML/JSON for extraction. So adding
>> support for
>> >>>>> a new
>> >>>>>>>>    format means writing a new schema rather than new code. And
>> less
>> >>>>> code
>> >>>>>>>>    generally means less bugs. Also, for secure systems that
>> require
>> >>>>>>>>    certification, generally speaking, it is easier to certify a
>> schema
>> >>>>> as
>> >>>>>>>>    compared to code.
>> >>>>>>>>
>> >>>>>>>>    We certainly don't believe that Daffodil could replace Tika,
>> but it
>> >>>>> does
>> >>>>>>>>    have the potential to add new functionality to Tika for
>> formats
>> >>>>> that do
>> >>>>>>>>    not have existing libraries. One of our goals is to look into
>> >>>>>>>>    integrating Daffodil support into tools like Tika. We'd love
>> to hear
>> >>>>>>>>    from Tika devs if this is something they'd be interested in.
>> >>>>>>>>
>> >>>>>>>>    I'll also add that whereas Tika tends to focus primarily on
>> >>>>> metadata,
>> >>>>>>>>    DFDL schemas usually describe an entire file format down to
>> the
>> >>>>> byte, so
>> >>>>>>>>    one can extract more than just metadata, including text and
>> binary
>> >>>>>>>>    data. Further differentiating, Daffodil has support for
>> serializing
>> >>>>> data
>> >>>>>>>>    (called unparse) from the XML/JSON representation, allowing
>> one to
>> >>>>>>>>    transform or filter data as well. We don't believe this
>> feature is
>> >>>>> all
>> >>>>>>>>    that applicable to Tika, but may be useful to other
>> technologies
>> >>>>> such as
>> >>>>>>>>    filtering or data fuzzing technologies.
>> >>>>>>>>
>> >>>>>>>>    - Steve
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>    On 07/24/2017 10:59 AM, Mike Drob wrote:
>> >>>>>>>>    What is the relationship between Daffodil and something like
>> Apache
>> >>>>> Tika's
>> >>>>>>>>    extraction engine?
>> >>>>>>>>
>> >>>>>>>>    On Mon, Jul 24, 2017 at 9:53 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>>    Dear Apache Incubator Community,
>> >>>>>>>>
>> >>>>>>>>    We would like to start a discussion around a proposal to bring
>> >>>>> Daffodil
>> >>>>>>>>    into the Apache Incubator. Daffodil is an implementation of
>> the DFDL
>> >>>>>>>>    specification used to convert between fixed format data and
>> >>>>> XML/JSON.
>> >>>>>>>>
>> >>>>>>>>    The draft proposal can be found in the wiki at the following
>> URL:
>> >>>>>>>>
>> >>>>>>>>    https://wiki.apache.org/incubator/DaffodilProposal
>> >>>>>>>>
>> >>>>>>>>    We do not yet have a champion or mentors, but it was
>> recommended
>> >>>>> that we
>> >>>>>>>>    create a proposal and send it to this list to potentially
>> find those
>> >>>>>>>>    that might be interested. The text for the draft proposal is
>> found
>> >>>>>>>>    below. We look forward to your input.
>> >>>>>>>>
>> >>>>>>>>    Thanks,
>> >>>>>>>>    -Steve
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>    = Daffodil Proposal =
>> >>>>>>>>
>> >>>>>>>>    == Abstract ==
>> >>>>>>>>
>> >>>>>>>>    Daffodil is an implementation of the Data Format Description
>> >>>>> Language
>> >>>>>>>>    (DFDL) used to convert between fixed format data and XML/JSON.
>> >>>>>>>>
>> >>>>>>>>    == Proposal ==
>> >>>>>>>>
>> >>>>>>>>    The Data Format Description Language (DFDL) is a
>> specification,
>> >>>>>>>>    developed by the Open Grid Forum, capable of describing many
>> data
>> >>>>>>>>    formats, including both textual and binary, scientific and
>> numeric,
>> >>>>>>>>    legacy and modern, commercial record-oriented, and many
>> industry and
>> >>>>>>>>    military standards. It defines a language that is a subset of
>> W3C
>> >>>>> XML
>> >>>>>>>>    schema to describe the logical format of the data, and
>> annotations
>> >>>>>>>>    within the schema to describe the physical representation.
>> >>>>>>>>
>> >>>>>>>>    Daffodil is an open source implementation of the DFDL
>> specification
>> >>>>> that
>> >>>>>>>>    uses these DFDL schemas to parse fixed format data into an
>> infoset,
>> >>>>>>>>    which is most commonly represented as either XML or JSON. This
>> >>>>> allows
>> >>>>>>>>    the use of well-established XML or JSON technologies and
>> libraries
>> >>>>> to
>> >>>>>>>>    consume, inspect, and manipulate fixed format data in existing
>> >>>>>>>>    solutions. Daffodil is also capable of the reverse by
>> serializing or
>> >>>>>>>>    "unparsing" an XML or JSON infoset back to the original data
>> format.
>> >>>>>>>>
>> >>>>>>>>    == Background ==
>> >>>>>>>>
>> >>>>>>>>    Many different software solutions need to consume and manage
>> data,
>> >>>>>>>>    including data directed routing, databases, data analysis,
>> data
>> >>>>>>>>    cleansing, data visualizing, and more. A key aspect of such
>> >>>>> solutions is
>> >>>>>>>>    the need to transform the data into an easily consumable
>> format.
>> >>>>>>>>    Usually, this means that for each unique data format, one
>> develops a
>> >>>>>>>>    tool that can read and extract the necessary information,
>> often
>> >>>>> leading
>> >>>>>>>>    to ad-hoc and data-format-specific description systems. Such
>> >>>>> systems are
>> >>>>>>>>    often proprietary, not well tested, and incompatible, leading
>> to
>> >>>>> vendor
>> >>>>>>>>    lock-in, flawed software, and increased training costs. DFDL
>> is a
>> >>>>> new
>> >>>>>>>>    standard, with version 1.0 completed in October of 2016, that
>> solves
>> >>>>>>>>    these problems by defining an open standard to describe many
>> >>>>> different
>> >>>>>>>>    data formats and how to parse and unparse between the data and
>> >>>>> XML/JSON.
>> >>>>>>>>
>> >>>>>>>>    Two closed source implementations of DFDL currently exist. The
>> >>>>> first was
>> >>>>>>>>    created by IBM and is now part of their IBM® Integration Bus
>> >>>>> product.
>> >>>>>>>>    The second was created by the European Space Agency, called
>> DFDL4S
>> >>>>> or
>> >>>>>>>>    "DFDL for Space" targeted at the challenges of their
>> satellite data
>> >>>>>>>>    processing.
>> >>>>>>>>
>> >>>>>>>>    Around 2005, Pacific Northwest National Lab created Defuddle,
>> built
>> >>>>> as
>> >>>>>>>>    an open source implementation and proof of concept of the
>> draft DFDL
>> >>>>>>>>    specification and a test bed to feed new concepts into
>> specification
>> >>>>>>>>    development. Primary development of Defuddle was eventually
>> taken
>> >>>>> over
>> >>>>>>>>    by the National Center for Supercomputing Applications (NCSA).
>> >>>>> However,
>> >>>>>>>>    due to evolution of the DFDL specification and architectural
>> and
>> >>>>>>>>    performance issues with Defuddle, around 2009, NCSA restarted
>> the
>> >>>>>>>>    project with the new name of Daffodil, with a goal of
>> implementing
>> >>>>> the
>> >>>>>>>>    complete DFDL specification. Daffodil development continued
>> at NCSA
>> >>>>>>>>    until around 2012, at which point development slowed due to
>> budget
>> >>>>>>>>    limitations. Shortly thereafter, primary development was
>> picked up
>> >>>>> by
>> >>>>>>>>    Tresys Technology where it continues today, with
>> contributions from
>> >>>>>>>>    other entities such as the Navy Research Lab, the Air Force
>> Research
>> >>>>>>>>    Lab, MITRE, and Booz Allen Hamilton. In February of 2015,
>> Daffodil
>> >>>>>>>>    version 1.0.0 was released, including support for the DFDL
>> features
>> >>>>>>>>    needed to parse many common file formats. Daffodil version
>> 2.0.0 is
>> >>>>>>>>    expected to be released in August of 2017, which will include
>> >>>>> unparse
>> >>>>>>>>    support with one-to-one parsing feature parity.
>> >>>>>>>>
>> >>>>>>>>    Entities including IBM, MITRE, NATO NCI Agency,
>> Northrop-Grumman,
>> >>>>> Quark
>> >>>>>>>>    Security, Raytheon, and Tresys Technology have developed DFDL
>> >>>>> schemas
>> >>>>>>>>    for many data formats from varying technology domains,
>> including
>> >>>>> PNG,
>> >>>>>>>>    GIF, BMP, PCAP, HL7, EDIFACT, NACHA, vCard, iCalendar, and
>> >>>>> MIL-STD-2045,
>> >>>>>>>>    many of which are publicly available on the DFDL Schemas
>> GitHub.
>> >>>>> There
>> >>>>>>>>    are also a number of military-application data formats, the
>> >>>>>>>>    specifications of which are not public, which have
>> historically been
>> >>>>>>>>    very difficult and expensive to process, and for which DFDL
>> schemas
>> >>>>> have
>> >>>>>>>>    been created or are actively in development; these include
>> >>>>>>>>    MIL-STD-6040/USMTF ATO, MIL-STD-6017/VMF, MIL-STD-6016/NATO
>> STANAG
>> >>>>> 5516
>> >>>>>>>>    (aka "Link16").
>> >>>>>>>>
>> >>>>>>>>    == Rationale ==
>> >>>>>>>>
>> >>>>>>>>    Numerous software solutions exist that consume, inspect,
>> analyze,
>> >>>>> and
>> >>>>>>>>    transform data, many of which can be found in the Apache
>> Software
>> >>>>>>>>    Foundation (ASF). In order for tools like these to consume new
>> >>>>> types of
>> >>>>>>>>    data, custom extensions are usually required, often with high
>> >>>>>>>>    development and testing costs. Daffodil fills a clear gap in
>> many of
>> >>>>>>>>    these solutions, providing a simple and low cost way to
>> transform
>> >>>>> data
>> >>>>>>>>    to XML or JSON, which many of these tools natively support
>> already.
>> >>>>> With
>> >>>>>>>>    the upcoming 2.0.0 release, the Daffodil project will have
>> achieved
>> >>>>> a
>> >>>>>>>>    level of functionality in both parse and unparse that, when
>> >>>>> integrated
>> >>>>>>>>    into existing solutions, could provide for a new method to
>> quickly
>> >>>>>>>>    enable support for new data formats.
>> >>>>>>>>
>> >>>>>>>>    == Initial Goals ==
>> >>>>>>>>
>> >>>>>>>>    * Relicense the existing code from the University of
>> Illinois/NCSA
>> >>>>> Open
>> >>>>>>>>    Source License to the Apache License version 2.0, working with
>> >>>>> Apache
>> >>>>>>>>    Legal to ensure correctness, and with Daffodil contributors
>> to get
>> >>>>>>>>    their permission.
>> >>>>>>>>    * Move the existing codebase, documentation, bugs, and mailing
>> >>>>> lists to
>> >>>>>>>>    the Apache hosted infrastructure
>> >>>>>>>>    * Establish a formal release process and schedule, allowing
>> for
>> >>>>>>>>    dependable release cycles in a manner consistent with the
>> Apache
>> >>>>>>>>    development process.
>> >>>>>>>>    * Build relationships with ASF projects to add Daffodil
>> support
>> >>>>> where
>> >>>>>>>>    appropriate
>> >>>>>>>>    * Grow the community to establish a diversity of background
>> and
>> >>>>> expertise.
>> >>>>>>>>
>> >>>>>>>>    == Current Status ==
>> >>>>>>>>
>> >>>>>>>>    === Meritocracy ===
>> >>>>>>>>
>> >>>>>>>>    All initial committers are familiar with the principles of
>> >>>>> meritocracy.
>> >>>>>>>>    The Daffodil project has followed the model of meritocracy in
>> the
>> >>>>> past,
>> >>>>>>>>    providing multiple outside entities commit access based on the
>> >>>>> quality
>> >>>>>>>>    of their contributions. In order to grow the Daffodil user
>> base and
>> >>>>>>>>    development community, we are dedicated to continuing to
>> operate
>> >>>>>>>>    Daffodil as a meritocracy.
>> >>>>>>>>
>> >>>>>>>>    A key ingredient in a meritocracy of developers is open group
>> code
>> >>>>>>>>    review. The Daffodil project has operated in this mode
>> throughout
>> >>>>> its
>> >>>>>>>>    existence and this provides a forum to improve the code,
>> verify code
>> >>>>>>>>    quality, and educate new developers on the code base.
>> >>>>>>>>
>> >>>>>>>>    === Community ===
>> >>>>>>>>
>> >>>>>>>>    Daffodil has a small community of users and developers.
>> Although
>> >>>>> primary
>> >>>>>>>>    Daffodil development is done by Tresys Technology, a handful
>> of
>> >>>>> other
>> >>>>>>>>    contributions have come from other entities including the Navy
>> >>>>> Research
>> >>>>>>>>    Lab, the Air Force Research Lab, MITRE, and Booz Allen
>> Hamilton. In
>> >>>>>>>>    addition to developers, multiple users of Daffodil have
>> created DFDL
>> >>>>>>>>    schemas, including entities such as MITRE, IBM, Raytheon,
>> Quark
>> >>>>>>>>    Security, and Tresys Technology. The DFDL Schemas GitHub
>> community
>> >>>>> has
>> >>>>>>>>    been created as a place for DFDL schemas to be published. The
>> >>>>> Daffodil
>> >>>>>>>>    project also makes use of mailing lists, !HipChat, and
>> Confluence
>> >>>>>>>>    Questions to build a community of users and system for
>> support.
>> >>>>>>>>
>> >>>>>>>>    === Core Developers ===
>> >>>>>>>>
>> >>>>>>>>    The core developers of Daffodil are employed by Tresys
>> Technology.
>> >>>>> We
>> >>>>>>>>    will work to grow the community among a more diverse set of
>> >>>>> developers
>> >>>>>>>>    and industries.
>> >>>>>>>>
>> >>>>>>>>    === Alignment ===
>> >>>>>>>>
>> >>>>>>>>    Daffodil was created as an open source project with a
>> philosophy
>> >>>>>>>>    consistent with The Apache Way. A strong belief in
>> meritocracy,
>> >>>>>>>>    community involvement in decisions, openness, and ensuring a
>> high
>> >>>>> level
>> >>>>>>>>    of quality in code, documentation, and testing are some of our
>> >>>>> shared
>> >>>>>>>>    core beliefs.
>> >>>>>>>>
>> >>>>>>>>    Further, as mentioned in the Rationale section, Daffodil
>> fills a gap
>> >>>>>>>>    that exists in many ASF projects, including !NiFi, Spark,
>> Storm,
>> >>>>> Hadoop,
>> >>>>>>>>    Tika, and others. In order for tools like these to consume new
>> >>>>> types of
>> >>>>>>>>    data, custom extensions are usually required. Rather than
>> create
>> >>>>> such
>> >>>>>>>>    extensions, Daffodil provides an easy and standards-compliant
>> way to
>> >>>>>>>>    transform data to XML or JSON, which many of these tools
>> already
>> >>>>>>>>    natively support.
>> >>>>>>>>
>> >>>>>>>>    == Known Risks ==
>> >>>>>>>>
>> >>>>>>>>    === Orphaned Products ===
>> >>>>>>>>
>> >>>>>>>>    The current core developers are the leading contributors in
>> the
>> >>>>> space of
>> >>>>>>>>    DFDL and wish to see it flourish. Though there is some risk
>> that the
>> >>>>>>>>    initial committers all come from the same company, a goal of
>> >>>>> entering
>> >>>>>>>>    into incubation is to grow the development community to
>> minimize the
>> >>>>>>>>    risk of reliance on a single company.
>> >>>>>>>>
>> >>>>>>>>    === Inexperience with Open Source ===
>> >>>>>>>>
>> >>>>>>>>    The Daffodil project began as an open source project and has
>> >>>>> continued
>> >>>>>>>>    that model throughout development. This includes public bug
>> >>>>> tracking,
>> >>>>>>>>    git revision control, automated builds and tests, and a
>> public wiki
>> >>>>> for
>> >>>>>>>>    documentation.
>> >>>>>>>>
>> >>>>>>>>    Additionally, the current core developers and initial
>> committers all
>> >>>>>>>>    work for a company that relies on, believes in, promotes, and
>> has
>> >>>>> led or
>> >>>>>>>>    contributed to many open source software projects, including
>> SELinux
>> >>>>>>>>    Userspace, OpenSCAP, CLIP, refpolicy, setools, RPM, and
>> others. As
>> >>>>> such,
>> >>>>>>>>    there is low risk related to inexperience with open source
>> software
>> >>>>> and
>> >>>>>>>>    processes.
>> >>>>>>>>
>> >>>>>>>>    === Homogeneous Developers ===
>> >>>>>>>>
>> >>>>>>>>    The proposed initial committers come from a single entity,
>> though
>> >>>>> we are
>> >>>>>>>>    committed to growing the Daffodil development community to
>> include a
>> >>>>>>>>    broad group of additional committers from a wide array of
>> >>>>> industries.
>> >>>>>>>>
>> >>>>>>>>    === Reliance on Salaried Developers ===
>> >>>>>>>>
>> >>>>>>>>    The proposed initial committers are paid by their employer to
>> >>>>> contribute
>> >>>>>>>>    to the Daffodil project. We expect that Daffodil development
>> will
>> >>>>>>>>    continue with salaried developers, and are committed to
>> growing the
>> >>>>>>>>    community to include non-salaried developers as well.
>> >>>>>>>>
>> >>>>>>>>    === Relationship with other Apache Projects ===
>> >>>>>>>>
>> >>>>>>>>    As mentioned in the Alignment section, Daffodil fills a clear
>> gap in
>> >>>>>>>>    numerous other ASF projects that consume and manage large
>> amounts
>> >>>>> of data.
>> >>>>>>>>
>> >>>>>>>>    As a specific example, Daffodil developers have created a
>> Daffodil
>> >>>>>>>>    Apache !NiFi Processor, currently in use in data transfer
>> solutions,
>> >>>>>>>>    which allows one to ingest non-native data into an Apache
>> !NiFi
>> >>>>> pipeline
>> >>>>>>>>    as XML or JSON. This processor was well received by the
>> Apache !NiFi
>> >>>>>>>>    developers, with positive comments about the concise API and
>> how it
>> >>>>>>>>    could handle non-native data. Daffodil developers have also
>> >>>>> successfully
>> >>>>>>>>    prototyped integration with Apache Spark. We believe Daffodil
>> could
>> >>>>>>>>    provide a strong benefit to many other ASF projects that
>> handle
>> >>>>> fixed
>> >>>>>>>>    format data. We anticipate working closely with such ASF
>> projects to
>> >>>>>>>>    include Daffodil where applicable to increase their ability to
>> >>>>> support
>> >>>>>>>>    new data formats with minimal effort.
>> >>>>>>>>
>> >>>>>>>>    Daffodil also depends on existing ASF projects, including
>> Apache
>> >>>>> Commons
>> >>>>>>>>    and Apache Xerces.
>> >>>>>>>>
>> >>>>>>>>    === An Excessive Fascination with the Apache Brand ===
>> >>>>>>>>
>> >>>>>>>>    Although the Apache brand may certainly help to attract more
>> >>>>>>>>    contributors, publicity is not the reason for this proposal.
>> We
>> >>>>> believe
>> >>>>>>>>    Daffodil could provide a great benefit to the ASF and the
>> numerous
>> >>>>> data
>> >>>>>>>>    focused projects that comprise it, as described in the
>> Rationale and
>> >>>>>>>>    Alignment sections. We hope to build a strong and vibrant
>> community
>> >>>>>>>>    built around The Apache Way, and not dependent on a single
>> company.
>> >>>>>>>>
>> >>>>>>>>    === Documentation ===
>> >>>>>>>>
>> >>>>>>>>    Daffodil documentation can be found at:
>> >>>>>>>>
>> >>>>>>>>    * https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
>> >>>>>>>>
>> >>>>>>>>    Information about DFDL can be found at:
>> >>>>>>>>
>> >>>>>>>>    * https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
>> >>>>>>>>    * https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/df20060_.htm
>> >>>>>>>>
>> >>>>>>>>    Public examples of DFDL Schemas can be found at:
>> >>>>>>>>
>> >>>>>>>>    * https://github.com/DFDLSchemas
>> >>>>>>>>
>> >>>>>>>>    == Initial Source ==
>> >>>>>>>>
>> >>>>>>>>    The Daffodil git repo goes back to mid-2011 with
>> approximately 20
>> >>>>>>>>    different contributors and feedback from many users and
>> developers.
>> >>>>> The
>> >>>>>>>>    core codebase is written in Scala and includes both a Scala
>> and Java
>> >>>>>>>>    API, along with Javadocs and Scaladocs for API usage. The
>> initial
>> >>>>> code
>> >>>>>>>>    will come from the git repository currently hosted by NCSA at
>> the
>> >>>>>>>>    University of Illinois:
>> >>>>>>>>
>> >>>>>>>>    https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil/
>> >>>>>>>>
>> >>>>>>>>    == Source and Intellectual Property Submission ==
>> >>>>>>>>
>> >>>>>>>>    The complete Daffodil code is licensed under the University of
>> >>>>>>>>    Illinois/NCSA Open Source License. Much of the current
>> codebase has
>> >>>>> been
>> >>>>>>>>    developed by Tresys Technology, who is open to relicensing
>> the code
>> >>>>> to
>> >>>>>>>>    the Apache License version 2.0 and donating the source to the
>> ASF.
>> >>>>>>>>    Contacts at NCSA are also open to relicensing their
>> contributions to
>> >>>>>>>>    Apache v2. We plan to contact the other contributors and ask
>> for
>> >>>>>>>>    permission to relicense and donate their contributed code.
>> For those
>> >>>>>>>>    that decline or we cannot contact, their code will be removed
>> or
>> >>>>>>>>    replaced. We will work closely with Apache Legal to ensure all
>> >>>>> issues
>> >>>>>>>>    related to relicensing are acceptable.
>> >>>>>>>>
>> >>>>>>>>    == External Dependencies ==
>> >>>>>>>>
>> >>>>>>>>    We believe all current dependencies are compatible with the
>> ASF
>> >>>>>>>>    guidelines. Our dependency licenses come from the following
>> license
>> >>>>>>>>    styles: Apache v2, BSD, MIT, and ICU. The list of current
>> Daffodil
>> >>>>>>>>    dependencies and their licenses is documented here:
>> >>>>>>>>
>> >>>>>>>>    https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Dependencies+and+Licenses
>> >>>>>>>>
>> >>>>>>>>    == Cryptography ==
>> >>>>>>>>
>> >>>>>>>>    None
>> >>>>>>>>
>> >>>>>>>>    == Required Resources ==
>> >>>>>>>>
>> >>>>>>>>    === Mailing Lists ===
>> >>>>>>>>
>> >>>>>>>>    * commits@daffodil.incubator.apache.org
>> >>>>>>>>    * dev@daffodil.incubator.apache.org
>> >>>>>>>>    * private@daffodil.incubator.apache.org
>> >>>>>>>>    * user@daffodil.incubator.apache.org
>> >>>>>>>>
>> >>>>>>>>    === Source Control ===
>> >>>>>>>>
>> >>>>>>>>    git://git.apache.org/incubator-daffodil.git
>> >>>>>>>>
>> >>>>>>>>    === Issue Tracking ===
>> >>>>>>>>
>> >>>>>>>>    JIRA Daffodil (DFDL)
>> >>>>>>>>
>> >>>>>>>>    === Initial Committers ===
>> >>>>>>>>
>> >>>>>>>>    * Beth Finnegan <efinnegan at tresys dot com>
>> >>>>>>>>    * Dave Thompson <dthompson at tresys dot com>
>> >>>>>>>>    * Josh Adams <jadams at tresys dot com>
>> >>>>>>>>    * Mike Beckerle <mbeckerle at tresys dot com>
>> >>>>>>>>    * Steve Lawrence <slawrence at tresys dot com>
>> >>>>>>>>    * Taylor Wise <twise at tresys dot com>
>> >>>>>>>>
>> >>>>>>>>    === Affiliations ===
>> >>>>>>>>
>> >>>>>>>>    * Beth Finnegan (Tresys Technology)
>> >>>>>>>>    * Dave Thompson (Tresys Technology)
>> >>>>>>>>    * Josh Adams (Tresys Technology)
>> >>>>>>>>    * Mike Beckerle (Tresys Technology)
>> >>>>>>>>    * Steve Lawrence (Tresys Technology)
>> >>>>>>>>    * Taylor Wise (Tresys Technology)
>> >>>>>>>>
>> >>>>>>>>    == Sponsors ==
>> >>>>>>>>
>> >>>>>>>>    === Champion ===
>> >>>>>>>>
>> >>>>>>>>    * TBD
>> >>>>>>>>
>> >>>>>>>>    === Nominated Mentors ===
>> >>>>>>>>
>> >>>>>>>>    * TBD
>> >>>>>>>>
>> >>>>>>>>    === Sponsoring Entity ===
>> >>>>>>>>
>> >>>>>>>>    We request the Apache Incubator to sponsor this project.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>
>> ---------------------------------------------------------------------
>> >>>>>>>>    To unsubscribe, e-mail:
>> general-unsubscribe@incubator.apache.org
>> >>>>>>>>    For additional commands, e-mail:
>> general-help@incubator.apache.org
>> >>>>>>>>

Re: [DISCUSS] Daffodil Incubation Proposal

Posted by "John D. Ament" <jo...@apache.org>.
Steve,

You could list either of us.

John

On Wed, Aug 9, 2017 at 11:55 AM Steve Lawrence <st...@gmail.com>
wrote:

> Sounds good to me. Can I start a vote, or is that something a champion/mentor
> would normally start? The project also does not have a champion--is that
> necessary/would either of you be interested in being the champion?
>
> Thanks,
> - Steve
>
> On 08/08/2017 10:59 PM, Dave Fisher wrote:
> > Hi -
> >
> > I agree. I'm willing to proceed with John and me as Mentors.
> >
> > Regards,
> > Dave
> >
> > Sent from my iPhone
> >
> >> On Aug 8, 2017, at 7:10 PM, John D. Ament <jo...@apache.org>
> wrote:
> >>
> >> Steve,
> >>
> >> At this point, I'd recommend we wrap the discussion and call for a
> vote.  While ideally we want 3 mentors, we can get started with 2 and see
> how things progress.
> >>
> >> John
> >>
> >>> On Wed, Aug 2, 2017 at 3:55 PM Steve Lawrence <
> stephen.d.lawrence@gmail.com> wrote:
> >>> Thanks John!
> >>>
> >>> On 08/02/2017 03:23 PM, John D. Ament wrote:
> >>>> You can also count me in as a mentor.
> >>>>
> >>>> John
> >>>>
> >>>> On Wed, Aug 2, 2017 at 3:14 PM Steve Lawrence <
> stephen.d.lawrence@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Understood. Thanks for the interest!
> >>>>>
> >>>>> - Steve
> >>>>>
> >>>>> On 08/02/2017 02:57 PM, Dave Fisher wrote:
> >>>>>> Hi Steve,
> >>>>>>
> >>>>>> It was not so much the lack of committers as it was the current
> >>>>> diversity. That is not a blocker for entry to Incubation.
> >>>>>>
> >>>>>> I am willing to be one of the Mentors. Once there are at least two
> more
> >>>>> we can push forward.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Dave
> >>>>>>
> >>>>>>> On Aug 1, 2017, at 5:09 AM, Steve Lawrence <
> >>>>> stephen.d.lawrence@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Discussions have died down, and I think the consensus from the
> responses
> >>>>>>> is that the issues are 1) the lack of committers and 2) the lack
> of a
> >>>>>>> champion and mentors. We hope to address #1 and grow the community
> as
> >>>>>>> part of incubation. Is anyone interested in being a champion or
> mentor
> >>>>>>> and helping us with #2?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> - Steve
> >>>>>>>
> >>>>>>> On 07/26/2017 04:06 PM, Chris Mattmann wrote:
> >>>>>>>> This sounds like a very interesting project.
> >>>>>>>>
> >>>>>>>> I don’t have the time to mentor at the moment but I will keep a
> close
> >>>>> eye on it.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Chris Mattmann
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 7/25/17, 11:53 AM, "McHenry, Kenton Guadron" <
> mchenry@illinois.edu>
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>>    Hi Dave,
> >>>>>>>>
> >>>>>>>>    The developers that were at NCSA have moved on to other
> >>>>> organizations.  While we still leverage Daffodil and are very much
> >>>>> interested in seeing it move forward, development is currently done
> by the
> >>>>> Tresys team.  Agreed on the synergy with Tika.
> >>>>>>>>
> >>>>>>>>    Kenton McHenry, Ph.D.
> >>>>>>>>    Principal Research Scientist, Adjunct Assistant Professor of
> >>>>> Computer Science
> >>>>>>>>    Deputy Director of the Scientific Software & Applications
> Division
> >>>>>>>>    National Center for Supercomputing Applications, University of
> >>>>> Illinois at Urbana-Champaign
> >>>>>>>>
> >>>>>>>>    On Jul 24, 2017, at 1:55 PM, Dave Fisher <dave2wave@comcast.net> wrote:
> >>>>>>>>
> >>>>>>>>    Hi Kenton,
> >>>>>>>>
> >>>>>>>>    Is there any reason that you and others from the NCSA are not
> >>>>> Initial Committers? That would make this proposal stronger.
> >>>>>>>>
> >>>>>>>>    Regarding Apache Tika - it relies on other projects including
> >>>>> Apache POI and Apache PDFBox. They are pragmatic about what is used.
> If
> >>>>> Daffodil works to expand then I think that there would be good
> synergy
> >>>>> between the projects. I know as a POI PMC member that the POI
> community has
> >>>>> significantly benefited from the Tika community some of whom are
> from Mitre.
> >>>>>>>>
> >>>>>>>>    To date Tika has not emphasized structured data, although they
> do
> >>>>> extract content from Excel and OpenOffice.
> >>>>>>>>
> >>>>>>>>    I am intrigued.
> >>>>>>>>
> >>>>>>>>    Regards,
> >>>>>>>>    Dave
> >>>>>>>>
> >>>>>>>>    On Jul 24, 2017, at 10:55 AM, McHenry, Kenton Guadron <mchenry@illinois.edu> wrote:
> >>>>>>>>
> >>>>>>>>    Yes, DFDL and its open source implementation Daffodil are more
> >>>>> about file formats and getting access to the entirety of a file's
> contents
> >>>>> in a consistent way through machine readable specifications.  The
> work has
> >>>>> implications in the area of digital preservation allowing one to
> preserve
> >>>>> these machine readable specifications rather than all the tools
> needed to
> >>>>> open/save a file in order to work with it.  Imagine someone
> developing
> >>>>> graphics software to work with 3D models and not having to worry
> about the
> >>>>> hundreds of formats out there for 3D meshes (whether there are tools
> for
> >>>>> opening the files and whether they can get access to those tools,
> whether
> >>>>> the spec is available and worrying about how complex that spec is to
> >>>>> implement, etc.), and simply building their code around the contents
> (e.g.
> >>>>> vertices, faces, etc.).  One could come up with similar scenarios
> for other
> >>>>> data types (documents, images, videos, audio, depth data, numeric
> data).
> >>>>> Ideally, tools built to support DFDL could someday support any
> format for
> >>>>> that type without the developer having to worry about the details of
> how
> >>>>> that data is represented within a file.
> >>>>>>>>
> >>>>>>>>    Kenton McHenry, Ph.D.
> >>>>>>>>    Principal Research Scientist, Adjunct Assistant Professor of
> >>>>> Computer Science
> >>>>>>>>    Deputy Director of the Scientific Software & Applications
> Division
> >>>>>>>>    National Center for Supercomputing Applications, University of
> >>>>> Illinois at Urbana-Champaign
> >>>>>>>>
> >>>>>>>>    On Jul 24, 2017, at 10:30 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>    I'll preface this by saying that I don't have a ton of experience
> with
> >>>>>>>>    Apache Tika. But based on my understanding, Tika and Daffodil
> do
> >>>>> have
> >>>>>>>>    somewhat similar goals, but reach them in different ways. For
> >>>>> example,
> >>>>>>>>    Tika requires that one writes /code/ to perform data
> extraction,
> >>>>> usually
> >>>>>>>>    relying on existing Java libraries to extract the desired
> metadata.
> >>>>> The
> >>>>>>>>    downside to this is that code can be buggy, and libraries
> might not
> >>>>> even
> >>>>>>>>    exist for formats of interest (especially common with legacy
> and
> >>>>>>>>    military data).
> >>>>>>>>
> >>>>>>>>    Daffodil, on the other hand, does not require one to write any
> code.
> >>>>>>>>    Instead, one writes a DFDL Schema (similar to XML Schema, with
> DFDL
> >>>>>>>>    annotations) that fully describes the data, which Daffodil then
> >>>>> uses to
> >>>>>>>>    convert the data to XML/JSON for extraction. So adding support
> for
> >>>>> a new
> >>>>>>>>    format means writing a new schema rather than new code. And
> less
> >>>>> code
> >>>>>>>>    generally means fewer bugs. Also, for secure systems that
> require
> >>>>>>>>    certification, generally speaking, it is easier to certify a
> schema
> >>>>> as
> >>>>>>>>    compared to code.
> >>>>>>>>
> >>>>>>>>    We certainly don't believe that Daffodil could replace Tika, but it does
> >>>>>>>>    have the potential to add new functionality to Tika for formats that do
> >>>>>>>>    not have existing libraries. One of our goals is to look into
> >>>>>>>>    integrating Daffodil support into tools like Tika. We'd love to hear
> >>>>>>>>    from Tika devs if this is something they'd be interested in.
> >>>>>>>>
> >>>>>>>>    I'll also add that whereas Tika tends to focus primarily on metadata,
> >>>>>>>>    DFDL schemas usually describe an entire file format down to the byte, so
> >>>>>>>>    one can extract more than just metadata, including text and binary
> >>>>>>>>    data. Further differentiating, Daffodil has support for serializing data
> >>>>>>>>    (called unparse) from the XML/JSON representation, allowing one to
> >>>>>>>>    transform or filter data as well. We don't believe this feature is all
> >>>>>>>>    that applicable to Tika, but it may be useful to other technologies such
> >>>>>>>>    as filtering or data fuzzing tools.
> >>>>>>>>
> >>>>>>>>    - Steve
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>    On 07/24/2017 10:59 AM, Mike Drob wrote:
> >>>>>>>>    What is the relationship between Daffodil and something like Apache Tika's
> >>>>>>>>    extraction engine?
> >>>>>>>>
> >>>>>>>>    On Mon, Jul 24, 2017 at 9:53 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>    Dear Apache Incubator Community,
> >>>>>>>>
> >>>>>>>>    We would like to start a discussion around a proposal to bring Daffodil
> >>>>>>>>    into the Apache Incubator. Daffodil is an implementation of the DFDL
> >>>>>>>>    specification used to convert between fixed format data and XML/JSON.
> >>>>>>>>
> >>>>>>>>    The draft proposal can be found in the wiki at the following URL:
> >>>>>>>>
> >>>>>>>>    https://wiki.apache.org/incubator/DaffodilProposal
> >>>>>>>>
> >>>>>>>>    We do not yet have a champion or mentors, but it was recommended that we
> >>>>>>>>    create a proposal and send it to this list to potentially find those
> >>>>>>>>    that might be interested. The text for the draft proposal is found
> >>>>>>>>    below. We look forward to your input.
> >>>>>>>>
> >>>>>>>>    Thanks,
> >>>>>>>>    -Steve
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>    = Daffodil Proposal =
> >>>>>>>>
> >>>>>>>>    == Abstract ==
> >>>>>>>>
> >>>>>>>>    Daffodil is an implementation of the Data Format Description Language
> >>>>>>>>    (DFDL) used to convert between fixed format data and XML/JSON.
> >>>>>>>>
> >>>>>>>>    == Proposal ==
> >>>>>>>>
> >>>>>>>>    The Data Format Description Language (DFDL) is a specification,
> >>>>>>>>    developed by the Open Grid Forum, capable of describing many data
> >>>>>>>>    formats, including both textual and binary, scientific and numeric,
> >>>>>>>>    legacy and modern, commercial record-oriented, and many industry and
> >>>>>>>>    military standards. It defines a language that is a subset of W3C XML
> >>>>>>>>    schema to describe the logical format of the data, and annotations
> >>>>>>>>    within the schema to describe the physical representation.
> >>>>>>>>
> >>>>>>>>    Daffodil is an open source implementation of the DFDL specification that
> >>>>>>>>    uses these DFDL schemas to parse fixed format data into an infoset,
> >>>>>>>>    which is most commonly represented as either XML or JSON. This allows
> >>>>>>>>    the use of well-established XML or JSON technologies and libraries to
> >>>>>>>>    consume, inspect, and manipulate fixed format data in existing
> >>>>>>>>    solutions. Daffodil is also capable of the reverse by serializing or
> >>>>>>>>    "unparsing" an XML or JSON infoset back to the original data format.
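
The parse/unparse round trip described above can be illustrated with a toy
Python sketch. This is not Daffodil's API and not DFDL syntax: RECORD_SCHEMA,
parse, and unparse are hypothetical names invented for illustration, and the
"schema" is just a list of field names with Python struct codes. The point is
only that the format lives in a declarative description, so a new format
means a new description rather than new parsing code.

```python
import json
import struct

# Hypothetical toy "schema" for a 7-byte fixed-format record.
# It stands in for a DFDL schema; it is NOT DFDL syntax.
RECORD_SCHEMA = {
    "byte_order": ">",        # big-endian, no padding
    "fields": [
        ("version", "B"),     # 1-byte unsigned integer
        ("length", "H"),      # 2-byte unsigned integer
        ("tag", "4s"),        # 4 ASCII bytes
    ],
}


def _layout(schema):
    """Build a struct layout from the declarative field list."""
    codes = "".join(code for _, code in schema["fields"])
    return struct.Struct(schema["byte_order"] + codes)


def parse(schema, data):
    """Turn fixed-format bytes into a JSON-friendly dict (the 'infoset')."""
    values = _layout(schema).unpack(data)
    infoset = {}
    for (name, code), value in zip(schema["fields"], values):
        # Byte-string fields are decoded so the result serializes as JSON.
        infoset[name] = value.decode("ascii") if code.endswith("s") else value
    return infoset


def unparse(schema, infoset):
    """Serialize the infoset back to the original fixed-format bytes."""
    values = []
    for name, code in schema["fields"]:
        value = infoset[name]
        values.append(value.encode("ascii") if code.endswith("s") else value)
    return _layout(schema).pack(*values)


raw = bytes([0x01, 0x00, 0x10]) + b"DFDL"
infoset = parse(RECORD_SCHEMA, raw)
print(json.dumps(infoset))
# The round trip reproduces the input bytes exactly.
assert unparse(RECORD_SCHEMA, infoset) == raw
```

Supporting a second record layout in this sketch would mean adding another
schema dictionary and reusing parse and unparse unchanged.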
> >>>>>>>>
> >>>>>>>>    == Background ==
> >>>>>>>>
> >>>>>>>>    Many different software solutions need to consume and manage data,
> >>>>>>>>    including data directed routing, databases, data analysis, data
> >>>>>>>>    cleansing, data visualizing, and more. A key aspect of such solutions is
> >>>>>>>>    the need to transform the data into an easily consumable format.
> >>>>>>>>    Usually, this means that for each unique data format, one develops a
> >>>>>>>>    tool that can read and extract the necessary information, often leading
> >>>>>>>>    to ad-hoc and data-format-specific description systems. Such systems are
> >>>>>>>>    often proprietary, not well tested, and incompatible, leading to vendor
> >>>>>>>>    lock-in, flawed software, and increased training costs. DFDL is a new
> >>>>>>>>    standard, with version 1.0 completed in October of 2016, that solves
> >>>>>>>>    these problems by defining an open standard to describe many different
> >>>>>>>>    data formats and how to parse and unparse between the data and XML/JSON.
> >>>>>>>>
> >>>>>>>>    Two closed source implementations of DFDL currently exist. The first was
> >>>>>>>>    created by IBM and is now part of their IBM® Integration Bus product.
> >>>>>>>>    The second was created by the European Space Agency, called DFDL4S or
> >>>>>>>>    "DFDL for Space" targeted at the challenges of their satellite data
> >>>>>>>>    processing.
> >>>>>>>>
> >>>>>>>>    Around 2005, Pacific Northwest National Lab created Defuddle, built as
> >>>>>>>>    an open source implementation and proof of concept of the draft DFDL
> >>>>>>>>    specification and a test bed to feed new concepts into specification
> >>>>>>>>    development. Primary development of Defuddle was eventually taken over
> >>>>>>>>    by the National Center for Supercomputing Applications (NCSA). However,
> >>>>>>>>    due to evolution of the DFDL specification and architectural and
> >>>>>>>>    performance issues with Defuddle, around 2009, NCSA restarted the
> >>>>>>>>    project with the new name of Daffodil, with a goal of implementing the
> >>>>>>>>    complete DFDL specification. Daffodil development continued at NCSA
> >>>>>>>>    until around 2012, at which point development slowed due to budget
> >>>>>>>>    limitations. Shortly thereafter, primary development was picked up by
> >>>>>>>>    Tresys Technology where it continues today, with contributions from
> >>>>>>>>    other entities such as the Navy Research Lab, the Air Force Research
> >>>>>>>>    Lab, MITRE, and Booz Allen Hamilton. In February of 2015, Daffodil
> >>>>>>>>    version 1.0.0 was released, including support for the DFDL features
> >>>>>>>>    needed to parse many common file formats. Daffodil version 2.0.0 is
> >>>>>>>>    expected to be released in August of 2017, which will include unparse
> >>>>>>>>    support with one-to-one parsing feature parity.
> >>>>>>>>
> >>>>>>>>    Entities including IBM, MITRE, NATO NCI Agency, Northrop-Grumman, Quark
> >>>>>>>>    Security, Raytheon, and Tresys Technology have developed DFDL schemas
> >>>>>>>>    for many data formats from varying technology domains, including PNG,
> >>>>>>>>    GIF, BMP, PCAP, HL7, EDIFACT, NACHA, vCard, iCalendar, and MIL-STD-2045,
> >>>>>>>>    many of which are publicly available on the DFDL Schemas github. There
> >>>>>>>>    are also a number of military-application data formats, the
> >>>>>>>>    specifications of which are not public, which have historically been
> >>>>>>>>    very difficult and expensive to process, and for which DFDL schemas have
> >>>>>>>>    been created or are actively in development; these include
> >>>>>>>>    MIL-STD-6040/USMTF ATO, MIL-STD-6017/VMF, MIL-STD-6016/NATO STANAG 5516
> >>>>>>>>    (aka "Link16").
> >>>>>>>>
> >>>>>>>>    == Rationale ==
> >>>>>>>>
> >>>>>>>>    Numerous software solutions exist that consume, inspect, analyze, and
> >>>>>>>>    transform data, many of which can be found in the Apache Software
> >>>>>>>>    Foundation (ASF). In order for tools like these to consume new types of
> >>>>>>>>    data, custom extensions are usually required, often with high
> >>>>>>>>    development and testing costs. Daffodil fills a clear gap in many of
> >>>>>>>>    these solutions, providing a simple and low cost way to transform data
> >>>>>>>>    to XML or JSON, which many of these tools natively support already. With
> >>>>>>>>    the upcoming 2.0.0 release, the Daffodil project will have achieved a
> >>>>>>>>    level of functionality in both parse and unparse that, when integrated
> >>>>>>>>    into existing solutions, could provide for a new method to quickly
> >>>>>>>>    enable support for new data formats.
> >>>>>>>>
> >>>>>>>>    == Initial Goals ==
> >>>>>>>>
> >>>>>>>>    * Relicense the existing code from the University of Illinois/NCSA Open
> >>>>>>>>    Source License to the Apache License version 2.0, working with Apache
> >>>>>>>>    Legal to ensure correctness, and with Daffodil contributors to get
> >>>>>>>>    their permission.
> >>>>>>>>    * Move the existing codebase, documentation, bugs, and mailing lists to
> >>>>>>>>    the Apache hosted infrastructure.
> >>>>>>>>    * Establish a formal release process and schedule, allowing for
> >>>>>>>>    dependable release cycles in a manner consistent with the Apache
> >>>>>>>>    development process.
> >>>>>>>>    * Build relationships with ASF projects to add Daffodil support where
> >>>>>>>>    appropriate.
> >>>>>>>>    * Grow the community to establish a diversity of background and expertise.
> >>>>>>>>
> >>>>>>>>    == Current Status ==
> >>>>>>>>
> >>>>>>>>    === Meritocracy ===
> >>>>>>>>
> >>>>>>>>    All initial committers are familiar with the principles of meritocracy.
> >>>>>>>>    The Daffodil project has followed the model of meritocracy in the past,
> >>>>>>>>    providing multiple outside entities commit access based on the quality
> >>>>>>>>    of their contributions. In order to grow the Daffodil user base and
> >>>>>>>>    development community, we are dedicated to continuing to operate
> >>>>>>>>    Daffodil as a meritocracy.
> >>>>>>>>
> >>>>>>>>    A key ingredient in a meritocracy of developers is open group code
> >>>>>>>>    review. The Daffodil project has operated in this mode throughout its
> >>>>>>>>    existence and this provides a forum to improve the code, verify code
> >>>>>>>>    quality, and educate new developers on the code base.
> >>>>>>>>
> >>>>>>>>    === Community ===
> >>>>>>>>
> >>>>>>>>    Daffodil has a small community of users and developers. Although primary
> >>>>>>>>    Daffodil development is done by Tresys Technology, a handful of other
> >>>>>>>>    contributions have come from other entities including the Navy Research
> >>>>>>>>    Lab, the Air Force Research Lab, MITRE, and Booz Allen Hamilton. In
> >>>>>>>>    addition to developers, multiple users of Daffodil have created DFDL
> >>>>>>>>    schemas, including entities such as MITRE, IBM, Raytheon, Quark
> >>>>>>>>    Security, and Tresys Technology. The DFDL Schemas github community has
> >>>>>>>>    been created as a place for DFDL schemas to be published. The Daffodil
> >>>>>>>>    project also makes use of mailing lists, !HipChat, and Confluence
> >>>>>>>>    Questions to build a community of users and system for support.
> >>>>>>>>
> >>>>>>>>    === Core Developers ===
> >>>>>>>>
> >>>>>>>>    The core developers of Daffodil are employed by Tresys Technology. We
> >>>>>>>>    will work to grow the community among a more diverse set of developers
> >>>>>>>>    and industries.
> >>>>>>>>
> >>>>>>>>    === Alignment ===
> >>>>>>>>
> >>>>>>>>    Daffodil was created as an open source project with a philosophy
> >>>>>>>>    consistent with The Apache Way. A strong belief in meritocracy,
> >>>>>>>>    community involvement in decisions, openness, and ensuring a high level
> >>>>>>>>    of quality in code, documentation, and testing are some of our shared
> >>>>>>>>    core beliefs.
> >>>>>>>>
> >>>>>>>>    Further, as mentioned in the Rationale section, Daffodil fills a gap
> >>>>>>>>    that exists in many ASF projects, including !NiFi, Spark, Storm, Hadoop,
> >>>>>>>>    Tika, and others. In order for tools like these to consume new types of
> >>>>>>>>    data, custom extensions are usually required. Rather than create such
> >>>>>>>>    extensions, Daffodil provides an easy and standards-compliant way to
> >>>>>>>>    transform data to XML or JSON, which many of these tools already
> >>>>>>>>    natively support.
> >>>>>>>>
> >>>>>>>>    == Known Risks ==
> >>>>>>>>
> >>>>>>>>    === Orphaned Products ===
> >>>>>>>>
> >>>>>>>>    The current core developers are the leading contributors in the space of
> >>>>>>>>    DFDL and wish to see it flourish. Though there is some risk that the
> >>>>>>>>    initial committers all come from the same company, a goal of entering
> >>>>>>>>    into incubation is to grow the development community to minimize the
> >>>>>>>>    risk of reliance on a single company.
> >>>>>>>>
> >>>>>>>>    === Inexperience with Open Source ===
> >>>>>>>>
> >>>>>>>>    The Daffodil project began as an open source project and has continued
> >>>>>>>>    that model throughout development. This includes public bug tracking,
> >>>>>>>>    git revision control, automated builds and tests, and a public wiki for
> >>>>>>>>    documentation.
> >>>>>>>>
> >>>>>>>>    Additionally, the current core developers and initial committers all
> >>>>>>>>    work for a company that relies on, believes in, promotes, and has led or
> >>>>>>>>    contributed to many open source software projects, including SELinux
> >>>>>>>>    Userspace, OpenSCAP, CLIP, refpolicy, setools, RPM, and others. As such,
> >>>>>>>>    there is low risk related to inexperience with open source software and
> >>>>>>>>    processes.
> >>>>>>>>
> >>>>>>>>    === Homogeneous Developers ===
> >>>>>>>>
> >>>>>>>>    The proposed initial committers come from a single entity, though we are
> >>>>>>>>    committed to growing the Daffodil development community to include a
> >>>>>>>>    broad group of additional committers from a wide array of industries.
> >>>>>>>>
> >>>>>>>>    === Reliance on Salaried Developers ===
> >>>>>>>>
> >>>>>>>>    The proposed initial committers are paid by their employer to contribute
> >>>>>>>>    to the Daffodil project. We expect that Daffodil development will
> >>>>>>>>    continue with salaried developers, and are committed to growing the
> >>>>>>>>    community to include non-salaried developers as well.
> >>>>>>>>
> >>>>>>>>    === Relationship with other Apache Projects ===
> >>>>>>>>
> >>>>>>>>    As mentioned in the Alignment section, Daffodil fills a clear gap in
> >>>>>>>>    numerous other ASF projects that consume and manage large amounts of data.
> >>>>>>>>
> >>>>>>>>    As a specific example, Daffodil developers have created a Daffodil
> >>>>>>>>    Apache !NiFi Processor, currently in use in data transfer solutions,
> >>>>>>>>    which allows one to ingest non-native data into an Apache !NiFi pipeline
> >>>>>>>>    as XML or JSON. This processor was well received by the Apache !NiFi
> >>>>>>>>    developers, with positive comments about the concise API and how it
> >>>>>>>>    could handle non-native data. Daffodil developers have also successfully
> >>>>>>>>    prototyped integration with Apache Spark. We believe Daffodil could
> >>>>>>>>    provide a strong benefit to many other ASF projects that handle fixed
> >>>>>>>>    format data. We anticipate working closely with such ASF projects to
> >>>>>>>>    include Daffodil where applicable to increase their ability to support
> >>>>>>>>    new data formats with minimal effort.
> >>>>>>>>
> >>>>>>>>    Daffodil also depends on existing ASF projects, including Apache Commons
> >>>>>>>>    and Apache Xerces.
> >>>>>>>>
> >>>>>>>>    === An Excessive Fascination with the Apache Brand ===
> >>>>>>>>
> >>>>>>>>    Although the Apache brand may certainly help to attract more
> >>>>>>>>    contributors, publicity is not the reason for this proposal. We believe
> >>>>>>>>    Daffodil could provide a great benefit to the ASF and the numerous data
> >>>>>>>>    focused projects that comprise it, as described in the Rationale and
> >>>>>>>>    Alignment sections. We hope to build a strong and vibrant community
> >>>>>>>>    built around The Apache Way, and not dependent on a single company.
> >>>>>>>>
> >>>>>>>>    === Documentation ===
> >>>>>>>>
> >>>>>>>>    Daffodil documentation can be found at:
> >>>>>>>>
> >>>>>>>>    * https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
> >>>>>>>>
> >>>>>>>>    Information about DFDL can be found at:
> >>>>>>>>
> >>>>>>>>    * https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> >>>>>>>>    * https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/df20060_.htm
> >>>>>>>>
> >>>>>>>>    Public examples of DFDL Schemas can be found at:
> >>>>>>>>
> >>>>>>>>    * https://github.com/DFDLSchemas
> >>>>>>>>
> >>>>>>>>    == Initial Source ==
> >>>>>>>>
> >>>>>>>>    The Daffodil git repo goes back to mid-2011 with approximately 20
> >>>>>>>>    different contributors and feedback from many users and developers. The
> >>>>>>>>    core codebase is written in Scala and includes both a Scala and Java
> >>>>>>>>    API, along with Javadocs and Scaladocs for API usage. The initial code
> >>>>>>>>    will come from the git repository currently hosted by NCSA at the
> >>>>>>>>    University of Illinois:
> >>>>>>>>
> >>>>>>>>    https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil/
> >>>>>>>>
> >>>>>>>>    == Source and Intellectual Property Submission ==
> >>>>>>>>
> >>>>>>>>    The complete Daffodil code is licensed under the University of
> >>>>>>>>    Illinois/NCSA Open Source License. Much of the current codebase has been
> >>>>>>>>    developed by Tresys Technology, which is open to relicensing the code to
> >>>>>>>>    the Apache License version 2.0 and donating the source to the ASF.
> >>>>>>>>    Contacts at NCSA are also open to relicensing their contributions to
> >>>>>>>>    Apache v2. We plan to contact the other contributors and ask for
> >>>>>>>>    permission to relicense and donate their contributed code. For those
> >>>>>>>>    who decline or whom we cannot contact, their code will be removed or
> >>>>>>>>    replaced. We will work closely with Apache Legal to ensure all issues
> >>>>>>>>    related to relicensing are acceptable.
> >>>>>>>>
> >>>>>>>>    == External Dependencies ==
> >>>>>>>>
> >>>>>>>>    We believe all current dependencies are compatible with the ASF
> >>>>>>>>    guidelines. Our dependency licenses come from the following license
> >>>>>>>>    styles: Apache v2, BSD, MIT, and ICU. The list of current Daffodil
> >>>>>>>>    dependencies and their licenses is documented here:
> >>>>>>>>
> >>>>>>>>    https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Dependencies+and+Licenses
> >>>>>>>>
> >>>>>>>>    == Cryptography ==
> >>>>>>>>
> >>>>>>>>    None
> >>>>>>>>
> >>>>>>>>    == Required Resources ==
> >>>>>>>>
> >>>>>>>>    === Mailing Lists ===
> >>>>>>>>
> >>>>>>>>    * commits@daffodil.incubator.apache.org
> >>>>>>>>    * dev@daffodil.incubator.apache.org
> >>>>>>>>    * private@daffodil.incubator.apache.org
> >>>>>>>>    * user@daffodil.incubator.apache.org
> >>>>>>>>
> >>>>>>>>    === Source Control ===
> >>>>>>>>
> >>>>>>>>    git://git.apache.org/incubator-daffodil.git
> >>>>>>>>
> >>>>>>>>    === Issue Tracking ===
> >>>>>>>>
> >>>>>>>>    JIRA Daffodil (DFDL)
> >>>>>>>>
> >>>>>>>>    === Initial Committers ===
> >>>>>>>>
> >>>>>>>>    * Beth Finnegan <efinnegan at tresys dot com>
> >>>>>>>>    * Dave Thompson <dthompson at tresys dot com>
> >>>>>>>>    * Josh Adams <jadams at tresys dot com>
> >>>>>>>>    * Mike Beckerle <mbeckerle at tresys dot com>
> >>>>>>>>    * Steve Lawrence <slawrence at tresys dot com>
> >>>>>>>>    * Taylor Wise <twise at tresys dot com>
> >>>>>>>>
> >>>>>>>>    === Affiliations ===
> >>>>>>>>
> >>>>>>>>    * Beth Finnegan (Tresys Technology)
> >>>>>>>>    * Dave Thompson (Tresys Technology)
> >>>>>>>>    * Josh Adams (Tresys Technology)
> >>>>>>>>    * Mike Beckerle (Tresys Technology)
> >>>>>>>>    * Steve Lawrence (Tresys Technology)
> >>>>>>>>    * Taylor Wise (Tresys Technology)
> >>>>>>>>
> >>>>>>>>    == Sponsors ==
> >>>>>>>>
> >>>>>>>>    === Champion ===
> >>>>>>>>
> >>>>>>>>    * TBD
> >>>>>>>>
> >>>>>>>>    === Nominated Mentors ===
> >>>>>>>>
> >>>>>>>>    * TBD
> >>>>>>>>
> >>>>>>>>    === Sponsoring Entity ===
> >>>>>>>>
> >>>>>>>>    We request the Apache Incubator to sponsor this project.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>    ---------------------------------------------------------------------
> >>>>>>>>    To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >>>>>>>>    For additional commands, e-mail: general-help@incubator.apache.org

Re: [DISCUSS] Daffodil Incubation Proposal

Posted by Steve Lawrence <st...@gmail.com>.
Sounds good to me. Can I start a vote, or is that something a champion/mentor
would normally start? The project also does not have a champion. Is one
necessary, and would either of you be interested in being the champion?

Thanks,
- Steve

On 08/08/2017 10:59 PM, Dave Fisher wrote:
> Hi -
> 
> I agree. I'm willing to proceed with John and me as Mentors.
> 
> Regards,
> Dave
> 
> Sent from my iPhone
> 
>> On Aug 8, 2017, at 7:10 PM, John D. Ament <jo...@apache.org> wrote:
>>
>> Steve,
>>
>> At this point, I'd recommend we wrap the discussion and call for a vote.  While ideally we want 3 mentors, we can get started with 2 and see how things progress.
>>
>> John
>>
>>> On Wed, Aug 2, 2017 at 3:55 PM Steve Lawrence <st...@gmail.com> wrote:
>>> Thanks John!
>>>
>>> On 08/02/2017 03:23 PM, John D. Ament wrote:
>>>> You can also count me in as a mentor.
>>>>
>>>> John
>>>>
>>>> On Wed, Aug 2, 2017 at 3:14 PM Steve Lawrence <st...@gmail.com>
>>>> wrote:
>>>>
>>>>> Understood. Thanks for the interest!
>>>>>
>>>>> - Steve
>>>>>
>>>>> On 08/02/2017 02:57 PM, Dave Fisher wrote:
>>>>>> Hi Steve,
>>>>>>
>>>>>> It was not so much the lack of committers as it was the current diversity. That is not a blocker for entry to Incubation.
>>>>>>
>>>>>> I am willing to be one of the Mentors. Once there are at least two more we can push forward.
>>>>>>
>>>>>> Regards,
>>>>>> Dave
>>>>>>
>>>>>>> On Aug 1, 2017, at 5:09 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>>>>>>>
>>>>>>> Discussions have died down, and I think the consensus from the responses
>>>>>>> is that the issues are 1) the lack of committers and 2) the lack of a
>>>>>>> champion and mentors. We hope to address #1 and grow the community as
>>>>>>> part of incubation. Is anyone interested in being a champion or mentor
>>>>>>> and help us with #2?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> - Steve
>>>>>>>
>>>>>>> On 07/26/2017 04:06 PM, Chris Mattmann wrote:
>>>>>>>> This sounds like a very interesting project.
>>>>>>>>
>>>>>>>> I don’t have the time to mentor at the moment but I will keep a close eye on it.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Chris Mattmann
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 7/25/17, 11:53 AM, "McHenry, Kenton Guadron" <mc...@illinois.edu> wrote:
>>>>>>>>
>>>>>>>>    Hi Dave,
>>>>>>>>
>>>>>>>>    The developers that were at NCSA have moved on to other organizations.  While we still leverage Daffodil and are very much interested in seeing it move forward, development is currently done by the Tresys team.  Agreed on the synergy with Tika.
>>>>>>>>
>>>>>>>>    Kenton McHenry, Ph.D.
>>>>>>>>    Principal Research Scientist, Adjunct Assistant Professor of Computer Science
>>>>>>>>    Deputy Director of the Scientific Software & Applications Division
>>>>>>>>    National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
>>>>>>>>
>>>>>>>>    On Jul 24, 2017, at 1:55 PM, Dave Fisher <dave2wave@comcast.net> wrote:
>>>>>>>>
>>>>>>>>    Hi Kenton,
>>>>>>>>
>>>>>>>>    Is there any reason that you and others from the NCSA are not Initial Committers? That would make this proposal stronger.
>>>>>>>>
>>>>>>>>    Regarding Apache Tika - it relies on other projects including Apache POI and Apache PDFBox. They are pragmatic about what is used. If Daffodil works to expand then I think that there would be good synergy between the projects. I know as a POI PMC member that the POI community has significantly benefited from the Tika community some of whom are from Mitre.
>>>>>>>>
>>>>>>>>    To date Tika has not emphasized structured data, although they do extract content from Excel and OpenOffice.
>>>>>>>>
>>>>>>>>    I am intrigued.
>>>>>>>>
>>>>>>>>    Regards,
>>>>>>>>    Dave
>>>>>>>>
>>>>>>>>    On Jul 24, 2017, at 10:55 AM, McHenry, Kenton Guadron <mchenry@illinois.edu> wrote:
>>>>>>>>
>>>>>>>>    Yes, DFDL and its open source implementation Daffodil are more about file formats and getting access to the entirety of a file's contents in a consistent way through machine readable specifications.  The work has implications in the area of digital preservation allowing one to preserve these machine readable specifications rather than all the tools needed to open/save a file in order to work with it.  Imagine someone developing graphics software to work with 3D models and not having to worry about the hundreds of formats out there for 3D meshes (whether there are tools for opening the files and whether they can get access to those tools, whether the spec is available and worrying about how complex that spec is to implement, etc.), and simply building their code around the contents (e.g. vertices, faces, etc.).  One could come up with similar scenarios for other data types (documents, images, videos, audio, depth data, numeric data). Ideally, tools built to support DFDL could someday support any format for that type without the developer having to worry about the details of how that data is represented within a file.
>>>>>>>>
>>>>>>>>    Kenton McHenry, Ph.D.
>>>>>>>>    Principal Research Scientist, Adjunct Assistant Professor of Computer Science
>>>>>>>>    Deputy Director of the Scientific Software & Applications Division
>>>>>>>>    National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
>>>>>>>>
>>>>>>>>    On Jul 24, 2017, at 10:30 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>>>>>>>>
>>>>>>>>    I'll preface this by saying that I don't have a ton of experience with
>>>>>>>>    Apache Tika. But based on my understanding, Tika and Daffodil do have
>>>>>>>>    somewhat similar goals, but reach them in different ways. For example,
>>>>>>>>    Tika requires that one writes /code/ to perform data extraction, usually
>>>>>>>>    relying on existing Java libraries to extract the desired metadata. The
>>>>>>>>    downside to this is that code can be buggy, and libraries might not even
>>>>>>>>    exist for formats of interest (especially common with legacy and
>>>>>>>>    military data).
>>>>>>>>
>>>>>>>>    Daffodil, on the other hand, does not require one to write any code.
>>>>>>>>    Instead, one writes a DFDL Schema (similar to XML Schema, with DFDL
>>>>>>>>    annotations) that fully describes the data, which Daffodil then uses to
>>>>>>>>    convert the data to XML/JSON for extraction. So adding support for a new
>>>>>>>>    format means writing a new schema rather than new code. And less code
>>>>>>>>    generally means fewer bugs. Also, for secure systems that require
>>>>>>>>    certification, generally speaking, it is easier to certify a schema than
>>>>>>>>    code.
>>>>>>>>
>>>>>>>>    We certainly don't believe that Daffodil could replace Tika, but it does
>>>>>>>>    have the potential to add new functionality to Tika for formats that do
>>>>>>>>    not have existing libraries. One of our goals is to look into
>>>>>>>>    integrating Daffodil support into tools like Tika. We'd love to hear
>>>>>>>>    from Tika devs if this is something they'd be interested in.
>>>>>>>>
>>>>>>>>    I'll also add that whereas Tika tends to focus primarily on metadata,
>>>>>>>>    DFDL schemas usually describe an entire file format down to the byte, so
>>>>>>>>    one can extract more than just metadata, including text and binary
>>>>>>>>    data. Further differentiating, Daffodil has support for serializing data
>>>>>>>>    (called unparse) from the XML/JSON representation, allowing one to
>>>>>>>>    transform or filter data as well. We don't believe this feature is all
>>>>>>>>    that applicable to Tika, but it may be useful to other technologies such
>>>>>>>>    as filtering or data fuzzing tools.
>>>>>>>>
>>>>>>>>    - Steve
>>>>>>>>
>>>>>>>>
>>>>>>>>    On 07/24/2017 10:59 AM, Mike Drob wrote:
>>>>>>>>    What is the relationship between Daffodil and something like Apache Tika's
>>>>>>>>    extraction engine?
>>>>>>>>
>>>>>>>>    On Mon, Jul 24, 2017 at 9:53 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>>>>>>>>
>>>>>>>>    Dear Apache Incubator Community,
>>>>>>>>
>>>>>>>>    We would like to start a discussion around a proposal to bring Daffodil
>>>>>>>>    into the Apache Incubator. Daffodil is an implementation of the DFDL
>>>>>>>>    specification used to convert between fixed format data and XML/JSON.
>>>>>>>>
>>>>>>>>    The draft proposal can be found in the wiki at the following URL:
>>>>>>>>
>>>>>>>>    https://wiki.apache.org/incubator/DaffodilProposal
>>>>>>>>
>>>>>>>>    We do not yet have a champion or mentors, but it was recommended that we
>>>>>>>>    create a proposal and send it to this list to potentially find those
>>>>>>>>    that might be interested. The text for the draft proposal is found
>>>>>>>>    below. We look forward to your input.
>>>>>>>>
>>>>>>>>    Thanks,
>>>>>>>>    -Steve
>>>>>>>>
>>>>>>>>
>>>>>>>>    = Daffodil Proposal =
>>>>>>>>
>>>>>>>>    == Abstract ==
>>>>>>>>
>>>>>>>>    Daffodil is an implementation of the Data Format Description
>>>>> Language
>>>>>>>>    (DFDL) used to convert between fixed format data and XML/JSON.
>>>>>>>>
>>>>>>>>    == Proposal ==
>>>>>>>>
>>>>>>>>    The Data Format Description Language (DFDL) is a specification,
>>>>>>>>    developed by the Open Grid Forum, capable of describing many data
>>>>>>>>    formats, including both textual and binary, scientific and numeric,
>>>>>>>>    legacy and modern, commercial record-oriented, and many industry and
>>>>>>>>    military standards. It defines a language that is a subset of W3C
>>>>>>>>    XML Schema to describe the logical format of the data, and
>>>>>>>>    annotations within the schema to describe the physical
>>>>>>>>    representation.
>>>>>>>>
>>>>>>>>    Daffodil is an open source implementation of the DFDL specification
>>>>>>>>    that uses these DFDL schemas to parse fixed format data into an
>>>>>>>>    infoset, which is most commonly represented as either XML or JSON.
>>>>>>>>    This allows the use of well-established XML or JSON technologies and
>>>>>>>>    libraries to consume, inspect, and manipulate fixed format data in
>>>>>>>>    existing solutions. Daffodil is also capable of the reverse by
>>>>>>>>    serializing or "unparsing" an XML or JSON infoset back to the
>>>>>>>>    original data format.
>>>>>>>>
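As a rough illustration of the parse/unparse round trip described above, the sketch below drives both directions from a single declarative field table. This is an assumption-laden toy, not Daffodil's API and not a DFDL schema: the record layout, the RECORD_FIELDS table, and the helper names are hypothetical, and Python's standard struct module stands in for the real format description.

```python
import json
import struct

# Hypothetical declarative description of a 7-byte big-endian binary record.
# Each entry is (field name, struct format code).
RECORD_FIELDS = [
    ("version", "B"),    # 1-byte unsigned integer
    ("length", "H"),     # 2-byte unsigned integer
    ("timestamp", "I"),  # 4-byte unsigned integer
]


def _layout(fields):
    """Build a struct layout from the field table ('>' = big-endian, no padding)."""
    return struct.Struct(">" + "".join(code for _, code in fields))


def parse(data, fields=RECORD_FIELDS):
    """Turn raw bytes into a dict, a stand-in for an XML/JSON infoset."""
    values = _layout(fields).unpack(data)
    return dict(zip((name for name, _ in fields), values))


def unparse(infoset, fields=RECORD_FIELDS):
    """Serialize the dict back into the original byte layout."""
    return _layout(fields).pack(*(infoset[name] for name, _ in fields))


raw = bytes([1, 0, 16, 0x59, 0x75, 0xE4, 0x00])
infoset = parse(raw)
print(json.dumps(infoset))   # the parsed record as JSON
assert unparse(infoset) == raw  # the round trip reproduces the input bytes
```

Because the same table drives both parse and unparse, supporting a different record layout in this toy means editing the table rather than writing new parsing code, which is the general idea the proposal attributes to DFDL schemas.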
>>>>>>>>    == Background ==
>>>>>>>>
>>>>>>>>    Many different software solutions need to consume and manage data,
>>>>>>>>    including data directed routing, databases, data analysis, data
>>>>>>>>    cleansing, data visualizing, and more. A key aspect of such
>>>>> solutions is
>>>>>>>>    the need to transform the data into an easily consumable format.
>>>>>>>>    Usually, this means that for each unique data format, one develops a
>>>>>>>>    tool that can read and extract the necessary information, often
>>>>> leading
>>>>>>>>    to ad-hoc and data-format-specific description systems. Such
>>>>> systems are
>>>>>>>>    often proprietary, not well tested, and incompatible, leading to
>>>>> vendor
>>>>>>>>    lock-in, flawed software, and increased training costs. DFDL is a
>>>>> new
>>>>>>>>    standard, with version 1.0 completed in October of 2016, that solves
>>>>>>>>    these problems by defining an open standard to describe many
>>>>> different
>>>>>>>>    data formats and how to parse and unparse between the data and
>>>>> XML/JSON.
>>>>>>>>
>>>>>>>>    Two closed source implementations of DFDL currently exist. The
>>>>> first was
>>>>>>>>    created by IBM and is now part of their IBM® Integration Bus
>>>>> product.
>>>>>>>>    The second was created by the European Space Agency, called DFDL4S
>>>>> or
>>>>>>>>    "DFDL for Space" targeted at the challenges of their satellite data
>>>>>>>>    processing.
>>>>>>>>
>>>>>>>>    Around 2005, Pacific Northwest National Lab created Defuddle, built
>>>>> as
>>>>>>>>    an open source implementation and proof of concept of the draft DFDL
>>>>>>>>    specification and a test bed to feed new concepts into specification
>>>>>>>>    development. Primary development of Defuddle was eventually taken
>>>>> over
>>>>>>>>    by the National Center for Supercomputing Applications (NCSA).
>>>>> However,
>>>>>>>>    due to evolution of the DFDL specification and architectural and
>>>>>>>>    performance issues with Defuddle, around 2009, NCSA restarted the
>>>>>>>>    project with the new name of Daffodil, with a goal of implementing
>>>>> the
>>>>>>>>    complete DFDL specification. Daffodil development continued at NCSA
>>>>>>>>    until around 2012, at which point development slowed due to budget
>>>>>>>>    limitations. Shortly thereafter, primary development was picked up
>>>>> by
>>>>>>>>    Tresys Technology where it continues today, with contributions from
>>>>>>>>    other entities such as the Navy Research Lab, the Air Force Research
>>>>>>>>    Lab, MITRE, and Booz Allen Hamilton. In February of 2015, Daffodil
>>>>>>>>    version 1.0.0 was released, including support for the DFDL features
>>>>>>>>    needed to parse many common file formats. Daffodil version 2.0.0 is
>>>>>>>>    expected to be released in August of 2017, which will include
>>>>> unparse
>>>>>>>>    support with one-to-one parsing feature parity.
>>>>>>>>
>>>>>>>>    Entities including IBM, MITRE, NATO NCI Agency, Northrop-Grumman,
>>>>> Quark
>>>>>>>>    Security, Raytheon, and Tresys Technology have developed DFDL
>>>>> schemas
>>>>>>>>    for many data formats from varying technology domains, including
>>>>> PNG,
>>>>>>>>    GIF, BMP, PCAP, HL7, EDIFACT, NACHA, vCard, iCalendar, and
>>>>> MIL-STD-2045,
>>>>>>>>    many of which are publicly available on the DFDL Schemas github.
>>>>> There
>>>>>>>>    are also a number of military-application data formats, the
>>>>>>>>    specifications of which are not public, which have historically been
>>>>>>>>    very difficult and expensive to process, and for which DFDL schemas
>>>>>>>>    have been created or are actively in development; these include
>>>>>>>>    MIL-STD-6040/USMTF ATO, MIL-STD-6017/VMF, and MIL-STD-6016/NATO
>>>>>>>>    STANAG 5516 (aka "Link16").
>>>>>>>>
>>>>>>>>    == Rationale ==
>>>>>>>>
>>>>>>>>    Numerous software solutions exist that consume, inspect, analyze,
>>>>> and
>>>>>>>>    transform data, many of which can be found in the Apache Software
>>>>>>>>    Foundation (ASF). In order for tools like these to consume new
>>>>> types of
>>>>>>>>    data, custom extensions are usually required, often with high
>>>>>>>>    development and testing costs. Daffodil fills a clear gap in many of
>>>>>>>>    these solutions, providing a simple and low cost way to transform
>>>>> data
>>>>>>>>    to XML or JSON, which many of these tools natively support already.
>>>>> With
>>>>>>>>    the upcoming 2.0.0 release, the Daffodil project will have achieved
>>>>> a
>>>>>>>>    level of functionality in both parse and unparse that, when
>>>>> integrated
>>>>>>>>    into existing solutions, could provide for a new method to quickly
>>>>>>>>    enable support for new data formats.
>>>>>>>>
>>>>>>>>    == Initial Goals ==
>>>>>>>>
>>>>>>>>    * Relicense the existing code from the University of Illinois/NCSA
>>>>> Open
>>>>>>>>    Source License to the Apache License version 2.0, working with
>>>>> Apache
>>>>>>>>    Legal to ensure correctness, and with Daffodil contributors to get
>>>>>>>>    their permission.
>>>>>>>>    * Move the existing codebase, documentation, bugs, and mailing
>>>>> lists to
>>>>>>>>    the Apache hosted infrastructure
>>>>>>>>    * Establish a formal release process and schedule, allowing for
>>>>>>>>    dependable release cycles in a manner consistent with the Apache
>>>>>>>>    development process.
>>>>>>>>    * Build relationships with ASF projects to add Daffodil support
>>>>> where
>>>>>>>>    appropriate
>>>>>>>>    * Grow the community to establish a diversity of background and
>>>>> expertise.
>>>>>>>>
>>>>>>>>    == Current Status ==
>>>>>>>>
>>>>>>>>    === Meritocracy ===
>>>>>>>>
>>>>>>>>    All initial committers are familiar with the principles of
>>>>> meritocracy.
>>>>>>>>    The Daffodil project has followed the model of meritocracy in the
>>>>> past,
>>>>>>>>    providing multiple outside entities commit access based on the
>>>>> quality
>>>>>>>>    of their contributions. In order to grow the Daffodil user base and
>>>>>>>>    development community, we are dedicated to continuing to operate
>>>>>>>>    Daffodil as a meritocracy.
>>>>>>>>
>>>>>>>>    A key ingredient in a meritocracy of developers is open group code
>>>>>>>>    review. The Daffodil project has operated in this mode throughout
>>>>> its
>>>>>>>>    existence and this provides a forum to improve the code, verify code
>>>>>>>>    quality, and educate new developers on the code base.
>>>>>>>>
>>>>>>>>    === Community ===
>>>>>>>>
>>>>>>>>    Daffodil has a small community of users and developers. Although
>>>>> primary
>>>>>>>>    Daffodil development is done by Tresys Technology, a handful of
>>>>> other
>>>>>>>>    contributions have come from other entities including the Navy
>>>>> Research
>>>>>>>>    Lab, the Air Force Research Lab, MITRE, and Booz Allen Hamilton. In
>>>>>>>>    addition to developers, multiple users of Daffodil have created DFDL
>>>>>>>>    schemas, including entities such as MITRE, IBM, Raytheon, Quark
>>>>>>>>    Security, and Tresys Technology. The DFDL Schemas github community
>>>>> has
>>>>>>>>    been created as a place for DFDL schemas to be published. The
>>>>>>>>    Daffodil project also makes use of mailing lists, HipChat, and
>>>>>>>>    Confluence Questions to build a community of users and a system for
>>>>>>>>    support.
>>>>>>>>
>>>>>>>>    === Core Developers ===
>>>>>>>>
>>>>>>>>    The core developers of Daffodil are employed by Tresys Technology.
>>>>> We
>>>>>>>>    will work to grow the community among a more diverse set of
>>>>> developers
>>>>>>>>    and industries.
>>>>>>>>
>>>>>>>>    === Alignment ===
>>>>>>>>
>>>>>>>>    Daffodil was created as an open source project with a philosophy
>>>>>>>>    consistent with The Apache Way. A strong belief in meritocracy,
>>>>>>>>    community involvement in decisions, openness, and ensuring a high
>>>>> level
>>>>>>>>    of quality in code, documentation, and testing are some of our
>>>>> shared
>>>>>>>>    core beliefs.
>>>>>>>>
>>>>>>>>    Further, as mentioned in the Rationale section, Daffodil fills a gap
>>>>>>>>    that exists in many ASF projects, including NiFi, Spark, Storm,
>>>>>>>>    Hadoop, Tika, and others. In order for tools like these to consume
>>>>>>>>    new types of data, custom extensions are usually required. Rather
>>>>>>>>    than requiring such extensions, Daffodil provides an easy and
>>>>>>>>    standards-compliant way to transform data to XML or JSON, which many
>>>>>>>>    of these tools already natively support.
>>>>>>>>
>>>>>>>>    == Known Risks ==
>>>>>>>>
>>>>>>>>    === Orphaned Products ===
>>>>>>>>
>>>>>>>>    The current core developers are the leading contributors in the
>>>>> space of
>>>>>>>>    DFDL and wish to see it flourish. Though there is some risk that the
>>>>>>>>    initial committers all come from the same company, a goal of
>>>>> entering
>>>>>>>>    into incubation is to grow the development community to minimize the
>>>>>>>>    risk of reliance on a single company.
>>>>>>>>
>>>>>>>>    === Inexperience with Open Source ===
>>>>>>>>
>>>>>>>>    The Daffodil project began as an open source project and has
>>>>> continued
>>>>>>>>    that model throughout development. This includes public bug
>>>>> tracking,
>>>>>>>>    git revision control, automated builds and tests, and a public wiki
>>>>> for
>>>>>>>>    documentation.
>>>>>>>>
>>>>>>>>    Additionally, the current core developers and initial committers all
>>>>>>>>    work for a company that relies on, believes in, promotes, and has
>>>>> led or
>>>>>>>>    contributed to many open source software projects, including SELinux
>>>>>>>>    Userspace, OpenSCAP, CLIP, refpolicy, setools, RPM, and others. As
>>>>> such,
>>>>>>>>    there is low risk related to inexperience with open source software
>>>>> and
>>>>>>>>    processes.
>>>>>>>>
>>>>>>>>    === Homogeneous Developers ===
>>>>>>>>
>>>>>>>>    The proposed initial committers come from a single entity, though
>>>>> we are
>>>>>>>>    committed to growing the Daffodil development community to include a
>>>>>>>>    broad group of additional committers from a wide array of
>>>>> industries.
>>>>>>>>
>>>>>>>>    === Reliance on Salaried Developers ===
>>>>>>>>
>>>>>>>>    The proposed initial committers are paid by their employer to
>>>>> contribute
>>>>>>>>    to the Daffodil project. We expect that Daffodil development will
>>>>>>>>    continue with salaried developers, and are committed to growing the
>>>>>>>>    community to include non-salaried developers as well.
>>>>>>>>
>>>>>>>>    === Relationship with other Apache Projects ===
>>>>>>>>
>>>>>>>>    As mentioned in the Alignment section, Daffodil fills a clear gap in
>>>>>>>>    numerous other ASF projects that consume and manage large amounts
>>>>> of data.
>>>>>>>>
>>>>>>>>    As a specific example, Daffodil developers have created a Daffodil
>>>>>>>>    Apache NiFi Processor, currently in use in data transfer solutions,
>>>>>>>>    which allows one to ingest non-native data into an Apache NiFi
>>>>>>>>    pipeline as XML or JSON. This processor was well received by the
>>>>>>>>    Apache NiFi developers, with positive comments about the concise API
>>>>>>>>    and how it could handle non-native data. Daffodil developers have
>>>>>>>>    also successfully prototyped integration with Apache Spark. We
>>>>>>>>    believe Daffodil could provide a strong benefit to many other ASF
>>>>>>>>    projects that handle fixed format data. We anticipate working
>>>>>>>>    closely with such ASF projects to include Daffodil where applicable
>>>>>>>>    to increase their ability to support new data formats with minimal
>>>>>>>>    effort.
>>>>>>>>
>>>>>>>>    Daffodil also depends on existing ASF projects, including Apache
>>>>> Commons
>>>>>>>>    and Apache Xerces.
>>>>>>>>
>>>>>>>>    === An Excessive Fascination with the Apache Brand ===
>>>>>>>>
>>>>>>>>    Although the Apache brand may certainly help to attract more
>>>>>>>>    contributors, publicity is not the reason for this proposal. We
>>>>> believe
>>>>>>>>    Daffodil could provide a great benefit to the ASF and the numerous
>>>>> data
>>>>>>>>    focused projects that comprise it, as described in the Rationale and
>>>>>>>>    Alignment sections. We hope to build a strong and vibrant community
>>>>>>>>    built around The Apache Way, and not dependent on a single company.
>>>>>>>>
>>>>>>>>    === Documentation ===
>>>>>>>>
>>>>>>>>    Daffodil documentation can be found at:
>>>>>>>>
>>>>>>>>    * https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
>>>>>>>>
>>>>>>>>    Information about DFDL can be found at:
>>>>>>>>
>>>>>>>>    * https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
>>>>>>>>    * https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/df20060_.htm
>>>>>>>>
>>>>>>>>    Public examples of DFDL Schemas can be found at:
>>>>>>>>
>>>>>>>>    * https://github.com/DFDLSchemas
>>>>>>>>
>>>>>>>>    == Initial Source ==
>>>>>>>>
>>>>>>>>    The Daffodil git repo goes back to mid-2011 with approximately 20
>>>>>>>>    different contributors and feedback from many users and developers.
>>>>> The
>>>>>>>>    core codebase is written in Scala and includes both a Scala and Java
>>>>>>>>    API, along with Javadocs and Scaladocs for API usage. The initial
>>>>>>>>    code will come from the git repository currently hosted by NCSA at
>>>>>>>>    the University of Illinois:
>>>>>>>>
>>>>>>>>    https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil/
>>>>>>>>
>>>>>>>>    == Source and Intellectual Property Submission ==
>>>>>>>>
>>>>>>>>    The complete Daffodil code is licensed under the University of
>>>>>>>>    Illinois/NCSA Open Source License. Much of the current codebase has
>>>>>>>>    been developed by Tresys Technology, which is open to relicensing the
>>>>>>>>    code to the Apache License version 2.0 and donating the source to the
>>>>>>>>    ASF. Contacts at NCSA are also open to relicensing their
>>>>>>>>    contributions to Apache v2. We plan to contact the other contributors
>>>>>>>>    and ask for permission to relicense and donate their contributed
>>>>>>>>    code. For those who decline or whom we cannot contact, their code
>>>>>>>>    will be removed or replaced. We will work closely with Apache Legal
>>>>>>>>    to ensure all issues related to relicensing are addressed.
>>>>>>>>
>>>>>>>>    == External Dependencies ==
>>>>>>>>
>>>>>>>>    We believe all current dependencies are compatible with the ASF
>>>>>>>>    guidelines. Our dependency licenses come from the following license
>>>>>>>>    styles: Apache v2, BSD, MIT, and ICU. The list of current Daffodil
>>>>>>>>    dependencies and their licenses is documented here:
>>>>>>>>
>>>>>>>>    https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Dependencies+and+Licenses
>>>>>>>>
>>>>>>>>    == Cryptography ==
>>>>>>>>
>>>>>>>>    None
>>>>>>>>
>>>>>>>>    == Required Resources ==
>>>>>>>>
>>>>>>>>    === Mailing Lists ===
>>>>>>>>
>>>>>>>>    * commits@daffodil.incubator.apache.org
>>>>>>>>    * dev@daffodil.incubator.apache.org
>>>>>>>>    * private@daffodil.incubator.apache.org
>>>>>>>>    * user@daffodil.incubator.apache.org
>>>>>>>>
>>>>>>>>    === Source Control ===
>>>>>>>>
>>>>>>>>    git://git.apache.org/incubator-daffodil.git
>>>>>>>>
>>>>>>>>    === Issue Tracking ===
>>>>>>>>
>>>>>>>>    JIRA Daffodil (DFDL)
>>>>>>>>
>>>>>>>>    === Initial Committers ===
>>>>>>>>
>>>>>>>>    * Beth Finnegan <efinnegan at tresys dot com>
>>>>>>>>    * Dave Thompson <dthompson at tresys dot com>
>>>>>>>>    * Josh Adams <jadams at tresys dot com>
>>>>>>>>    * Mike Beckerle <mbeckerle at tresys dot com>
>>>>>>>>    * Steve Lawrence <slawrence at tresys dot com>
>>>>>>>>    * Taylor Wise <twise at tresys dot com>
>>>>>>>>
>>>>>>>>    === Affiliations ===
>>>>>>>>
>>>>>>>>    * Beth Finnegan (Tresys Technology)
>>>>>>>>    * Dave Thompson (Tresys Technology)
>>>>>>>>    * Josh Adams (Tresys Technology)
>>>>>>>>    * Mike Beckerle (Tresys Technology)
>>>>>>>>    * Steve Lawrence (Tresys Technology)
>>>>>>>>    * Taylor Wise (Tresys Technology)
>>>>>>>>
>>>>>>>>    == Sponsors ==
>>>>>>>>
>>>>>>>>    === Champion ===
>>>>>>>>
>>>>>>>>    * TBD
>>>>>>>>
>>>>>>>>    === Nominated Mentors ===
>>>>>>>>
>>>>>>>>    * TBD
>>>>>>>>
>>>>>>>>    === Sponsoring Entity ===
>>>>>>>>
>>>>>>>>    We request the Apache Incubator to sponsor this project.
>>>>>>>>
>>>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>>>>>    To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>>>>>>>    For additional commands, e-mail: general-help@incubator.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Re: [DISCUSS] Daffodil Incubation Proposal

Posted by Dave Fisher <da...@comcast.net>.
Hi -

I agree. I'm willing to proceed with John and me as Mentors.

Regards,
Dave

Sent from my iPhone

> On Aug 8, 2017, at 7:10 PM, John D. Ament <jo...@apache.org> wrote:
> 
> Steve,
> 
> At this point, I'd recommend we wrap the discussion and call for a vote.  While ideally we want 3 mentors, we can get started with 2 and see how things progress.
> 
> John
> 
>> On Wed, Aug 2, 2017 at 3:55 PM Steve Lawrence <st...@gmail.com> wrote:
>> Thanks John!
>> 
>> On 08/02/2017 03:23 PM, John D. Ament wrote:
>> > You can also count me in as a mentor.
>> >
>> > John
>> >
>> > On Wed, Aug 2, 2017 at 3:14 PM Steve Lawrence <st...@gmail.com>
>> > wrote:
>> >
>> >> Understood. Thanks for the interest!
>> >>
>> >> - Steve
>> >>
>> >> On 08/02/2017 02:57 PM, Dave Fisher wrote:
>> >>> Hi Steve,
>> >>>
>> >>> It was not so much the lack of committers as it was the current
>> >> diversity. That is not a blocker for entry to Incubation.
>> >>>
>> >>> I am willing to be one of the Mentors. Once there are at least two more
>> >> we can push forward.
>> >>>
>> >>> Regards,
>> >>> Dave
>> >>>
>> >>>> On Aug 1, 2017, at 5:09 AM, Steve Lawrence <
>> >> stephen.d.lawrence@gmail.com> wrote:
>> >>>>
>> >>>> Discussions have died down, and I think the consensus from the responses
>> >>>> is that the issues are 1) the lack of committers and 2) the lack of a
>> >>>> champion and mentors. We hope to address #1 and grow the community as
>> >>>> part of incubation. Is anyone interested in being a champion or mentor
>> >>>> and helping us with #2?
>> >>>>
>> >>>> Thanks,
>> >>>> - Steve
>> >>>>
>> >>>> On 07/26/2017 04:06 PM, Chris Mattmann wrote:
>> >>>>> This sounds like a very interesting project.
>> >>>>>
>> >>>>> I don’t have the time to mentor at the moment but I will keep a close
>> >> eye on it.
>> >>>>>
>> >>>>> Cheers,
>> >>>>> Chris Mattmann
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On 7/25/17, 11:53 AM, "McHenry, Kenton Guadron" <mc...@illinois.edu>
>> >> wrote:
>> >>>>>
>> >>>>>    Hi Dave,
>> >>>>>
>> >>>>>    The developers that were at NCSA have moved on to other
>> >>>>>    organizations.  While we still leverage Daffodil and are very much
>> >>>>>    interested in seeing it move forward, development is currently done
>> >>>>>    by the Tresys team.  Agreed on the synergy with Tika.
>> >>>>>
>> >>>>>    Kenton McHenry, Ph.D.
>> >>>>>    Principal Research Scientist, Adjunct Assistant Professor of
>> >> Computer Science
>> >>>>>    Deputy Director of the Scientific Software & Applications Division
>> >>>>>    National Center for Supercomputing Applications, University of
>> >> Illinois at Urbana-Champaign
>> >>>>>
>> >>>>>    On Jul 24, 2017, at 1:55 PM, Dave Fisher <dave2wave@comcast.net> wrote:
>> >>>>>
>> >>>>>    Hi Kenton,
>> >>>>>
>> >>>>>    Is there any reason that you and others from the NCSA are not
>> >>>>>    Initial Committers? That would make this proposal stronger.
>> >>>>>
>> >>>>>    Regarding Apache Tika - it relies on other projects including
>> >>>>>    Apache POI and Apache PDFBox. They are pragmatic about what is used.
>> >>>>>    If Daffodil works to expand then I think that there would be good
>> >>>>>    synergy between the projects. I know as a POI PMC member that the
>> >>>>>    POI community has significantly benefited from the Tika community,
>> >>>>>    some of whom are from MITRE.
>> >>>>>
>> >>>>>    To date Tika has not emphasized structured data, although they do
>> >>>>>    extract content from Excel and OpenOffice.
>> >>>>>
>> >>>>>    I am intrigued.
>> >>>>>
>> >>>>>    Regards,
>> >>>>>    Dave
>> >>>>>
>> >>>>>    On Jul 24, 2017, at 10:55 AM, McHenry, Kenton Guadron
>> >>>>>    <mchenry@illinois.edu> wrote:
>> >>>>>
>> >>>>>    Yes, DFDL and its open source implementation Daffodil are more
>> >>>>>    about file formats and getting access to the entirety of a file's
>> >>>>>    contents in a consistent way through machine-readable
>> >>>>>    specifications.  The work has implications in the area of digital
>> >>>>>    preservation, allowing one to preserve these machine-readable
>> >>>>>    specifications rather than all the tools needed to open/save a file
>> >>>>>    in order to work with it.  Imagine someone developing graphics
>> >>>>>    software to work with 3D models and not having to worry about the
>> >>>>>    hundreds of formats out there for 3D meshes (whether there are tools
>> >>>>>    for opening the files and whether they can get access to those
>> >>>>>    tools, whether the spec is available and how complex that spec is
>> >>>>>    to implement, etc.), and simply building their code around the
>> >>>>>    contents (e.g. vertices, faces, etc.).  One could come up with
>> >>>>>    similar scenarios for other data types (documents, images, videos,
>> >>>>>    audio, depth data, numeric data).  Ideally, tools built to support
>> >>>>>    DFDL could someday support any format for that type without the
>> >>>>>    developer having to worry about the details of how that data is
>> >>>>>    represented within a file.
>> >>>>>
>> >>>>>    Kenton McHenry, Ph.D.
>> >>>>>    Principal Research Scientist, Adjunct Assistant Professor of
>> >> Computer Science
>> >>>>>    Deputy Director of the Scientific Software & Applications Division
>> >>>>>    National Center for Supercomputing Applications, University of
>> >> Illinois at Urbana-Champaign
>> >>>>>
>> >>>>>    On Jul 24, 2017, at 10:30 AM, Steve Lawrence
>> >>>>>    <stephen.d.lawrence@gmail.com> wrote:
>> >>>>>
>> >>>>>    I'll preface this by saying that I don't have a ton of experience
>> >>>>>    with Apache Tika. But based on my understanding, Tika and Daffodil
>> >>>>>    do have somewhat similar goals, but reach them in different ways.
>> >>>>>    For example, Tika requires that one writes /code/ to perform data
>> >>>>>    extraction, usually relying on existing Java libraries to extract
>> >>>>>    the desired metadata. The downside to this is that code can be
>> >>>>>    buggy, and libraries might not even exist for formats of interest
>> >>>>>    (especially common with legacy and military data).
>> >>>>>
>> >>>>>    Daffodil, on the other hand, does not require one to write any code.
>> >>>>>    Instead, one writes a DFDL Schema (similar to XML Schema, with DFDL
>> >>>>>    annotations) that fully describes the data, which Daffodil then
>> >>>>>    uses to convert the data to XML/JSON for extraction. So adding
>> >>>>>    support for a new format means writing a new schema rather than new
>> >>>>>    code. And less code generally means fewer bugs. Also, for secure
>> >>>>>    systems that require certification, generally speaking, it is easier
>> >>>>>    to certify a schema than to certify code.
>> >>>>>
>> >>>>>    We certainly don't believe that Daffodil could replace Tika, but it
>> >>>>>    does have the potential to add new functionality to Tika for formats
>> >>>>>    that do not have existing libraries. One of our goals is to look into
>> >>>>>    integrating Daffodil support into tools like Tika. We'd love to hear
>> >>>>>    from Tika devs if this is something they'd be interested in.
>> >>>>>
>> >>>>>    I'll also add that whereas Tika tends to focus primarily on metadata,
>> >>>>>    DFDL schemas usually describe an entire file format down to the byte,
>> >>>>>    so one can extract more than just metadata, including text and binary
>> >>>>>    data. Further differentiating the two, Daffodil supports serializing
>> >>>>>    data (called unparse) from the XML/JSON representation, allowing one
>> >>>>>    to transform or filter data as well. We don't believe this feature is
>> >>>>>    all that applicable to Tika, but it may be useful to other
>> >>>>>    technologies, such as filtering or data fuzzing tools.
>> >>>>>
>> >>>>>    - Steve
>> >>>>>
>> >>>>>
>> >>>>>    On 07/24/2017 10:59 AM, Mike Drob wrote:
>> >>>>>    What is the relationship between Daffodil and something like Apache
>> >>>>>    Tika's extraction engine?
>> >>>>>
>> >>>>>    On Mon, Jul 24, 2017 at 9:53 AM, Steve Lawrence
>> >>>>>    <stephen.d.lawrence@gmail.com> wrote:
>> >>>>>
>> >>>>>    Dear Apache Incubator Community,
>> >>>>>
>> >>>>>    We would like to start a discussion around a proposal to bring
>> >>>>>    Daffodil into the Apache Incubator. Daffodil is an implementation of
>> >>>>>    the DFDL specification used to convert between fixed format data and
>> >>>>>    XML/JSON.
>> >>>>>
>> >>>>>    The draft proposal can be found in the wiki at the following URL:
>> >>>>>
>> >>>>>    https://wiki.apache.org/incubator/DaffodilProposal
>> >>>>>
>> >>>>>    We do not yet have a champion or mentors, but it was recommended
>> >> that we
>> >>>>>    create a proposal and send it to this list to potentially find those
>> >>>>>    that might be interested. The text for the draft proposal is found
>> >>>>>    below. We look forward to your input.
>> >>>>>
>> >>>>>    Thanks,
>> >>>>>    -Steve
>> >>>>>
>> >>>>>
>> >>>>>    = Daffodil Proposal =
>> >>>>>
>> >>>>>    == Abstract ==
>> >>>>>
>> >>>>>    Daffodil is an implementation of the Data Format Description
>> >> Language
>> >>>>>    (DFDL) used to convert between fixed format data and XML/JSON.
>> >>>>>
>> >>>>>    == Proposal ==
>> >>>>>
>> >>>>>    The Data Format Description Language (DFDL) is a specification,
>> >>>>>    developed by the Open Grid Forum, capable of describing many data
>> >>>>>    formats, including both textual and binary, scientific and numeric,
>> >>>>>    legacy and modern, commercial record-oriented, and many industry and
>> >>>>>    military standards. It defines a language that is a subset of W3C
>> >> XML
>> >>>>>    schema to describe the logical format of the data, and annotations
>> >>>>>    within the schema to describe the physical representation.
>> >>>>>
>> >>>>>    Daffodil is an open source implementation of the DFDL specification
>> >> that
>> >>>>>    uses these DFDL schemas to parse fixed format data into an infoset,
>> >>>>>    which is most commonly represented as either XML or JSON. This
>> >> allows
>> >>>>>    the use of well-established XML or JSON technologies and libraries
>> >> to
>> >>>>>    consume, inspect, and manipulate fixed format data in existing
>> >>>>>    solutions. Daffodil is also capable of the reverse by serializing or
>> >>>>>    "unparsing" an XML or JSON infoset back to the original data format.
>> >>>>>
>> >>>>>    == Background ==
>> >>>>>
>> >>>>>    Many different software solutions need to consume and manage data,
>> >>>>>    including data directed routing, databases, data analysis, data
>> >>>>>    cleansing, data visualizing, and more. A key aspect of such
>> >> solutions is
>> >>>>>    the need to transform the data into an easily consumable format.
>> >>>>>    Usually, this means that for each unique data format, one develops a
>> >>>>>    tool that can read and extract the necessary information, often
>> >> leading
>> >>>>>    to ad-hoc and data-format-specific description systems. Such
>> >> systems are
>> >>>>>    often proprietary, not well tested, and incompatible, leading to
>> >> vendor
>> >>>>>    lock-in, flawed software, and increased training costs. DFDL is a
>> >> new
>> >>>>>    standard, with version 1.0 completed in October of 2016, that solves
>> >>>>>    these problems by defining an open standard to describe many
>> >> different
>> >>>>>    data formats and how to parse and unparse between the data and
>> >> XML/JSON.
>> >>>>>
>> >>>>>    Two closed source implementations of DFDL currently exist. The first was
>> >>>>>    created by IBM and is now part of their IBM® Integration Bus product.
>> >>>>>    The second, called DFDL4S or "DFDL for Space", was created by the
>> >>>>>    European Space Agency and targets the challenges of their satellite
>> >>>>>    data processing.
>> >>>>>
>> >>>>>    Around 2005, Pacific Northwest National Lab created Defuddle, built
>> >> as
>> >>>>>    an open source implementation and proof of concept of the draft DFDL
>> >>>>>    specification and a test bed to feed new concepts into specification
>> >>>>>    development. Primary development of Defuddle was eventually taken
>> >> over
>> >>>>>    by the National Center for Supercomputing Applications (NCSA).
>> >> However,
>> >>>>>    due to evolution of the DFDL specification and architectural and
>> >>>>>    performance issues with Defuddle, around 2009, NCSA restarted the
>> >>>>>    project with the new name of Daffodil, with a goal of implementing
>> >> the
>> >>>>>    complete DFDL specification. Daffodil development continued at NCSA
>> >>>>>    until around 2012, at which point development slowed due to budget
>> >>>>>    limitations. Shortly thereafter, primary development was picked up
>> >> by
>> >>>>>    Tresys Technology where it continues today, with contributions from
>> >>>>>    other entities such as the Navy Research Lab, the Air Force Research
>> >>>>>    Lab, MITRE, and Booz Allen Hamilton. In February of 2015, Daffodil
>> >>>>>    version 1.0.0 was released, including support for the DFDL features
>> >>>>>    needed to parse many common file formats. Daffodil version 2.0.0,
>> >>>>>    expected to be released in August of 2017, will include unparse
>> >>>>>    support with one-to-one feature parity with parsing.
>> >>>>>
>> >>>>>    Entities including IBM, MITRE, NATO NCI Agency, Northrop-Grumman,
>> >> Quark
>> >>>>>    Security, Raytheon, and Tresys Technology have developed DFDL
>> >> schemas
>> >>>>>    for many data formats from varying technology domains, including
>> >> PNG,
>> >>>>>    GIF, BMP, PCAP, HL7, EDIFACT, NACHA, vCard, iCalendar, and
>> >> MIL-STD-2045,
>> >>>>>    many of which are publicly available on the DFDL Schemas GitHub.
>> >> There
>> >>>>>    are also a number of military-application data formats, the
>> >>>>>    specifications of which are not public, which have historically been
>> >>>>>    very difficult and expensive to process, and for which DFDL schemas
>> >> have
>> >>>>>    been created or are actively in development; these include
>> >>>>>    MIL-STD-6040/USMTF ATO, MIL-STD-6017/VMF, MIL-STD-6016/NATO STANAG
>> >> 5516
>> >>>>>    (aka "Link16").
>> >>>>>
>> >>>>>    == Rationale ==
>> >>>>>
>> >>>>>    Numerous software solutions exist that consume, inspect, analyze,
>> >> and
>> >>>>>    transform data, many of which can be found in the Apache Software
>> >>>>>    Foundation (ASF). In order for tools like these to consume new
>> >> types of
>> >>>>>    data, custom extensions are usually required, often with high
>> >>>>>    development and testing costs. Daffodil fills a clear gap in many of
>> >>>>>    these solutions, providing a simple and low cost way to transform
>> >> data
>> >>>>>    to XML or JSON, which many of these tools natively support already.
>> >> With
>> >>>>>    the upcoming 2.0.0 release, the Daffodil project will have achieved
>> >> a
>> >>>>>    level of functionality in both parse and unparse that, when
>> >> integrated
>> >>>>>    into existing solutions, could provide for a new method to quickly
>> >>>>>    enable support for new data formats.
>> >>>>>
>> >>>>>    == Initial Goals ==
>> >>>>>
>> >>>>>    * Relicense the existing code from the University of Illinois/NCSA
>> >> Open
>> >>>>>    Source License to the Apache License version 2.0, working with
>> >> Apache
>> >>>>>    Legal to ensure correctness, and with Daffodil contributors to get
>> >>>>>    their permission.
>> >>>>>    * Move the existing codebase, documentation, bugs, and mailing
>> >> lists to
>> >>>>>    the Apache hosted infrastructure
>> >>>>>    * Establish a formal release process and schedule, allowing for
>> >>>>>    dependable release cycles in a manner consistent with the Apache
>> >>>>>    development process.
>> >>>>>    * Build relationships with ASF projects to add Daffodil support
>> >> where
>> >>>>>    appropriate
>> >>>>>    * Grow the community to establish a diversity of background and
>> >> expertise.
>> >>>>>
>> >>>>>    == Current Status ==
>> >>>>>
>> >>>>>    === Meritocracy ===
>> >>>>>
>> >>>>>    All initial committers are familiar with the principles of
>> >> meritocracy.
>> >>>>>    The Daffodil project has followed the model of meritocracy in the
>> >> past,
>> >>>>>    providing multiple outside entities commit access based on the
>> >> quality
>> >>>>>    of their contributions. In order to grow the Daffodil user base and
>> >>>>>    development community, we are dedicated to continuing to operate
>> >>>>>    Daffodil as a meritocracy.
>> >>>>>
>> >>>>>    A key ingredient in a meritocracy of developers is open group code
>> >>>>>    review. The Daffodil project has operated in this mode throughout
>> >> its
>> >>>>>    existence and this provides a forum to improve the code, verify code
>> >>>>>    quality, and educate new developers on the code base.
>> >>>>>
>> >>>>>    === Community ===
>> >>>>>
>> >>>>>    Daffodil has a small community of users and developers. Although
>> >> primary
>> >>>>>    Daffodil development is done by Tresys Technology, a handful of
>> >> other
>> >>>>>    contributions have come from other entities including the Navy
>> >> Research
>> >>>>>    Lab, the Air Force Research Lab, MITRE, and Booz Allen Hamilton. In
>> >>>>>    addition to developers, multiple users of Daffodil have created DFDL
>> >>>>>    schemas, including entities such as MITRE, IBM, Raytheon, Quark
>> >>>>>    Security, and Tresys Technology. The DFDL Schemas GitHub community
>> >> has
>> >>>>>    been created as a place for DFDL schemas to be published. The
>> >> Daffodil
>> >>>>>    project also makes use of mailing lists, HipChat, and Confluence
>> >>>>>    Questions to build a community of users and system for support.
>> >>>>>
>> >>>>>    === Core Developers ===
>> >>>>>
>> >>>>>    The core developers of Daffodil are employed by Tresys Technology.
>> >> We
>> >>>>>    will work to grow the community among a more diverse set of
>> >> developers
>> >>>>>    and industries.
>> >>>>>
>> >>>>>    === Alignment ===
>> >>>>>
>> >>>>>    Daffodil was created as an open source project with a philosophy
>> >>>>>    consistent with The Apache Way. A strong belief in meritocracy,
>> >>>>>    community involvement in decisions, openness, and ensuring a high
>> >> level
>> >>>>>    of quality in code, documentation, and testing are some of our
>> >> shared
>> >>>>>    core beliefs.
>> >>>>>
>> >>>>>    Further, as mentioned in the Rationale section, Daffodil fills a gap
>> >>>>>    that exists in many ASF projects, including NiFi, Spark, Storm,
>> >> Hadoop,
>> >>>>>    Tika, and others. In order for tools like these to consume new
>> >> types of
>> >>>>>    data, custom extensions are usually required. Rather than create
>> >> such
>> >>>>>    extensions, Daffodil provides an easy and standards-compliant way to
>> >>>>>    transform data to XML or JSON, which many of these tools already
>> >>>>>    natively support.
>> >>>>>
>> >>>>>    == Known Risks ==
>> >>>>>
>> >>>>>    === Orphaned Products ===
>> >>>>>
>> >>>>>    The current core developers are the leading contributors in the
>> >> space of
>> >>>>>    DFDL and wish to see it flourish. Though there is some risk that the
>> >>>>>    initial committers all come from the same company, a goal of
>> >> entering
>> >>>>>    into incubation is to grow the development community to minimize the
>> >>>>>    risk of reliance on a single company.
>> >>>>>
>> >>>>>    === Inexperience with Open Source ===
>> >>>>>
>> >>>>>    The Daffodil project began as an open source project and has
>> >> continued
>> >>>>>    that model throughout development. This includes public bug
>> >> tracking,
>> >>>>>    git revision control, automated builds and tests, and a public wiki
>> >> for
>> >>>>>    documentation.
>> >>>>>
>> >>>>>    Additionally, the current core developers and initial committers all
>> >>>>>    work for a company that relies on, believes in, promotes, and has
>> >> led or
>> >>>>>    contributed to many open source software projects, including SELinux
>> >>>>>    Userspace, OpenSCAP, CLIP, refpolicy, setools, RPM, and others. As
>> >> such,
>> >>>>>    there is low risk related to inexperience with open source software
>> >> and
>> >>>>>    processes.
>> >>>>>
>> >>>>>    === Homogeneous Developers ===
>> >>>>>
>> >>>>>    The proposed initial committers come from a single entity, though
>> >> we are
>> >>>>>    committed to growing the Daffodil development community to include a
>> >>>>>    broad group of additional committers from a wide array of
>> >> industries.
>> >>>>>
>> >>>>>    === Reliance on Salaried Developers ===
>> >>>>>
>> >>>>>    The proposed initial committers are paid by their employer to
>> >> contribute
>> >>>>>    to the Daffodil project. We expect that Daffodil development will
>> >>>>>    continue with salaried developers, and are committed to growing the
>> >>>>>    community to include non-salaried developers as well.
>> >>>>>
>> >>>>>    === Relationship with other Apache Projects ===
>> >>>>>
>> >>>>>    As mentioned in the Alignment section, Daffodil fills a clear gap in
>> >>>>>    numerous other ASF projects that consume and manage large amounts
>> >> of data.
>> >>>>>
>> >>>>>    As a specific example, Daffodil developers have created a Daffodil
>> >>>>>    Apache NiFi Processor, currently in use in data transfer solutions,
>> >>>>>    which allows one to ingest non-native data into an Apache NiFi pipeline
>> >>>>>    as XML or JSON. This processor was well received by the Apache NiFi
>> >>>>>    developers, with positive comments about the concise API and how it
>> >>>>>    could handle non-native data. Daffodil developers have also
>> >> successfully
>> >>>>>    prototyped integration with Apache Spark. We believe Daffodil could
>> >>>>>    provide a strong benefit to many other ASF projects that handle
>> >> fixed
>> >>>>>    format data. We anticipate working closely with such ASF projects to
>> >>>>>    include Daffodil where applicable to increase their ability to
>> >> support
>> >>>>>    new data formats with minimal effort.
>> >>>>>
>> >>>>>    Daffodil also depends on existing ASF projects, including Apache
>> >> Commons
>> >>>>>    and Apache Xerces.
>> >>>>>
>> >>>>>    === An Excessive Fascination with the Apache Brand ===
>> >>>>>
>> >>>>>    Although the Apache brand may certainly help to attract more
>> >>>>>    contributors, publicity is not the reason for this proposal. We
>> >> believe
>> >>>>>    Daffodil could provide a great benefit to the ASF and the numerous
>> >> data
>> >>>>>    focused projects that comprise it, as described in the Rationale and
>> >>>>>    Alignment sections. We hope to build a strong and vibrant community
>> >>>>>    built around The Apache Way, and not dependent on a single company.
>> >>>>>
>> >>>>>    === Documentation ===
>> >>>>>
>> >>>>>    Daffodil documentation can be found at:
>> >>>>>
>> >>>>>    * https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
>> >>>>>
>> >>>>>    Information about DFDL can be found at:
>> >>>>>
>> >>>>>    * https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
>> >>>>>    * https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/df20060_.htm
>> >>>>>
>> >>>>>    Public examples of DFDL Schemas can be found at:
>> >>>>>
>> >>>>>    * https://github.com/DFDLSchemas
>> >>>>>
>> >>>>>    == Initial Source ==
>> >>>>>
>> >>>>>    The Daffodil git repo goes back to mid-2011 with approximately 20
>> >>>>>    different contributors and feedback from many users and developers.
>> >> The
>> >>>>>    core codebase is written in Scala and includes both a Scala and Java
>> >>>>>    API, along with Javadocs and Scaladocs for API usage. The initial
>> >> code
>> >>>>>    will come from the git repository currently hosted by NCSA at the
>> >>>>>    University of Illinois:
>> >>>>>
>> >>>>>    https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil/
>> >>>>>
>> >>>>>    == Source and Intellectual Property Submission ==
>> >>>>>
>> >>>>>    The complete Daffodil code is licensed under the University of
>> >>>>>    Illinois/NCSA Open Source License. Much of the current codebase has
>> >> been
>> >>>>>    developed by Tresys Technology, who is open to relicensing the code to
>> >>>>>    the Apache License version 2.0 and donating the source to the ASF.
>> >>>>>    Contacts at NCSA are also open to relicensing their contributions to
>> >>>>>    Apache v2. We plan to contact the other contributors and ask for
>> >>>>>    permission to relicense and donate their contributed code. Code from
>> >>>>>    contributors who decline or cannot be reached will be removed or
>> >>>>>    replaced. We will work closely with Apache Legal to ensure all
>> >> issues
>> >>>>>    related to relicensing are acceptable.
>> >>>>>
>> >>>>>    == External Dependencies ==
>> >>>>>
>> >>>>>    We believe all current dependencies are compatible with the ASF
>> >>>>>    guidelines. Our dependency licenses come from the following license
>> >>>>>    styles: Apache v2, BSD, MIT, and ICU. The list of current Daffodil
>> >>>>>    dependencies and their licenses are documented here:
>> >>>>>
>> >>>>>    https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Dependencies+and+Licenses
>> >>>>>
>> >>>>>    == Cryptography ==
>> >>>>>
>> >>>>>    None
>> >>>>>
>> >>>>>    == Required Resources ==
>> >>>>>
>> >>>>>    === Mailing Lists ===
>> >>>>>
>> >>>>>    * commits@daffodil.incubator.apache.org
>> >>>>>    * dev@daffodil.incubator.apache.org
>> >>>>>    * private@daffodil.incubator.apache.org
>> >>>>>    * user@daffodil.incubator.apache.org
>> >>>>>
>> >>>>>    === Source Control ===
>> >>>>>
>> >>>>>    git://git.apache.org/incubator-daffodil.git
>> >>>>>
>> >>>>>    === Issue Tracking ===
>> >>>>>
>> >>>>>    JIRA Daffodil (DFDL)
>> >>>>>
>> >>>>>    === Initial Committers ===
>> >>>>>
>> >>>>>    * Beth Finnegan <efinnegan at tresys dot com>
>> >>>>>    * Dave Thompson <dthompson at tresys dot com>
>> >>>>>    * Josh Adams <jadams at tresys dot com>
>> >>>>>    * Mike Beckerle <mbeckerle at tresys dot com>
>> >>>>>    * Steve Lawrence <slawrence at tresys dot com>
>> >>>>>    * Taylor Wise <twise at tresys dot com>
>> >>>>>
>> >>>>>    === Affiliations ===
>> >>>>>
>> >>>>>    * Beth Finnegan (Tresys Technology)
>> >>>>>    * Dave Thompson (Tresys Technology)
>> >>>>>    * Josh Adams (Tresys Technology)
>> >>>>>    * Mike Beckerle (Tresys Technology)
>> >>>>>    * Steve Lawrence (Tresys Technology)
>> >>>>>    * Taylor Wise (Tresys Technology)
>> >>>>>
>> >>>>>    == Sponsors ==
>> >>>>>
>> >>>>>    === Champion ===
>> >>>>>
>> >>>>>    * TBD
>> >>>>>
>> >>>>>    === Nominated Mentors ===
>> >>>>>
>> >>>>>    * TBD
>> >>>>>
>> >>>>>    === Sponsoring Entity ===
>> >>>>>
>> >>>>>    We request the Apache Incubator to sponsor this project.
>> >>>>>
>> >>>>>
>> >> ---------------------------------------------------------------------
>> >>>>>    To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> >>>>>    For additional commands, e-mail: general-help@incubator.apache.org
>> 

Re: [DISCUSS] Daffodil Incubation Proposal

Posted by "John D. Ament" <jo...@apache.org>.
Steve,

At this point, I'd recommend we wrap the discussion and call for a vote.
While ideally we want 3 mentors, we can get started with 2 and see how
things progress.

John

On Wed, Aug 2, 2017 at 3:55 PM Steve Lawrence <st...@gmail.com> wrote:

> Thanks John!
>
> On 08/02/2017 03:23 PM, John D. Ament wrote:
> > You can also count me in as a mentor.
> >
> > John
> >
> > On Wed, Aug 2, 2017 at 3:14 PM Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
> >
> >> Understood. Thanks for the interest!
> >>
> >> - Steve
> >>
> >> On 08/02/2017 02:57 PM, Dave Fisher wrote:
> >>> Hi Steve,
> >>>
> >>> It was not so much the lack of committers as it was the current diversity. That is not a blocker for entry to Incubation.
> >>>
> >>> I am willing to be one of the Mentors. Once there are at least two more we can push forward.
> >>>
> >>> Regards,
> >>> Dave
> >>>
> >>>> On Aug 1, 2017, at 5:09 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
> >>>>
> >>>> Discussions have died down, and I think the consensus from the responses
> >>>> is that the issues are 1) the lack of committers and 2) the lack of a
> >>>> champion and mentors. We hope to address #1 and grow the community as
> >>>> part of incubation. Is anyone interested in being a champion or mentor
> >>>> and helping us with #2?
> >>>>
> >>>> Thanks,
> >>>> - Steve
> >>>>
> >>>> On 07/26/2017 04:06 PM, Chris Mattmann wrote:
> >>>>> This sounds like a very interesting project.
> >>>>>
> >>>>> I don’t have the time to mentor at the moment but I will keep a close eye on it.
> >>>>>
> >>>>> Cheers,
> >>>>> Chris Mattmann
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 7/25/17, 11:53 AM, "McHenry, Kenton Guadron" <mchenry@illinois.edu> wrote:
> >>>>>
> >>>>>    Hi Dave,
> >>>>>
> >>>>>    The developers that were at NCSA have moved on to other organizations.  While we still leverage Daffodil and are very much interested in seeing it move forward, development is currently done by the Tresys team.  Agreed on the synergy with Tika.
> >>>>>
> >>>>>    Kenton McHenry, Ph.D.
> >>>>>    Principal Research Scientist, Adjunct Assistant Professor of Computer Science
> >>>>>    Deputy Director of the Scientific Software & Applications Division
> >>>>>    National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
> >>>>>
> >>>>>    On Jul 24, 2017, at 1:55 PM, Dave Fisher <dave2wave@comcast.net> wrote:
> >>>>>
> >>>>>    Hi Kenton,
> >>>>>
> >>>>>    Is there any reason that you and others from the NCSA are not Initial Committers? That would make this proposal stronger.
> >>>>>
> >>>>>    Regarding Apache Tika - it relies on other projects including Apache POI and Apache PDFBox. They are pragmatic about what is used. If Daffodil works to expand then I think that there would be good synergy between the projects. I know as a POI PMC member that the POI community has significantly benefited from the Tika community, some of whom are from MITRE.
> >>>>>
> >>>>>    To date Tika has not emphasized structured data, although they do extract content from Excel and OpenOffice.
> >>>>>
> >>>>>    I am intrigued.
> >>>>>
> >>>>>    Regards,
> >>>>>    Dave
> >>>>>
> >>>>>    On Jul 24, 2017, at 10:55 AM, McHenry, Kenton Guadron <mchenry@illinois.edu> wrote:
> >>>>>
> >>>>>    Yes, DFDL and its open source implementation Daffodil are more about file formats and getting access to the entirety of a file's contents in a consistent way through machine readable specifications.  The work has implications in the area of digital preservation, allowing one to preserve these machine readable specifications rather than all the tools needed to open/save a file in order to work with it.  Imagine someone developing graphics software to work with 3D models and not having to worry about the hundreds of formats out there for 3D meshes (whether there are tools for opening the files and whether they can get access to those tools, whether the spec is available and how complex that spec is to implement, etc.), and simply building their code around the contents (e.g. vertices, faces, etc.).  One could come up with similar scenarios for other data types (documents, images, videos, audio, depth data, numeric data).  Ideally, tools built to support DFDL could someday support any format for that type without the developer having to worry about the details of how that data is represented within a file.
> >>>>>
> >>>>>    Kenton McHenry, Ph.D.
> >>>>>    Principal Research Scientist, Adjunct Assistant Professor of Computer Science
> >>>>>    Deputy Director of the Scientific Software & Applications Division
> >>>>>    National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
> >>>>>
> >>>>>    On Jul 24, 2017, at 10:30 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
> >>>>>
> >>>>>    I'll preface this by saying that I don't have a ton of experience with
> >>>>>    Apache Tika. But based on my understanding, Tika and Daffodil do have
> >>>>>    somewhat similar goals, but reach them in different ways. For example,
> >>>>>    Tika requires that one writes /code/ to perform data extraction, usually
> >>>>>    relying on existing Java libraries to extract the desired metadata. The
> >>>>>    downside to this is that code can be buggy, and libraries might not even
> >>>>>    exist for formats of interest (especially common with legacy and
> >>>>>    military data).
> >>>>>
> >>>>>    Daffodil, on the other hand, does not require one to write any code.
> >>>>>    Instead, one writes a DFDL Schema (similar to XML Schema, with DFDL
> >>>>>    annotations) that fully describes the data, which Daffodil then uses to
> >>>>>    convert the data to XML/JSON for extraction. So adding support for a new
> >>>>>    format means writing a new schema rather than new code. And less code
> >>>>>    generally means fewer bugs. Also, for secure systems that require
> >>>>>    certification, generally speaking, it is easier to certify a schema
> >>>>>    than code.
> >>>>>
> >>>>>    We certainly don't believe that Daffodil could replace Tika, but it does
> >>>>>    have the potential to add new functionality to Tika for formats that do
> >>>>>    not have existing libraries. One of our goals is to look into
> >>>>>    integrating Daffodil support into tools like Tika. We'd love to hear
> >>>>>    from Tika devs if this is something they'd be interested in.
> >>>>>
> >>>>>    I'll also add that whereas Tika tends to focus primarily on metadata,
> >>>>>    DFDL schemas usually describe an entire file format down to the byte, so
> >>>>>    one can extract more than just metadata, including text and binary
> >>>>>    data. As a further difference, Daffodil has support for serializing data
> >>>>>    (called unparse) from the XML/JSON representation, allowing one to
> >>>>>    transform or filter data as well. We don't believe this feature is all
> >>>>>    that applicable to Tika, but it may be useful to other technologies such
> >>>>>    as filtering or data fuzzing technologies.
> >>>>>
> >>>>>    - Steve
> >>>>>
> >>>>>
> >>>>>    On 07/24/2017 10:59 AM, Mike Drob wrote:
> >>>>>    What is the relationship between Daffodil and something like Apache Tika's
> >>>>>    extraction engine?
> >>>>>
> >>>>>    On Mon, Jul 24, 2017 at 9:53 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
> >>>>>
> >>>>>    Dear Apache Incubator Community,
> >>>>>
> >>>>>    We would like to start a discussion around a proposal to bring Daffodil
> >>>>>    into the Apache Incubator. Daffodil is an implementation of the DFDL
> >>>>>    specification used to convert between fixed format data and XML/JSON.
> >>>>>
> >>>>>    The draft proposal can be found in the wiki at the following URL:
> >>>>>
> >>>>>    https://wiki.apache.org/incubator/DaffodilProposal
> >>>>>
> >>>>>    We do not yet have a champion or mentors, but it was recommended that we
> >>>>>    create a proposal and send it to this list to potentially find those
> >>>>>    that might be interested. The text for the draft proposal is found
> >>>>>    below. We look forward to your input.
> >>>>>
> >>>>>    Thanks,
> >>>>>    -Steve
> >>>>>
> >>>>>
> >>>>>    = Daffodil Proposal =
> >>>>>
> >>>>>    == Abstract ==
> >>>>>
> >>>>>    Daffodil is an implementation of the Data Format Description Language
> >>>>>    (DFDL) used to convert between fixed format data and XML/JSON.
> >>>>>
> >>>>>    == Proposal ==
> >>>>>
> >>>>>    The Data Format Description Language (DFDL) is a specification,
> >>>>>    developed by the Open Grid Forum, capable of describing many data
> >>>>>    formats, including both textual and binary, scientific and numeric,
> >>>>>    legacy and modern, commercial record-oriented, and many industry and
> >>>>>    military standards. It defines a language that is a subset of W3C XML
> >>>>>    schema to describe the logical format of the data, and annotations
> >>>>>    within the schema to describe the physical representation.
> >>>>>
> >>>>>    Daffodil is an open source implementation of the DFDL specification that
> >>>>>    uses these DFDL schemas to parse fixed format data into an infoset,
> >>>>>    which is most commonly represented as either XML or JSON. This allows
> >>>>>    the use of well-established XML or JSON technologies and libraries to
> >>>>>    consume, inspect, and manipulate fixed format data in existing
> >>>>>    solutions. Daffodil is also capable of the reverse by serializing or
> >>>>>    "unparsing" an XML or JSON infoset back to the original data format.
> >>>>>
> >>>>>    == Background ==
> >>>>>
> >>>>>    Many different software solutions need to consume and manage data,
> >>>>>    including data directed routing, databases, data analysis, data
> >>>>>    cleansing, data visualizing, and more. A key aspect of such
> >> solutions is
> >>>>>    the need to transform the data into an easily consumable format.
> >>>>>    Usually, this means that for each unique data format, one
> develops a
> >>>>>    tool that can read and extract the necessary information, often
> >> leading
> >>>>>    to ad-hoc and data-format-specific description systems. Such
> >> systems are
> >>>>>    often proprietary, not well tested, and incompatible, leading to
> >> vendor
> >>>>>    lock-in, flawed software, and increased training costs. DFDL is a
> >> new
> >>>>>    standard, with version 1.0 completed in October of 2016, that
> solves
> >>>>>    these problems by defining an open standard to describe many
> >> different
> >>>>>    data formats and how to parse and unparse between the data and
> >> XML/JSON.
> >>>>>
> >>>>>    Two closed source implementations of DFDL currently exist. The first was
> >>>>>    created by IBM and is now part of their IBM® Integration Bus product.
> >>>>>    The second, called DFDL4S or "DFDL for Space", was created by the
> >>>>>    European Space Agency and targets the challenges of their satellite
> >>>>>    data processing.
> >>>>>
> >>>>>    Around 2005, Pacific Northwest National Lab created Defuddle,
> built
> >> as
> >>>>>    an open source implementation and proof of concept of the draft
> DFDL
> >>>>>    specification and a test bed to feed new concepts into
> specification
> >>>>>    development. Primary development of Defuddle was eventually taken
> >> over
> >>>>>    by the National Center for Supercomputing Applications (NCSA).
> >> However,
> >>>>>    due to evolution of the DFDL specification and architectural and
> >>>>>    performance issues with Defuddle, around 2009, NCSA restarted the
> >>>>>    project with the new name of Daffodil, with a goal of implementing
> >> the
> >>>>>    complete DFDL specification. Daffodil development continued at
> NCSA
> >>>>>    until around 2012, at which point development slowed due to budget
> >>>>>    limitations. Shortly thereafter, primary development was picked up
> >> by
> >>>>>    Tresys Technology where it continues today, with contributions
> from
> >>>>>    other entities such as the Navy Research Lab, the Air Force
> Research
> >>>>>    Lab, MITRE, and Booz Allen Hamilton. In February of 2015, Daffodil
> >>>>>    version 1.0.0 was released, including support for the DFDL features
> >>>>>    needed to parse many common file formats. Daffodil version 2.0.0,
> >>>>>    expected to be released in August of 2017, will include unparse
> >>>>>    support with one-to-one feature parity with parsing.
> >>>>>
> >>>>>    Entities including IBM, MITRE, NATO NCI Agency, Northrop-Grumman,
> >> Quark
> >>>>>    Security, Raytheon, and Tresys Technology have developed DFDL
> >> schemas
> >>>>>    for many data formats from varying technology domains, including
> >> PNG,
> >>>>>    GIF, BMP, PCAP, HL7, EDIFACT, NACHA, vCard, iCalendar, and
> >> MIL-STD-2045,
> >>>>>    many of which are publicly available on the DFDL Schemas GitHub.
> >> There
> >>>>>    are also a number of military-application data formats, the
> >>>>>    specifications of which are not public, which have historically
> been
> >>>>>    very difficult and expensive to process, and for which DFDL
> schemas
> >> have
> >>>>>    been created or are actively in development; these include
> >>>>>    MIL-STD-6040/USMTF ATO, MIL-STD-6017/VMF, MIL-STD-6016/NATO STANAG
> >> 5516
> >>>>>    (aka "Link16").
> >>>>>
> >>>>>    == Rationale ==
> >>>>>
> >>>>>    Numerous software solutions exist that consume, inspect, analyze,
> >> and
> >>>>>    transform data, many of which can be found in the Apache Software
> >>>>>    Foundation (ASF). In order for tools like these to consume new
> >> types of
> >>>>>    data, custom extensions are usually required, often with high
> >>>>>    development and testing costs. Daffodil fills a clear gap in many
> of
> >>>>>    these solutions, providing a simple and low cost way to transform
> >> data
> >>>>>    to XML or JSON, which many of these tools natively support
> already.
> >> With
> >>>>>    the upcoming 2.0.0 release, the Daffodil project will have
> achieved
> >> a
> >>>>>    level of functionality in both parse and unparse that, when
> >> integrated
> >>>>>    into existing solutions, could provide for a new method to quickly
> >>>>>    enable support for new data formats.
> >>>>>
> >>>>>    == Initial Goals ==
> >>>>>
> >>>>>    * Relicense the existing code from the University of Illinois/NCSA
> >> Open
> >>>>>    Source License to the Apache License version 2.0, working with
> >> Apache
> >>>>>    Legal to ensure correctness, and with Daffodil contributors to get
> >>>>>    their permission.
> >>>>>    * Move the existing codebase, documentation, bugs, and mailing
> >> lists to
> >>>>>    the Apache hosted infrastructure
> >>>>>    * Establish a formal release process and schedule, allowing for
> >>>>>    dependable release cycles in a manner consistent with the Apache
> >>>>>    development process.
> >>>>>    * Build relationships with ASF projects to add Daffodil support
> >> where
> >>>>>    appropriate
> >>>>>    * Grow the community to establish a diversity of background and
> >> expertise.
> >>>>>
> >>>>>    == Current Status ==
> >>>>>
> >>>>>    === Meritocracy ===
> >>>>>
> >>>>>    All initial committers are familiar with the principles of
> >> meritocracy.
> >>>>>    The Daffodil project has followed the model of meritocracy in the
> >> past,
> >>>>>    providing multiple outside entities commit access based on the
> >> quality
> >>>>>    of their contributions. In order to grow the Daffodil user base
> and
> >>>>>    development community, we are dedicated to continuing to operate
> >>>>>    Daffodil as a meritocracy.
> >>>>>
> >>>>>    A key ingredient in a meritocracy of developers is open group code
> >>>>>    review. The Daffodil project has operated in this mode throughout
> >> its
> >>>>>    existence and this provides a forum to improve the code, verify
> code
> >>>>>    quality, and educate new developers on the code base.
> >>>>>
> >>>>>    === Community ===
> >>>>>
> >>>>>    Daffodil has a small community of users and developers. Although
> >> primary
> >>>>>    Daffodil development is done by Tresys Technology, a handful of
> >> other
> >>>>>    contributions have come from other entities including the Navy
> >> Research
> >>>>>    Lab, the Air Force Research Lab, MITRE, and Booz Allen Hamilton.
> In
> >>>>>    addition to developers, multiple users of Daffodil have created
> DFDL
> >>>>>    schemas, including entities such as MITRE, IBM, Raytheon, Quark
> >>>>>    Security, and Tresys Technology. The DFDL Schemas github community
> >> has
> >>>>>    been created as a place for DFDL schemas to be published. The
> >> Daffodil
> >>>>>    project also makes use of mailing lists, !HipChat, and Confluence
> >>>>>    Questions to build a community of users and system for support.
> >>>>>
> >>>>>    === Core Developers ===
> >>>>>
> >>>>>    The core developers of Daffodil are employed by Tresys Technology.
> >> We
> >>>>>    will work to grow the community among a more diverse set of
> >> developers
> >>>>>    and industries.
> >>>>>
> >>>>>    === Alignment ===
> >>>>>
> >>>>>    Daffodil was created as an open source project with a philosophy
> >>>>>    consistent with The Apache Way. A strong belief in meritocracy,
> >>>>>    community involvement in decisions, openness, and ensuring a high
> >> level
> >>>>>    of quality in code, documentation, and testing are some of our
> >> shared
> >>>>>    core beliefs.
> >>>>>
> >>>>>    Further, as mentioned in the Rationale section, Daffodil fills a
> gap
> >>>>>    that exists in many ASF projects, including !NiFi, Spark, Storm,
> >> Hadoop,
> >>>>>    Tika, and others. In order for tools like these to consume new
> >> types of
> >>>>>    data, custom extensions are usually required. Rather than create
> >> such
> >>>>>    extensions, Daffodil provides an easy and standards-compliant way
> to
> >>>>>    transform data to XML or JSON, which many of these tools already
> >>>>>    natively support.
> >>>>>
> >>>>>    == Known Risks ==
> >>>>>
> >>>>>    === Orphaned Products ===
> >>>>>
> >>>>>    The current core developers are the leading contributors in the
> >> space of
> >>>>>    DFDL and wish to see it flourish. Though there is some risk that
> the
> >>>>>    initial committers all come from the same company, a goal of
> >> entering
> >>>>>    into incubation is to grow the development community to minimize
> the
> >>>>>    risk of reliance on a single company.
> >>>>>
> >>>>>    === Inexperience with Open Source ===
> >>>>>
> >>>>>    The Daffodil project began as an open source project and has
> >> continued
> >>>>>    that model throughout development. This includes public bug
> >> tracking,
> >>>>>    git revision control, automated builds and tests, and a public
> wiki
> >> for
> >>>>>    documentation.
> >>>>>
> >>>>>    Additionally, the current core developers and initial committers
> all
> >>>>>    work for a company that relies on, believes in, promotes, and has
> >> led or
> >>>>>    contributed to many open source software projects, including
> SELinux
> >>>>>    Userspace, OpenSCAP, CLIP, refpolicy, setools, RPM, and others. As
> >> such,
> >>>>>    there is low risk related to inexperience with open source
> software
> >> and
> >>>>>    processes.
> >>>>>
> >>>>>    === Homogeneous Developers ===
> >>>>>
> >>>>>    The proposed initial committers come from a single entity, though
> >> we are
> >>>>>    committed to growing the Daffodil development community to
> include a
> >>>>>    broad group of additional committers from a wide array of
> >> industries.
> >>>>>
> >>>>>    === Reliance on Salaried Developers ===
> >>>>>
> >>>>>    The proposed initial committers are paid by their employer to
> >> contribute
> >>>>>    to the Daffodil project. We expect that Daffodil development will
> >>>>>    continue with salaried developers, and are committed to growing
> the
> >>>>>    community to include non-salaried developers as well.
> >>>>>
> >>>>>    === Relationship with other Apache Projects ===
> >>>>>
> >>>>>    As mentioned in the Alignment section, Daffodil fills a clear gap
> in
> >>>>>    numerous other ASF projects that consume and manage large amounts
> >> of data.
> >>>>>
> >>>>>    As a specific example, Daffodil developers have created a Daffodil
> >>>>>    Apache !NiFi Processor, currently in use in data transfer
> solutions,
> >>>>>    which allows one to ingest non-native data into an Apache !NiFi
> >> pipeline
> >>>>>    as XML or JSON. This processor was well received by the Apache
> !NiFi
> >>>>>    developers, with positive comments about the concise API and how
> it
> >>>>>    could handle non-native data. Daffodil developers have also
> >> successfully
> >>>>>    prototyped integration with Apache Spark. We believe Daffodil
> could
> >>>>>    provide a strong benefit to many other ASF projects that handle
> >> fixed
> >>>>>    format data. We anticipate working closely with such ASF projects
> to
> >>>>>    include Daffodil where applicable to increase their ability to
> >> support
> >>>>>    new data formats with minimal effort.
> >>>>>
> >>>>>    Daffodil also depends on existing ASF projects, including Apache
> >> Commons
> >>>>>    and Apache Xerces.
> >>>>>
> >>>>>    === An Excessive Fascination with the Apache Brand ===
> >>>>>
> >>>>>    Although the Apache brand may certainly help to attract more
> >>>>>    contributors, publicity is not the reason for this proposal. We
> >> believe
> >>>>>    Daffodil could provide a great benefit to the ASF and the numerous
> >> data
> >>>>>    focused projects that comprise it, as described in the Rationale
> and
> >>>>>    Alignment sections. We hope to build a strong and vibrant
> community
> >>>>>    built around The Apache Way, and not dependent on a single
> company.
> >>>>>
> >>>>>    === Documentation ===
> >>>>>
> >>>>>    Daffodil documentation can be found at:
> >>>>>
> >>>>>    * https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
> >>>>>
> >>>>>    Information about DFDL can be found at:
> >>>>>
> >>>>>    * https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> >>>>>    * https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/df20060_.htm
> >>>>>
> >>>>>    Public examples of DFDL Schemas can be found at:
> >>>>>
> >>>>>    * https://github.com/DFDLSchemas
> >>>>>
> >>>>>    == Initial Source ==
> >>>>>
> >>>>>    The Daffodil git repo goes back to mid-2011 with approximately 20
> >>>>>    different contributors and feedback from many users and
> developers.
> >> The
> >>>>>    core codebase is written in Scala and includes both a Scala and
> Java
> >>>>>    API, along with Javadocs and Scaladocs for API usage. The initial code
> >>>>>    will come from the git repository currently hosted by NCSA at the
> >>>>>    University of Illinois:
> >>>>>
> >>>>>    https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil/
> >>>>>
> >>>>>    == Source and Intellectual Property Submission ==
> >>>>>
> >>>>>    The complete Daffodil code is licensed under the University of
> >>>>>    Illinois/NCSA Open Source License. Much of the current codebase has
> >>>>>    been developed by Tresys Technology, which is open to relicensing the
> >>>>>    code under the Apache License version 2.0 and donating the source to
> >>>>>    the ASF. Contacts at NCSA are also open to relicensing their
> >>>>>    contributions under Apache v2. We plan to contact the other
> >>>>>    contributors and ask for permission to relicense and donate their
> >>>>>    contributed code. Code from contributors who decline, or whom we
> >>>>>    cannot contact, will be removed or replaced. We will work closely
> >>>>>    with Apache Legal to ensure that all relicensing issues are resolved
> >>>>>    acceptably.
> >>>>>
> >>>>>    == External Dependencies ==
> >>>>>
> >>>>>    We believe all current dependencies are compatible with the ASF
> >>>>>    guidelines. Our dependencies use the following license styles:
> >>>>>    Apache v2, BSD, MIT, and ICU. The list of current Daffodil
> >>>>>    dependencies and their licenses is documented here:
> >>>>>
> >>>>>    https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Dependencies+and+Licenses
> >>>>>
> >>>>>    == Cryptography ==
> >>>>>
> >>>>>    None
> >>>>>
> >>>>>    == Required Resources ==
> >>>>>
> >>>>>    === Mailing Lists ===
> >>>>>
> >>>>>    * commits@daffodil.incubator.apache.org
> >>>>>    * dev@daffodil.incubator.apache.org
> >>>>>    * private@daffodil.incubator.apache.org
> >>>>>    * user@daffodil.incubator.apache.org
> >>>>>
> >>>>>    === Source Control ===
> >>>>>
> >>>>>    git://git.apache.org/incubator-daffodil.git
> >>>>>
> >>>>>    === Issue Tracking ===
> >>>>>
> >>>>>    JIRA Daffodil (DFDL)
> >>>>>
> >>>>>    === Initial Committers ===
> >>>>>
> >>>>>    * Beth Finnegan <efinnegan at tresys dot com>
> >>>>>    * Dave Thompson <dthompson at tresys dot com>
> >>>>>    * Josh Adams <jadams at tresys dot com>
> >>>>>    * Mike Beckerle <mbeckerle at tresys dot com>
> >>>>>    * Steve Lawrence <slawrence at tresys dot com>
> >>>>>    * Taylor Wise <twise at tresys dot com>
> >>>>>
> >>>>>    === Affiliations ===
> >>>>>
> >>>>>    * Beth Finnegan (Tresys Technology)
> >>>>>    * Dave Thompson (Tresys Technology)
> >>>>>    * Josh Adams (Tresys Technology)
> >>>>>    * Mike Beckerle (Tresys Technology)
> >>>>>    * Steve Lawrence (Tresys Technology)
> >>>>>    * Taylor Wise (Tresys Technology)
> >>>>>
> >>>>>    == Sponsors ==
> >>>>>
> >>>>>    === Champion ===
> >>>>>
> >>>>>    * TBD
> >>>>>
> >>>>>    === Nominated Mentors ===
> >>>>>
> >>>>>    * TBD
> >>>>>
> >>>>>    === Sponsoring Entity ===
> >>>>>
> >>>>>    We request the Apache Incubator to sponsor this project.
> >>>>>
> >>>>>
> >> ---------------------------------------------------------------------
> >>>>>    To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >>>>>    For additional commands, e-mail:
> general-help@incubator.apache.org

Re: [DISCUSS] Daffodil Incubation Proposal

Posted by Steve Lawrence <st...@gmail.com>.
Thanks John!

On 08/02/2017 03:23 PM, John D. Ament wrote:
> You can also count me in as a mentor.
> 
> John
> 
> On Wed, Aug 2, 2017 at 3:14 PM Steve Lawrence <st...@gmail.com>
> wrote:
> 
>> Understood. Thanks for the interest!
>>
>> - Steve
>>
>> On 08/02/2017 02:57 PM, Dave Fisher wrote:
>>> Hi Steve,
>>>
>>> It was not so much the lack of committers as it was the current
>>> diversity. That is not a blocker for entry to Incubation.
>>>
>>> I am willing to be one of the Mentors. Once there are at least two more
>>> we can push forward.
>>>
>>> Regards,
>>> Dave
>>>
>>>> On Aug 1, 2017, at 5:09 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>>>>
>>>> Discussions have died down, and I think the consensus from the responses
>>>> is that the issues are 1) the lack of committers and 2) the lack of a
>>>> champion and mentors. We hope to address #1 and grow the community as
>>>> part of incubation. Is anyone interested in being a champion or mentor
>>>> and help us with #2?
>>>>
>>>> Thanks,
>>>> - Steve
>>>>
>>>> On 07/26/2017 04:06 PM, Chris Mattmann wrote:
>>>>> This sounds like a very interesting project.
>>>>>
>>>>> I don’t have the time to mentor at the moment but I will keep a close
>>>>> eye on it.
>>>>>
>>>>> Cheers,
>>>>> Chris Mattmann
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 7/25/17, 11:53 AM, "McHenry, Kenton Guadron" <mc...@illinois.edu> wrote:
>>>>>
>>>>>    Hi Dave,
>>>>>
>>>>>    The developers that were at NCSA have moved on to other
>>>>>    organizations. While we still leverage Daffodil and are very much
>>>>>    interested in seeing it move forward, development is currently done
>>>>>    by the Tresys team. Agreed on the synergy with Tika.
>>>>>
>>>>>    Kenton McHenry, Ph.D.
>>>>>    Principal Research Scientist, Adjunct Assistant Professor of
>>>>>    Computer Science
>>>>>    Deputy Director of the Scientific Software & Applications Division
>>>>>    National Center for Supercomputing Applications, University of
>>>>>    Illinois at Urbana-Champaign
>>>>>
>>>>>    On Jul 24, 2017, at 1:55 PM, Dave Fisher <dave2wave@comcast.net> wrote:
>>>>>
>>>>>    Hi Kenton,
>>>>>
>>>>>    Is there any reason that you and others from the NCSA are not
>>>>>    Initial Committers? That would make this proposal stronger.
>>>>>
>>>>>    Regarding Apache Tika - it relies on other projects including
>>>>>    Apache POI and Apache PDFBox. They are pragmatic about what is used.
>>>>>    If Daffodil works to expand then I think that there would be good
>>>>>    synergy between the projects. I know as a POI PMC member that the POI
>>>>>    community has significantly benefited from the Tika community, some
>>>>>    of whom are from MITRE.
>>>>>
>>>>>    To date Tika has not emphasized structured data, although they do
>>>>>    extract content from Excel and OpenOffice.
>>>>>
>>>>>    I am intrigued.
>>>>>
>>>>>    Regards,
>>>>>    Dave
>>>>>
>>>>>    On Jul 24, 2017, at 10:55 AM, McHenry, Kenton Guadron <mchenry@illinois.edu> wrote:
>>>>>
>>>>>    Yes, DFDL and its open source implementation Daffodil are more about
>>>>>    file formats and getting access to the entirety of a file's contents
>>>>>    in a consistent way through machine readable specifications. The work
>>>>>    has implications in the area of digital preservation, allowing one to
>>>>>    preserve these machine readable specifications rather than all the
>>>>>    tools needed to open/save a file in order to work with it. Imagine
>>>>>    someone developing graphics software to work with 3D models and not
>>>>>    having to worry about the hundreds of formats out there for 3D meshes
>>>>>    (whether there are tools for opening the files and whether they can
>>>>>    get access to those tools, whether the spec is available and how
>>>>>    complex that spec is to implement, etc.), and simply building their
>>>>>    code around the contents (e.g. vertices, faces, etc.). One could come
>>>>>    up with similar scenarios for other data types (documents, images,
>>>>>    videos, audio, depth data, numeric data). Ideally, tools built to
>>>>>    support DFDL could someday support any format for that type without
>>>>>    the developer having to worry about the details of how that data is
>>>>>    represented within a file.
>>>>>
>>>>>    Kenton McHenry, Ph.D.
>>>>>    Principal Research Scientist, Adjunct Assistant Professor of
>>>>>    Computer Science
>>>>>    Deputy Director of the Scientific Software & Applications Division
>>>>>    National Center for Supercomputing Applications, University of
>>>>>    Illinois at Urbana-Champaign
>>>>>
>>>>>    On Jul 24, 2017, at 10:30 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>>>>>
>>>>>    I'll preface this by saying that I don't have a ton of experience
>>>>>    with Apache Tika. But based on my understanding, Tika and Daffodil do
>>>>>    have somewhat similar goals, but reach them in different ways. For
>>>>>    example, Tika requires that one writes /code/ to perform data
>>>>>    extraction, usually relying on existing Java libraries to extract the
>>>>>    desired metadata. The downside to this is that code can be buggy, and
>>>>>    libraries might not even exist for formats of interest (especially
>>>>>    common with legacy and military data).
>>>>>
>>>>>    Daffodil, on the other hand, does not require one to write any code.
>>>>>    Instead, one writes a DFDL Schema (similar to XML Schema, with DFDL
>>>>>    annotations) that fully describes the data, which Daffodil then uses
>>>>>    to convert the data to XML/JSON for extraction. So adding support for
>>>>>    a new format means writing a new schema rather than new code. And
>>>>>    less code generally means fewer bugs. Also, for secure systems that
>>>>>    require certification, generally speaking, it is easier to certify a
>>>>>    schema than code.
>>>>>
>>>>>    We certainly don't believe that Daffodil could replace Tika, but it
>>>>>    does have the potential to add new functionality to Tika for formats
>>>>>    that do not have existing libraries. One of our goals is to look into
>>>>>    integrating Daffodil support into tools like Tika. We'd love to hear
>>>>>    from Tika devs if this is something they'd be interested in.
>>>>>
>>>>>    I'll also add that whereas Tika tends to focus primarily on metadata,
>>>>>    DFDL schemas usually describe an entire file format down to the byte,
>>>>>    so one can extract more than just metadata, including text and binary
>>>>>    data. As a further difference, Daffodil supports serializing data
>>>>>    (called unparse) from the XML/JSON representation, allowing one to
>>>>>    transform or filter data as well. We don't believe this feature is
>>>>>    all that applicable to Tika, but it may be useful to other
>>>>>    technologies, such as filtering or data fuzzing.
>>>>>
>>>>>    - Steve
>>>>>
>>>>>
>>>>>    On 07/24/2017 10:59 AM, Mike Drob wrote:
>>>>>    What is the relationship between Daffodil and something like Apache
>>>>>    Tika's extraction engine?
>>>>>
>>>>>    On Mon, Jul 24, 2017 at 9:53 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>>>>>
>>>>>    Dear Apache Incubator Community,
>>>>>
>>>>>    We would like to start a discussion around a proposal to bring
>>>>>    Daffodil into the Apache Incubator. Daffodil is an implementation of
>>>>>    the DFDL specification used to convert between fixed format data and
>>>>>    XML/JSON.
>>>>>
>>>>>    The draft proposal can be found in the wiki at the following URL:
>>>>>
>>>>>    https://wiki.apache.org/incubator/DaffodilProposal
>>>>>
>>>>>    We do not yet have a champion or mentors, but it was recommended
>> that we
>>>>>    create a proposal and send it to this list to potentially find those
>>>>>    that might be interested. The text for the draft proposal is found
>>>>>    below. We look forward to your input.
>>>>>
>>>>>    Thanks,
>>>>>    -Steve
>>>>>
>>>>>
>>>>>    = Daffodil Proposal =
>>>>>
>>>>>    == Abstract ==
>>>>>
>>>>>    Daffodil is an implementation of the Data Format Description
>> Language
>>>>>    (DFDL) used to convert between fixed format data and XML/JSON.
>>>>>
>>>>>    == Proposal ==
>>>>>
>>>>>    The Data Format Description Language (DFDL) is a specification,
>>>>>    developed by the Open Grid Forum, capable of describing many data
>>>>>    formats, including both textual and binary, scientific and numeric,
>>>>>    legacy and modern, commercial record-oriented, and many industry and
>>>>>    military standards. It defines a language that is a subset of W3C
>> XML
>>>>>    schema to describe the logical format of the data, and annotations
>>>>>    within the schema to describe the physical representation.
>>>>>
>>>>>    Daffodil is an open source implementation of the DFDL specification
>> that
>>>>>    uses these DFDL schemas to parse fixed format data into an infoset,
>>>>>    which is most commonly represented as either XML or JSON. This
>> allows
>>>>>    the use of well-established XML or JSON technologies and libraries
>> to
>>>>>    consume, inspect, and manipulate fixed format data in existing
>>>>>    solutions. Daffodil is also capable of the reverse by serializing or
>>>>>    "unparsing" an XML or JSON infoset back to the original data format.
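
As a rough illustration of the parse and unparse round trip described above, here is a minimal Python sketch. It does not use Daffodil or DFDL syntax. The RECORD_FORMAT table, the field names, and the parse/unparse helpers are all hypothetical stand-ins, assuming a tiny big-endian binary record. The point is only that a declarative description of the format, rather than format-specific code, drives both directions.

```python
import json
import struct

# Hypothetical declarative description of a fixed-format record.
# Each entry pairs a logical field name with its physical representation
# (a struct format code). This stands in for a DFDL schema; it is NOT DFDL.
RECORD_FORMAT = [
    ("id", "H"),           # 2-byte unsigned integer
    ("temperature", "h"),  # 2-byte signed integer
    ("label", "4s"),       # 4 bytes of ASCII text
]


def _struct_for(fmt):
    # Big-endian, no padding; field order follows the description.
    return struct.Struct(">" + "".join(code for _, code in fmt))


def parse(fmt, data):
    """Turn raw bytes into a JSON-friendly infoset (a plain dict)."""
    values = _struct_for(fmt).unpack(data)
    infoset = {}
    for (name, code), value in zip(fmt, values):
        infoset[name] = value.decode("ascii") if code.endswith("s") else value
    return infoset


def unparse(fmt, infoset):
    """Serialize an infoset back to the original byte layout."""
    values = []
    for name, code in fmt:
        value = infoset[name]
        values.append(value.encode("ascii") if code.endswith("s") else value)
    return _struct_for(fmt).pack(*values)


raw = bytes([0x00, 0x2A, 0xFF, 0xFB]) + b"TEMP"
infoset = parse(RECORD_FORMAT, raw)
print(json.dumps(infoset))
# The round trip reproduces the original bytes.
assert unparse(RECORD_FORMAT, infoset) == raw
```

In this sketch, supporting a different record layout would mean supplying a different description table while leaving parse and unparse untouched, which is the same division of labor the proposal attributes to DFDL schemas and Daffodil.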
>>>>>
>>>>>    == Background ==
>>>>>
>>>>>    Many different software solutions need to consume and manage data,
>>>>>    including data directed routing, databases, data analysis, data
>>>>>    cleansing, data visualizing, and more. A key aspect of such
>> solutions is
>>>>>    the need to transform the data into an easily consumable format.
>>>>>    Usually, this means that for each unique data format, one develops a
>>>>>    tool that can read and extract the necessary information, often
>> leading
>>>>>    to ad-hoc and data-format-specific description systems. Such
>> systems are
>>>>>    often proprietary, not well tested, and incompatible, leading to
>> vendor
>>>>>    lock-in, flawed software, and increased training costs. DFDL is a
>> new
>>>>>    standard, with version 1.0 completed in October of 2016, that solves
>>>>>    these problems by defining an open standard to describe many
>> different
>>>>>    data formats and how to parse and unparse between the data and
>> XML/JSON.
>>>>>
>>>>>    Two closed source implementations of DFDL currently exist. The
>> first was
>>>>>    created by IBM and is now part of their IBM® Integration Bus
>> product.
>>>>>    The second was created by the European Space Agency, called DFDL4S
>> or
>>>>>    "DFDL for Space" targeted at the challenges of their satellite data
>>>>>    processing.
>>>>>
>>>>>    Around 2005, Pacific Northwest National Lab created Defuddle, built
>> as
>>>>>    an open source implementation and proof of concept of the draft DFDL
>>>>>    specification and a test bed to feed new concepts into specification
>>>>>    development. Primary development of Defuddle was eventually taken
>> over
>>>>>    by the National Center for Supercomputing Applications (NCSA).
>> However,
>>>>>    due to evolution of the DFDL specification and architectural and
>>>>>    performance issues with Defuddle, around 2009, NCSA restarted the
>>>>>    project with the new name of Daffodil, with a goal of implementing
>> the
>>>>>    complete DFDL specification. Daffodil development continued at NCSA
>>>>>    until around 2012, at which point development slowed due to budget
>>>>>    limitations. Shortly thereafter, primary development was picked up
>> by
>>>>>    Tresys Technology where it continues today, with contributions from
>>>>>    other entities such as the Navy Research Lab, the Air Force Research
>>>>>    Lab, MITRE, and Booz Allen Hamilton. In February of 2015, Daffodil
>>>>>    version 1.0.0 was released, including support for the DFDL features
>>>>>    needed to parse many common file formats. Daffodil version 2.0.0 is
>>>>>    expected to be released in August of 2017, which will include
>> unparse
>>>>>    support with one-to-one parsing feature parity.
>>>>>
>>>>>    Entities including IBM, MITRE, NATO NCI Agency, Northrop-Grumman,
>> Quark
>>>>>    Security, Raytheon, and Tresys Technology have developed DFDL
>> schemas
>>>>>    for many data formats from varying technology domains, including
>> PNG,
>>>>>    GIF, BMP, PCAP, HL7, EDIFACT, NACHA, vCard, iCalendar, and
>> MIL-STD-2045,
>>>>>    many of which are publicly available on the DFDL Schemas github.
>> There
>>>>>    are also a number of military-application data formats, the
>>>>>    specifications of which are not public, which have historically been
>>>>>    very difficult and expensive to process, and for which DFDL schemas
>> have
>>>>>    been created or are actively in development; these include
>>>>>    MIL-STD-6040/USMTF ATO, MIL-STD-6017/VMF, MIL-STD-6016/NATO STANAG
>> 5516
>>>>>    (aka "Link16").
>>>>>
>>>>>    == Rationale ==
>>>>>
>>>>>    Numerous software solutions exist that consume, inspect, analyze,
>> and
>>>>>    transform data, many of which can be found in the Apache Software
>>>>>    Foundation (ASF). In order for tools like these to consume new
>> types of
>>>>>    data, custom extensions are usually required, often with high
>>>>>    development and testing costs. Daffodil fills a clear gap in many of
>>>>>    these solutions, providing a simple and low cost way to transform
>> data
>>>>>    to XML or JSON, which many of these tools natively support already.
>> With
>>>>>    the upcoming 2.0.0 release, the Daffodil project will have achieved
>> a
>>>>>    level of functionality in both parse and unparse that, when
>> integrated
>>>>>    into existing solutions, could provide for a new method to quickly
>>>>>    enable support for new data formats.
>>>>>
>>>>>    == Initial Goals ==
>>>>>
>>>>>    * Relicense the existing code from the University of Illinois/NCSA
>> Open
>>>>>    Source License to the Apache License version 2.0, working with
>> Apache
>>>>>    Legal to ensure correctness, and with Daffodil contributors to get
>>>>>    their permission.
>>>>>    * Move the existing codebase, documentation, bugs, and mailing
>> lists to
>>>>>    the Apache hosted infrastructure
>>>>>    * Establish a formal release process and schedule, allowing for
>>>>>    dependable release cycles in a manner consistent with the Apache
>>>>>    development process.
>>>>>    * Build relationships with ASF projects to add Daffodil support
>> where
>>>>>    appropriate
>>>>>    * Grow the community to establish a diversity of background and
>> expertise.
>>>>>
>>>>>    == Current Status ==
>>>>>
>>>>>    === Meritocracy ===
>>>>>
>>>>>    All initial committers are familiar with the principles of
>> meritocracy.
>>>>>    The Daffodil project has followed the model of meritocracy in the
>> past,
>>>>>    providing multiple outside entities commit access based on the
>> quality
>>>>>    of their contributions. In order to grow the Daffodil user base and
>>>>>    development community, we are dedicated to continuing to operate
>>>>>    Daffodil as a meritocracy.
>>>>>
>>>>>    A key ingredient in a meritocracy of developers is open group code
>>>>>    review. The Daffodil project has operated in this mode throughout
>> its
>>>>>    existence and this provides a forum to improve the code, verify code
>>>>>    quality, and educate new developers on the code base.
>>>>>
>>>>>    === Community ===
>>>>>
>>>>>    Daffodil has a small community of users and developers. Although
>> primary
>>>>>    Daffodil development is done by Tresys Technology, a handful of
>> other
>>>>>    contributions have come from other entities including the Navy
>> Research
>>>>>    Lab, the Air Force Research Lab, MITRE, and Booz Allen Hamilton. In
>>>>>    addition to developers, multiple users of Daffodil have created DFDL
>>>>>    schemas, including entities such as MITRE, IBM, Raytheon, Quark
>>>>>    Security, and Tresys Technology. The DFDL Schemas github community
>> has
>>>>>    been created as a place for DFDL schemas to be published. The
>> Daffodil
>>>>>    project also makes use of mailing lists, !HipChat, and Confluence
>>>>>    Questions to build a community of users and system for support.
>>>>>
>>>>>    === Core Developers ===
>>>>>
>>>>>    The core developers of Daffodil are employed by Tresys Technology.
>> We
>>>>>    will work to grow the community among a more diverse set of
>> developers
>>>>>    and industries.
>>>>>
>>>>>    === Alignment ===
>>>>>
>>>>>    Daffodil was created as an open source project with a philosophy
>>>>>    consistent with The Apache Way. A strong belief in meritocracy,
>>>>>    community involvement in decisions, openness, and ensuring a high
>> level
>>>>>    of quality in code, documentation, and testing are some of our
>> shared
>>>>>    core beliefs.
>>>>>
>>>>>    Further, as mentioned in the Rationale section, Daffodil fills a gap
>>>>>    that exists in many ASF projects, including !NiFi, Spark, Storm,
>> Hadoop,
>>>>>    Tika, and others. In order for tools like these to consume new
>> types of
>>>>>    data, custom extensions are usually required. Rather than create
>> such
>>>>>    extensions, Daffodil provides an easy and standards-compliant way to
>>>>>    transform data to XML or JSON, which many of these tools already
>>>>>    natively support.
>>>>>
>>>>>    == Known Risks ==
>>>>>
>>>>>    === Orphaned Products ===
>>>>>
>>>>>    The current core developers are the leading contributors in the
>> space of
>>>>>    DFDL and wish to see it flourish. Though there is some risk that the
>>>>>    initial committers all come from the same company, a goal of
>> entering
>>>>>    into incubation is to grow the development community to minimize the
>>>>>    risk of reliance on a single company.
>>>>>
>>>>>    === Inexperience with Open Source ===
>>>>>
>>>>>    The Daffodil project began as an open source project and has
>> continued
>>>>>    that model throughout development. This includes public bug
>> tracking,
>>>>>    git revision control, automated builds and tests, and a public wiki
>> for
>>>>>    documentation.
>>>>>
>>>>>    Additionally, the current core developers and initial committers all
>>>>>    work for a company that relies on, believes in, promotes, and has led or
>>>>>    contributed to many open source software projects, including SELinux
>>>>>    Userspace, OpenSCAP, CLIP, refpolicy, setools, RPM, and others. As such,
>>>>>    there is low risk related to inexperience with open source software and
>>>>>    processes.
>>>>>
>>>>>    === Homogeneous Developers ===
>>>>>
>>>>>    The proposed initial committers come from a single entity, though we are
>>>>>    committed to growing the Daffodil development community to include a
>>>>>    broad group of additional committers from a wide array of industries.
>>>>>
>>>>>    === Reliance on Salaried Developers ===
>>>>>
>>>>>    The proposed initial committers are paid by their employer to contribute
>>>>>    to the Daffodil project. We expect that Daffodil development will
>>>>>    continue with salaried developers, and are committed to growing the
>>>>>    community to include non-salaried developers as well.
>>>>>
>>>>>    === Relationship with other Apache Projects ===
>>>>>
>>>>>    As mentioned in the Alignment section, Daffodil fills a clear gap in
>>>>>    numerous other ASF projects that consume and manage large amounts of
>>>>>    data.
>>>>>
>>>>>    As a specific example, Daffodil developers have created a Daffodil
>>>>>    Apache NiFi Processor, currently in use in data transfer solutions,
>>>>>    which allows one to ingest non-native data into an Apache NiFi pipeline
>>>>>    as XML or JSON. This processor was well received by the Apache NiFi
>>>>>    developers, with positive comments about the concise API and how it
>>>>>    could handle non-native data. Daffodil developers have also successfully
>>>>>    prototyped integration with Apache Spark. We believe Daffodil could
>>>>>    provide a strong benefit to many other ASF projects that handle fixed
>>>>>    format data. We anticipate working closely with such ASF projects to
>>>>>    include Daffodil where applicable to increase their ability to support
>>>>>    new data formats with minimal effort.
>>>>>
>>>>>    Daffodil also depends on existing ASF projects, including Apache Commons
>>>>>    and Apache Xerces.
>>>>>
>>>>>    === An Excessive Fascination with the Apache Brand ===
>>>>>
>>>>>    Although the Apache brand may certainly help to attract more
>>>>>    contributors, publicity is not the reason for this proposal. We believe
>>>>>    Daffodil could provide a great benefit to the ASF and the numerous
>>>>>    data-focused projects that comprise it, as described in the Rationale
>>>>>    and Alignment sections. We hope to build a strong and vibrant community
>>>>>    around The Apache Way that is not dependent on a single company.
>>>>>
>>>>>    === Documentation ===
>>>>>
>>>>>    Daffodil documentation can be found at:
>>>>>
>>>>>    * https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
>>>>>
>>>>>    Information about DFDL can be found at:
>>>>>
>>>>>    * https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
>>>>>    * https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/df20060_.htm
>>>>>
>>>>>    Public examples of DFDL Schemas can be found at:
>>>>>
>>>>>    * https://github.com/DFDLSchemas
>>>>>
>>>>>    == Initial Source ==
>>>>>
>>>>>    The Daffodil git repo goes back to mid-2011 with approximately 20
>>>>>    different contributors and feedback from many users and developers. The
>>>>>    core codebase is written in Scala and includes both a Scala and Java
>>>>>    API, along with Javadocs and Scaladocs for API usage. The initial code
>>>>>    will come from the git repository currently hosted by NCSA at the
>>>>>    University of Illinois:
>>>>>
>>>>>    https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil/
>>>>>
>>>>>    == Source and Intellectual Property Submission ==
>>>>>
>>>>>    The complete Daffodil code is licensed under the University of
>>>>>    Illinois/NCSA Open Source License. Much of the current codebase has been
>>>>>    developed by Tresys Technology, which is open to relicensing the code
>>>>>    under the Apache License version 2.0 and donating the source to the ASF.
>>>>>    Contacts at NCSA are also open to relicensing their contributions under
>>>>>    Apache v2. We plan to contact the other contributors and ask for
>>>>>    permission to relicense and donate their contributed code. Code from
>>>>>    contributors who decline or whom we cannot contact will be removed or
>>>>>    replaced. We will work closely with Apache Legal to ensure all issues
>>>>>    related to relicensing are resolved.
>>>>>
>>>>>    == External Dependencies ==
>>>>>
>>>>>    We believe all current dependencies are compatible with the ASF
>>>>>    guidelines. Our dependencies use the following license styles: Apache
>>>>>    v2, BSD, MIT, and ICU. The list of current Daffodil dependencies and
>>>>>    their licenses is documented here:
>>>>>
>>>>>    https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Dependencies+and+Licenses
>>>>>
>>>>>    == Cryptography ==
>>>>>
>>>>>    None
>>>>>
>>>>>    == Required Resources ==
>>>>>
>>>>>    === Mailing Lists ===
>>>>>
>>>>>    * commits@daffodil.incubator.apache.org
>>>>>    * dev@daffodil.incubator.apache.org
>>>>>    * private@daffodil.incubator.apache.org
>>>>>    * user@daffodil.incubator.apache.org
>>>>>
>>>>>    === Source Control ===
>>>>>
>>>>>    git://git.apache.org/incubator-daffodil.git
>>>>>
>>>>>    === Issue Tracking ===
>>>>>
>>>>>    JIRA Daffodil (DFDL)
>>>>>
>>>>>    === Initial Committers ===
>>>>>
>>>>>    * Beth Finnegan <efinnegan at tresys dot com>
>>>>>    * Dave Thompson <dthompson at tresys dot com>
>>>>>    * Josh Adams <jadams at tresys dot com>
>>>>>    * Mike Beckerle <mbeckerle at tresys dot com>
>>>>>    * Steve Lawrence <slawrence at tresys dot com>
>>>>>    * Taylor Wise <twise at tresys dot com>
>>>>>
>>>>>    === Affiliations ===
>>>>>
>>>>>    * Beth Finnegan (Tresys Technology)
>>>>>    * Dave Thompson (Tresys Technology)
>>>>>    * Josh Adams (Tresys Technology)
>>>>>    * Mike Beckerle (Tresys Technology)
>>>>>    * Steve Lawrence (Tresys Technology)
>>>>>    * Taylor Wise (Tresys Technology)
>>>>>
>>>>>    == Sponsors ==
>>>>>
>>>>>    === Champion ===
>>>>>
>>>>>    * TBD
>>>>>
>>>>>    === Nominated Mentors ===
>>>>>
>>>>>    * TBD
>>>>>
>>>>>    === Sponsoring Entity ===
>>>>>
>>>>>    We request the Apache Incubator to sponsor this project.
>>>>>
>>>>>
>> ---------------------------------------------------------------------
>>>>>    To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>>>>    For additional commands, e-mail: general-help@incubator.apache.org


Re: [DISCUSS] Daffodil Incubation Proposal

Posted by "John D. Ament" <jo...@apache.org>.
You can also count me in as a mentor.

John

On Wed, Aug 2, 2017 at 3:14 PM Steve Lawrence <st...@gmail.com>
wrote:

> Understood. Thanks for the interest!
>
> - Steve
>
> On 08/02/2017 02:57 PM, Dave Fisher wrote:
> > Hi Steve,
> >
> > It was not so much the lack of committers as it was the current
> > diversity. That is not a blocker for entry to Incubation.
> >
> > I am willing to be one of the Mentors. Once there are at least two more
> > we can push forward.
> >
> > Regards,
> > Dave
> >
> >> On Aug 1, 2017, at 5:09 AM, Steve Lawrence <
> stephen.d.lawrence@gmail.com> wrote:
> >>
> >> Discussions have died down, and I think the consensus from the responses
> >> is that the issues are 1) the lack of committers and 2) the lack of a
> >> champion and mentors. We hope to address #1 and grow the community as
> >> part of incubation. Is anyone interested in being a champion or mentor
> >> and help us with #2?
> >>
> >> Thanks,
> >> - Steve
> >>
> >> On 07/26/2017 04:06 PM, Chris Mattmann wrote:
> >>> This sounds like a very interesting project.
> >>>
> >>> I don’t have the time to mentor at the moment but I will keep a close
> eye on it.
> >>>
> >>> Cheers,
> >>> Chris Mattmann
> >>>
> >>>
> >>>
> >>>
> >>> On 7/25/17, 11:53 AM, "McHenry, Kenton Guadron" <mc...@illinois.edu>
> wrote:
> >>>
> >>>    Hi Dave,
> >>>
> >>>    The developers that were at NCSA have moved on to other
> organizations.  While we still leverage Daffodil and are very much
> interested in seeing it move forward, development is currently done by the
> Tresys team.  Agreed on the synergy with Tika.
> >>>
> >>>    Kenton McHenry, Ph.D.
> >>>    Principal Research Scientist, Adjunct Assistant Professor of
> Computer Science
> >>>    Deputy Director of the Scientific Software & Applications Division
> >>>    National Center for Supercomputing Applications, University of
> Illinois at Urbana-Champaign
> >>>
> >>>    On Jul 24, 2017, at 1:55 PM, Dave Fisher <dave2wave@comcast.net
> <ma...@comcast.net>> wrote:
> >>>
> >>>    Hi Kenton,
> >>>
> >>>    Is there any reason that you and others from the NCSA are not
> Initial Committers? That would make this proposal stronger.
> >>>
> >>>    Regarding Apache Tika - it relies on other projects including
> Apache POI and Apache PDFBox. They are pragmatic about what is used. If
> Daffodil works to expand then I think that there would be good synergy
> between the projects. I know as a POI PMC member that the POI community has
> significantly benefited from the Tika community some of whom are from Mitre.
> >>>
> >>>    To date Tika has not emphasized structured data, although they do
> extract content from Excel and OpenOffice.
> >>>
> >>>    I am intrigued.
> >>>
> >>>    Regards,
> >>>    Dave
> >>>
> >>>    On Jul 24, 2017, at 10:55 AM, McHenry, Kenton Guadron <
> mchenry@illinois.edu<ma...@illinois.edu>> wrote:
> >>>
> >>>    Yes, DFDL and its open source implementation Daffodil are more
> about file formats and getting access to the entirety of a file's contents
> in a consistent way through machine readable specifications.  The work has
> implications in the area of digital preservation allowing one to preserve
> these machine readable specifications rather than all the tools needed to
> open/save a file in order to work with it.  Imagine someone developing
> graphics software to work with 3D models and not having to worry about the
> hundreds of formats out there for 3D meshes (whether there are tools for
> opening the files and whether they can get access to those tools, whether
> the spec is available and worrying about how complex that spec is to
> implement, etc.), and simply building their code around the contents (e.g.
> vertices, faces, etc.).  One could come up with similar scenarios for other
> data types (documents, images, videos, audio, depth data, numeric data).
> Ideally tools built supporting DFDL, could someday, support any format for
> that type without the developer having to worry about the details of how
> that data is represented within a file.
> >>>
> >>>    Kenton McHenry, Ph.D.
> >>>    Principal Research Scientist, Adjunct Assistant Professor of
> Computer Science
> >>>    Deputy Director of the Scientific Software & Applications Division
> >>>    National Center for Supercomputing Applications, University of
> Illinois at Urbana-Champaign
> >>>
> >>>    On Jul 24, 2017, at 10:30 AM, Steve Lawrence <
> stephen.d.lawrence@gmail.com<ma...@gmail.com><mailto:
> stephen.d.lawrence@gmail.com>> wrote:
> >>>
> >>>    I'll preface this saying that I don't have a ton of experience with
> >>>    Apache Tika. But based on my understanding, Tika and Daffodil do
> have
> >>>    somewhat similar goals, but reach them in different ways. For
> example,
> >>>    Tika requires that one writes /code/ to perform data extraction,
> usually
> >>>    relying on existing Java libraries to extract the desired metadata.
> The
> >>>    downside to this is that code can be buggy, and libraries might not
> even
> >>>    exist for formats of interest (especially common with legacy and
> >>>    military data).
> >>>
> >>>    Daffodil, on the other hand, does not require one to write any code.
> >>>    Instead, one writes a DFDL Schema (similar to XML Schema, with DFDL
> >>>    annotations) that fully describes the data, which Daffodil then
> uses to
> >>>    convert the data to XML/JSON for extraction. So adding support for
> a new
> >>>    format means writing a new schema rather than new code. And less
> code
> >>>    generally means less bugs. Also, for secure systems that require
> >>>    certification, generally speaking, it is easier to certify a schema
> as
> >>>    compared to code.
> >>>
> >>>    We certainly don't believe that Daffodil could replace Tika, but it
> does
> >>>    have the potential to add new functionality to Tika for formats
> that do
> >>>    not have existing libraries. One of our goals is to look into
> >>>    integrating Daffodil support into tools like Tika. We'd love to hear
> >>>    from Tika devs if this is something they'd be interested in.
> >>>
> >>>    I'll also add that whereas Tika tends to focus primarily on
> metadata,
> >>>    DFDL schemas usually describe an entire file format down to the
> byte, so
> >>>    one can extract more than just meta data, including text and binary
> >>>    data. Further differentiating, Daffodil has support for serializing
> data
> >>>    (called unparse) from the XML/JSON representation, allowing one to
> >>>    transform or filter data as well. We don't believe this feature is
> all
> >>>    that applicable to Tika, but may be useful to other technologies
> such as
> >>>    filtering or data fuzzing technologies.
> >>>
> >>>    - Steve
> >>>
> >>>
> >>>    On 07/24/2017 10:59 AM, Mike Drob wrote:
> >>>    What is the relationship between Daffodil and something like Apache
> Tika's
> >>>    extraction engine?
> >>>
> >>>    On Mon, Jul 24, 2017 at 9:53 AM, Steve Lawrence <
> >>>    stephen.d.lawrence@gmail.com<mailto:stephen.d.lawrence@gmail.com
> ><ma...@gmail.com>> wrote:
> >>>
> >>>    Dear Apache Incubator Community,
> >>>
> >>>    We would like to start a discussion around a proposal to bring
> Daffodil
> >>>    into the Apache Incubator. Daffodil is a implementation of the DFDL
> >>>    specification used to convert between fixed format data and
> XML/JSON.
> >>>
> >>>    The draft proposal can be found in the wiki at the following URL:
> >>>
> >>>    https://wiki.apache.org/incubator/DaffodilProposal
> >>>
> >>>    We do not yet have a champion or mentors, but it was recommended
> that we
> >>>    create a proposal and send it to this list to potentially find those
> >>>    that might be interested. The text for the draft proposal is found
> >>>    below. We look forward to your input.
> >>>
> >>>    Thanks,
> >>>    -Steve
> >>>
> >>>
> >>>    = Daffodil Proposal =
> >>>
> >>>    == Abstract ==
> >>>
> >>>    Daffodil is an implementation of the Data Format Description
> Language
> >>>    (DFDL) used to convert between fixed format data and XML/JSON.
> >>>
> >>>    == Proposal ==
> >>>
> >>>    The Data Format Description Language (DFDL) is a specification,
> >>>    developed by the Open Grid Forum, capable of describing many data
> >>>    formats, including both textual and binary, scientific and numeric,
> >>>    legacy and modern, commercial record-oriented, and many industry and
> >>>    military standards. It defines a language that is a subset of W3C
> XML
> >>>    schema to describe the logical format of the data, and annotations
> >>>    within the schema to describe the physical representation.
> >>>
> >>>    Daffodil is an open source implementation of the DFDL specification
> that
> >>>    uses these DFDL schemas to parse fixed format data into an infoset,
> >>>    which is most commonly represented as either XML or JSON. This
> allows
> >>>    the use of well-established XML or JSON technologies and libraries
> to
> >>>    consume, inspect, and manipulate fixed format data in existing
> >>>    solutions. Daffodil is also capable of the reverse by serializing or
> >>>    "unparsing" an XML or JSON infoset back to the original data format.
> >>>
> >>>    == Background ==
> >>>
> >>>    Many different software solutions need to consume and manage data,
> >>>    including data directed routing, databases, data analysis, data
> >>>    cleansing, data visualizing, and more. A key aspect of such
> solutions is
> >>>    the need to transform the data into an easily consumable format.
> >>>    Usually, this means that for each unique data format, one develops a
> >>>    tool that can read and extract the necessary information, often
> leading
> >>>    to ad-hoc and data-format-specific description systems. Such
> systems are
> >>>    often proprietary, not well tested, and incompatible, leading to
> vendor
> >>>    lock-in, flawed software, and increased training costs. DFDL is a
> new
> >>>    standard, with version 1.0 completed in October of 2016, that solves
> >>>    these problems by defining an open standard to describe many
> different
> >>>    data formats and how to parse and unparse between the data and
> XML/JSON.
> >>>
> >>>    Two closed source implementations of DFDL currently exist. The
> first was
> >>>    created by IBM and is now part of their IBM® Integration Bus
> product.
> >>>    The second was created by the European Space Agency, called DFDL4S
> or
> >>>    "DFDL for Space" targeted at the challenges of their satellite data
> >>>    processing.
> >>>
> >>>    Around 2005, Pacific Northwest National Lab created Defuddle, built
> as
> >>>    an open source implementation and proof of concept of the draft DFDL
> >>>    specification and a test bed to feed new concepts into specification
> >>>    development. Primary development of Defuddle was eventually taken
> over
> >>>    by the National Center for Supercomputing Applications (NCSA).
> However,
> >>>    due to evolution of the DFDL specification and architectural and
> >>>    performance issues with Defuddle, around 2009, NCSA restarted the
> >>>    project with the new name of Daffodil, with a goal of implementing
> the
> >>>    complete DFDL specification. Daffodil development continued at NCSA
> >>>    until around 2012, at which point development slowed due to budget
> >>>    limitations. Shortly thereafter, primary development was picked up
> by
> >>>    Tresys Technology where it continues today, with contributions from
> >>>    other entities such as the Navy Research Lab, the Air Force Research
> >>>    Lab, MITRE, and Booz Allen Hamilton. In February of 2015, Daffodil
> >>>    version 1.0.0 was released, including support for the DFDL features
> >>>    needed to parse many common file formats. Daffodil version 2.0.0 is
> >>>    expected to be released in August of 2017, which will include
> unparse
> >>>    support with one-to-one parsing feature parity.
> >>>
> >>>    Entities including IBM, MITRE, NATO NCI Agency, Northrop-Grumman,
> Quark
> >>>    Security, Raytheon, and Tresys Technology have developed DFDL
> schemas
> >>>    for many data formats from varying technology domains, including
> PNG,
> >>>    GIF, BMP, PCAP, HL7, EDIFACT, NACHA, vCard, iCalendar, and
> MIL-STD-2045,
> >>>    many of which are publicly available on the DFDL Schemas github.
> There
> >>>    are also a number of military-application data formats, the
> >>>    specifications of which are not public, which have historically been
> >>>    very difficult and expensive to process, and for which DFDL schemas
> have
> >>>    been created or are actively in development; these include
> >>>    MIL-STD-6040/USMTF ATO, MIL-STD-6017/VMF, MIL-STD-6016/NATO STANAG
> 5516
> >>>    (aka "Link16").
> >>>
> >>>    == Rationale ==
> >>>
> >>>    Numerous software solutions exist that consume, inspect, analyze,
> and
> >>>    transform data, many of which can be found in the Apache Software
> >>>    Foundation (ASF). In order for tools like these to consume new
> types of
> >>>    data, custom extensions are usually required, often with high
> >>>    development and testing costs. Daffodil fills a clear gap in many of
> >>>    these solutions, providing a simple and low cost way to transform
> data
> >>>    to XML or JSON, which many of these tools natively support already.
> With
> >>>    the upcoming 2.0.0 release, the Daffodil project will have achieved
> a
> >>>    level of functionality in both parse and unparse that, when
> integrated
> >>>    into existing solutions, could provide for a new method to quickly
> >>>    enable support for new data formats.
> >>>
> >>>    == Initial Goals ==
> >>>
> >>>    * Relicense the existing code from the University of Illinois/NCSA
> Open
> >>>    Source License to the Apache License version 2.0, working with
> Apache
> >>>    Legal to ensure correctness, and with Daffodil contributors to get
> >>>    their permission.
> >>>    * Move the existing codebase, documentation, bugs, and mailing
> lists to
> >>>    the Apache hosted infrastructure
> >>>    * Establish a formal release process and schedule, allowing for
> >>>    dependable release cycles in a manner consistent with the Apache
> >>>    development process.
> >>>    * Build relationships with ASF projects to add Daffodil support
> where
> >>>    appropriate
> >>>    * Grow the community to establish a diversity of background and
> expertise.
> >>>
> >>>    == Current Status ==
> >>>
> >>>    === Meritocracy ===
> >>>
> >>>    All initial committers are familiar with the principles of
> meritocracy.
> >>>    The Daffodil project has followed the model of meritocracy in the
> past,
> >>>    providing multiple outside entities commit access based on the
> quality
> >>>    of their contributions. In order to grow the Daffodil user base and
> >>>    development community, we are dedicated to continuing to operate
> >>>    Daffodil as a meritocracy.
> >>>
> >>>    A key ingredient in a meritocracy of developers is open group code
> >>>    review. The Daffodil project has operated in this mode throughout
> its
> >>>    existence and this provides a forum to improve the code, verify code
> >>>    quality, and educate new developers on the code base.
> >>>
> >>>    === Community ===
> >>>
> >>>    Daffodil has a small community of users and developers. Although
> primary
> >>>    Daffodil development is done by Tresys Technology, a handful of
> other
> >>>    contributions have come from other entities including the Navy
> Research
> >>>    Lab, the Air Force Research Lab, MITRE, and Booz Allen Hamilton. In
> >>>    addition to developers, multiple users of Daffodil have created DFDL
> >>>    schemas, including entities such as MITRE, IBM, Raytheon, Quark
> >>>    Security, and Tresys Technology. The DFDL Schemas github community
> has
> >>>    been created as a place for DFDL schemas to be published. The
> Daffodil
> >>>    project also makes use of mailing lists, !HipChat, and Confluence
> >>>    Questions to build a community of users and system for support.
> >>>
> >>>    === Core Developers ===
> >>>
> >>>    The core developers of Daffodil are employed by Tresys Technology.
> We
> >>>    will work to grow the community among a more diverse set of
> developers
> >>>    and industries.
> >>>
> >>>    === Alignment ===
> >>>
> >>>    Daffodil was created as an open source project with a philosophy
> >>>    consistent with The Apache Way. A strong belief in meritocracy,
> >>>    community involvement in decisions, openness, and ensuring a high
> level
> >>>    of quality in code, documentation, and testing are some of our
> shared
> >>>    core beliefs.
> >>>
> >>>    Further, as mentioned in the Rationale section, Daffodil fills a gap
> >>>    that exists in many ASF projects, including !NiFi, Spark, Storm,
> Hadoop,
> >>>    Tika, and others. In order for tools like these to consume new
> types of
> >>>    data, custom extensions are usually required. Rather than create
> such
> >>>    extensions, Daffodil provides an easy and standards-compliant way to
> >>>    transform data to XML or JSON, which many of these tools already
> >>>    natively support.
> >>>
> >>>    == Known Risks ==
> >>>
> >>>    === Orphaned Products ===
> >>>
> >>>    The current core developers are the leading contributors in the
> space of
> >>>    DFDL and wish to see it flourish. Though there is some risk that the
> >>>    initial committers all come from the same company, a goal of
> entering
> >>>    into incubation is to grow the development community to minimize the
> >>>    risk of reliance on a single company.
> >>>
> >>>    === Inexperience with Open Source ===
> >>>
> >>>    The Daffodil project began as an open source project and has
> continued
> >>>    that model throughout development. This includes public bug
> tracking,
> >>>    git revision control, automated builds and tests, and a public wiki
> for
> >>>    documentation.
> >>>
> >>>    Additionally, the current core developers and initial committers all
> >>>    work for a company that relies on, believes in, promotes, and has
> led or
> >>>    contributed to many open source software projects, including SELinux
> >>>    Userspace, OpenSCAP, CLIP, refpolicy, setools, RPM, and others. As
> such,
> >>>    there is low risk related to inexperience with open source software
> and
> >>>    processes.
> >>>
> >>>    === Homogeneous Developers ===
> >>>
> >>>    The proposed initial committers come from a single entity, though
> we are
> >>>    committed to growing the Daffodil development community to include a
> >>>    broad group of additional committers from a wide array of
> industries.
> >>>
> >>>    === Reliance on Salaried Developers ===
> >>>
> >>>    The proposed initial committers are paid by their employer to
> contribute
> >>>    to the Daffodil project. We expect that Daffodil development will
> >>>    continue with salaried developers, and are committed to growing the
> >>>    community to include non-salaried developers as well.
> >>>
> >>>    === Relationship with other Apache Projects ===
> >>>
> >>>    As mentioned in the Alignment section, Daffodil fills a clear gap in
> >>>    numerous other ASF projects that consume and manage large amounts
> of data.
> >>>
> >>>    As a specific example, Daffodil developers have created a Daffodil
> >>>    Apache !NiFi Processor, currently in use in data transfer solutions,
> >>>    which allows one to ingest non-native data into an Apache !NiFi
> pipeline
> >>>    as XML or JSON. This processor was well received by the Apache !NiFi
> >>>    developers, with positive comments about the concise API and how it
> >>>    could handle non-native data. Daffodil developers have also
> successfully
> >>>    prototyped integration with Apache Spark. We believe Daffodil could
> >>>    provide a strong benefit to many other ASF projects that handle
> fixed
> >>>    format data. We anticipate working closely with such ASF projects to
> >>>    include Daffodil where applicable to increase their ability to
> support
> >>>    new data formats with minimal effort.
> >>>
> >>>    Daffodil also depends on existing ASF projects, including Apache
> Commons
> >>>    and Apache Xerces.
> >>>
> >>>    === An Excessive Fascination with the Apache Brand ===
> >>>
> >>>    Although the Apache brand may certainly help to attract more
> >>>    contributors, publicity is not the reason for this proposal. We
> believe
> >>>    Daffodil could provide a great benefit to the ASF and the numerous
> data
> >>>    focused projects that comprise it, as described in the Rationale and
> >>>    Alignment sections. We hope to build a strong and vibrant community
> >>>    built around The Apache Way, and not dependent on a single company.
> >>>
> >>>    === Documentation ===
> >>>
> >>>    Daffodil documentation can be found at:
> >>>
> >>>    *
> >>>    https://opensource.ncsa.illinois.edu/confluence/
> >>>    display/DFDL/Daffodil%3A+Open+Source+DFDL
> >>>
> >>>    Information about DFDL can be found at:
> >>>
> >>>    * https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> >>>    *
> >>>    https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.
> >>>    0/com.ibm.etools.mft.doc/df20060_.htm
> >>>
> >>>    Public examples of DFDL Schemas can be found at:
> >>>
> >>>    * https://github.com/DFDLSchemas
> >>>
> >>>    == Initial Source ==
> >>>
> >>>    The Daffodil git repo goes back to mid-2011 with approximately 20
> >>>    different contributors and feedback from many users and developers.
> The
> >>>    core codebase is written in Scala and includes both a Scala and Java
> >>>    API, along with Javadocs and Scaladocs for API usage. The initial
> code
> >>>    will come from the git repository currently hosted by NCSA at the
> >>>    University of Illinois :
> >>>
> >>>    https://opensource.ncsa.illinois.edu/bitbucket/
> >>>    projects/DFDL/repos/daffodil/
> >>>
> >>>    == Source and Intellectual Property Submission ==
> >>>
> >>>    The complete Daffodil code is licensed under the University of
> >>>    Illinois/NCSA Open Source License. Much of the current codebase has
> been
> >>>    developed by Tresys Technology, who is open to relicensing the code
> to
> >>>    the Apache License version 2.0 and donate the source to the ASF.
> >>>    Contacts at NCSA are also open to relicensing their contributions to
> >>>    Apache v2. We plan to contact the other contributors and ask for
> >>>    permission to relicense and donate their contributed code. For those
> >>>    that decline or we cannot contact, their code will be removed or
> >>>    replaced. We will work closely with Apache Legal to ensure all
> issues
> >>>    related to relicensing are acceptable.
> >>>
> >>>    == External Dependencies ==
> >>>
> >>>    We believe all current dependencies are compatible with the ASF
> >>>    guidelines. Our dependency licenses come from the following license
> >>>    styles: Apache v2, BSD, MIT, and ICU. The list of current Daffodil
> >>>    dependencies and their licenses are documented here:
> >>>
> >>>    https://opensource.ncsa.illinois.edu/confluence/
> >>>    display/DFDL/Dependencies+and+Licenses
> >>>
> >>>    == Cryptography ==
> >>>
> >>>    None
> >>>
> >>>    == Required Resources ==
> >>>
> >>>    === Mailing Lists ===
> >>>
> >>>    * commits@daffodil.incubator.apache.org
> >>>    * dev@daffodil.incubator.apache.org
> >>>    * private@daffodil.incubator.apache.org
> >>>    * user@daffodil.incubator.apache.org
> >>>
> >>>    === Source Control ===
> >>>
> >>>    git://git.apache.org/incubator-daffodil.git
> >>>
> >>>    === Issue Tracking ===
> >>>
> >>>    JIRA Daffodil (DFDL)
> >>>
> >>>    === Initial Committers ===
> >>>
> >>>    * Beth Finnegan <efinnegan at tresys dot com>
> >>>    * Dave Thompson <dthompson at tresys dot com>
> >>>    * Josh Adams <jadams at tresys dot com>
> >>>    * Mike Beckerle <mbeckerle at tresys dot com>
> >>>    * Steve Lawrence <slawrence at tresys dot com>
> >>>    * Taylor Wise <twise at tresys dot com>
> >>>
> >>>    === Affiliations ===
> >>>
> >>>    * Beth Finnegan (Tresys Technology)
> >>>    * Dave Thompson (Tresys Technology)
> >>>    * Josh Adams (Tresys Technology)
> >>>    * Mike Beckerle (Tresys Technology)
> >>>    * Steve Lawrence (Tresys Technology)
> >>>    * Taylor Wise (Tresys Technology)
> >>>
> >>>    == Sponsors ==
> >>>
> >>>    === Champion ===
> >>>
> >>>    * TBD
> >>>
> >>>    === Nominated Mentors ===
> >>>
> >>>    * TBD
> >>>
> >>>    === Sponsoring Entity ===
> >>>
> >>>    We request the Apache Incubator to sponsor this project.
> >>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>

Re: [DISCUSS] Daffodil Incubation Proposal

Posted by Steve Lawrence <st...@gmail.com>.
Understood. Thanks for the interest!

- Steve

On 08/02/2017 02:57 PM, Dave Fisher wrote:
> Hi Steve,
> 
> It was not so much the lack of committers as it was the lack of diversity among the current ones. That is not a blocker for entry to Incubation.
> 
> I am willing to be one of the Mentors. Once there are at least two more we can push forward.
> 
> Regards,
> Dave
> 
>> On Aug 1, 2017, at 5:09 AM, Steve Lawrence <st...@gmail.com> wrote:
>>
>> Discussions have died down, and I think the consensus from the responses
>> is that the issues are 1) the lack of committers and 2) the lack of a
>> champion and mentors. We hope to address #1 and grow the community as
>> part of incubation. Is anyone interested in being a champion or mentor
>> and helping us with #2?
>>
>> Thanks,
>> - Steve
>>
>> On 07/26/2017 04:06 PM, Chris Mattmann wrote:
>>> This sounds like a very interesting project.
>>>
>>> I don’t have the time to mentor at the moment but I will keep a close eye on it.
>>>
>>> Cheers,
>>> Chris Mattmann
>>>
>>>
>>>
>>>
>>> On 7/25/17, 11:53 AM, "McHenry, Kenton Guadron" <mc...@illinois.edu> wrote:
>>>
>>>    Hi Dave,
>>>
>>>    The developers that were at NCSA have moved on to other organizations.  While we still leverage Daffodil and are very much interested in seeing it move forward, development is currently done by the Tresys team.  Agreed on the synergy with Tika.
>>>
>>>    Kenton McHenry, Ph.D.
>>>    Principal Research Scientist, Adjunct Assistant Professor of Computer Science
>>>    Deputy Director of the Scientific Software & Applications Division
>>>    National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
>>>
>>>    On Jul 24, 2017, at 1:55 PM, Dave Fisher <da...@comcast.net>> wrote:
>>>
>>>    Hi Kenton,
>>>
>>>    Is there any reason that you and others from the NCSA are not Initial Committers? That would make this proposal stronger.
>>>
>>>    Regarding Apache Tika - it relies on other projects including Apache POI and Apache PDFBox. They are pragmatic about what is used. If Daffodil works to expand then I think that there would be good synergy between the projects. I know as a POI PMC member that the POI community has significantly benefited from the Tika community some of whom are from Mitre.
>>>
>>>    To date Tika has not emphasized structured data, although they do extract content from Excel and OpenOffice.
>>>
>>>    I am intrigued.
>>>
>>>    Regards,
>>>    Dave
>>>
>>>    On Jul 24, 2017, at 10:55 AM, McHenry, Kenton Guadron <mc...@illinois.edu>> wrote:
>>>
>>>    Yes, DFDL and its open source implementation Daffodil are more about file formats and getting access to the entirety of a file's contents in a consistent way through machine readable specifications.  The work has implications in the area of digital preservation, allowing one to preserve these machine readable specifications rather than all the tools needed to open/save a file in order to work with it.  Imagine someone developing graphics software to work with 3D models and not having to worry about the hundreds of formats out there for 3D meshes (whether there are tools for opening the files and whether they can get access to those tools, whether the spec is available and how complex that spec is to implement, etc.), and simply building their code around the contents (e.g. vertices, faces, etc.).  One could come up with similar scenarios for other data types (documents, images, videos, audio, depth data, numeric data).  Ideally, tools built to support DFDL could someday support any format for that type without the developer having to worry about the details of how that data is represented within a file.
>>>
>>>    Kenton McHenry, Ph.D.
>>>    Principal Research Scientist, Adjunct Assistant Professor of Computer Science
>>>    Deputy Director of the Scientific Software & Applications Division
>>>    National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
>>>
>>>    On Jul 24, 2017, at 10:30 AM, Steve Lawrence <st...@gmail.com>> wrote:
>>>
>>>    I'll preface this by saying that I don't have a ton of experience with
>>>    Apache Tika. But based on my understanding, Tika and Daffodil do have
>>>    somewhat similar goals, but reach them in different ways. For example,
>>>    Tika requires that one write /code/ to perform data extraction, usually
>>>    relying on existing Java libraries to extract the desired metadata. The
>>>    downside to this is that code can be buggy, and libraries might not even
>>>    exist for formats of interest (especially common with legacy and
>>>    military data).
>>>
>>>    Daffodil, on the other hand, does not require one to write any code.
>>>    Instead, one writes a DFDL Schema (similar to XML Schema, with DFDL
>>>    annotations) that fully describes the data, which Daffodil then uses to
>>>    convert the data to XML/JSON for extraction. So adding support for a new
>>>    format means writing a new schema rather than new code. And less code
>>>    generally means fewer bugs. Also, for secure systems that require
>>>    certification, generally speaking, it is easier to certify a schema as
>>>    compared to code.
>>>
>>>    We certainly don't believe that Daffodil could replace Tika, but it does
>>>    have the potential to add new functionality to Tika for formats that do
>>>    not have existing libraries. One of our goals is to look into
>>>    integrating Daffodil support into tools like Tika. We'd love to hear
>>>    from Tika devs if this is something they'd be interested in.
>>>
>>>    I'll also add that whereas Tika tends to focus primarily on metadata,
>>>    DFDL schemas usually describe an entire file format down to the byte, so
>>>    one can extract more than just metadata, including text and binary
>>>    data. Further differentiating, Daffodil has support for serializing data
>>>    (called unparse) from the XML/JSON representation, allowing one to
>>>    transform or filter data as well. We don't believe this feature is all
>>>    that applicable to Tika, but it may be useful to other technologies such
>>>    as data filtering or fuzzing tools.
>>>
>>>    - Steve
>>>
>>>
>>>    On 07/24/2017 10:59 AM, Mike Drob wrote:
>>>    What is the relationship between Daffodil and something like Apache Tika's
>>>    extraction engine?
>>>
>>>    On Mon, Jul 24, 2017 at 9:53 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>>>
>>>    Dear Apache Incubator Community,
>>>
>>>    We would like to start a discussion around a proposal to bring Daffodil
>>>    into the Apache Incubator. Daffodil is an implementation of the DFDL
>>>    specification used to convert between fixed format data and XML/JSON.
>>>
>>>    The draft proposal can be found in the wiki at the following URL:
>>>
>>>    https://wiki.apache.org/incubator/DaffodilProposal
>>>
>>>    We do not yet have a champion or mentors, but it was recommended that we
>>>    create a proposal and send it to this list to potentially find those
>>>    that might be interested. The text for the draft proposal is found
>>>    below. We look forward to your input.
>>>
>>>    Thanks,
>>>    -Steve
>>>
>>>
>>>    = Daffodil Proposal =
>>>
>>>    == Abstract ==
>>>
>>>    Daffodil is an implementation of the Data Format Description Language
>>>    (DFDL) used to convert between fixed format data and XML/JSON.
>>>
>>>    == Proposal ==
>>>
>>>    The Data Format Description Language (DFDL) is a specification,
>>>    developed by the Open Grid Forum, capable of describing many data
>>>    formats, including both textual and binary, scientific and numeric,
>>>    legacy and modern, commercial record-oriented, and many industry and
>>>    military standards. It defines a language that is a subset of W3C XML
>>>    schema to describe the logical format of the data, and annotations
>>>    within the schema to describe the physical representation.
>>>
>>>    Daffodil is an open source implementation of the DFDL specification that
>>>    uses these DFDL schemas to parse fixed format data into an infoset,
>>>    which is most commonly represented as either XML or JSON. This allows
>>>    the use of well-established XML or JSON technologies and libraries to
>>>    consume, inspect, and manipulate fixed format data in existing
>>>    solutions. Daffodil is also capable of the reverse by serializing or
>>>    "unparsing" an XML or JSON infoset back to the original data format.
>>>
>>>    == Background ==
>>>
>>>    Many different software solutions need to consume and manage data,
>>>    including data directed routing, databases, data analysis, data
>>>    cleansing, data visualizing, and more. A key aspect of such solutions is
>>>    the need to transform the data into an easily consumable format.
>>>    Usually, this means that for each unique data format, one develops a
>>>    tool that can read and extract the necessary information, often leading
>>>    to ad-hoc and data-format-specific description systems. Such systems are
>>>    often proprietary, not well tested, and incompatible, leading to vendor
>>>    lock-in, flawed software, and increased training costs. DFDL is a new
>>>    standard, with version 1.0 completed in October of 2016, that solves
>>>    these problems by defining an open standard to describe many different
>>>    data formats and how to parse and unparse between the data and XML/JSON.
>>>
>>>    Two closed source implementations of DFDL currently exist. The first was
>>>    created by IBM and is now part of their IBM® Integration Bus product.
>>>    The second, called DFDL4S or "DFDL for Space", was created by the
>>>    European Space Agency and targets the challenges of their satellite data
>>>    processing.
>>>
>>>    Around 2005, Pacific Northwest National Lab created Defuddle, built as
>>>    an open source implementation and proof of concept of the draft DFDL
>>>    specification and a test bed to feed new concepts into specification
>>>    development. Primary development of Defuddle was eventually taken over
>>>    by the National Center for Supercomputing Applications (NCSA). However,
>>>    due to evolution of the DFDL specification and architectural and
>>>    performance issues with Defuddle, around 2009, NCSA restarted the
>>>    project with the new name of Daffodil, with a goal of implementing the
>>>    complete DFDL specification. Daffodil development continued at NCSA
>>>    until around 2012, at which point development slowed due to budget
>>>    limitations. Shortly thereafter, primary development was picked up by
>>>    Tresys Technology where it continues today, with contributions from
>>>    other entities such as the Navy Research Lab, the Air Force Research
>>>    Lab, MITRE, and Booz Allen Hamilton. In February of 2015, Daffodil
>>>    version 1.0.0 was released, including support for the DFDL features
>>>    needed to parse many common file formats. Daffodil version 2.0.0 is
>>>    expected to be released in August of 2017, which will include unparse
>>>    support with one-to-one parsing feature parity.
>>>
>>>    Entities including IBM, MITRE, NATO NCI Agency, Northrop-Grumman, Quark
>>>    Security, Raytheon, and Tresys Technology have developed DFDL schemas
>>>    for many data formats from varying technology domains, including PNG,
>>>    GIF, BMP, PCAP, HL7, EDIFACT, NACHA, vCard, iCalendar, and MIL-STD-2045,
>>>    many of which are publicly available on the DFDL Schemas GitHub. There
>>>    are also a number of military-application data formats, the
>>>    specifications of which are not public, which have historically been
>>>    very difficult and expensive to process, and for which DFDL schemas have
>>>    been created or are actively in development; these include
>>>    MIL-STD-6040/USMTF ATO, MIL-STD-6017/VMF, and MIL-STD-6016/NATO STANAG
>>>    5516 (aka "Link16").
>>>
>>>    == Rationale ==
>>>
>>>    Numerous software solutions exist that consume, inspect, analyze, and
>>>    transform data, many of which can be found in the Apache Software
>>>    Foundation (ASF). In order for tools like these to consume new types of
>>>    data, custom extensions are usually required, often with high
>>>    development and testing costs. Daffodil fills a clear gap in many of
>>>    these solutions, providing a simple and low cost way to transform data
>>>    to XML or JSON, which many of these tools natively support already. With
>>>    the upcoming 2.0.0 release, the Daffodil project will have achieved a
>>>    level of functionality in both parse and unparse that, when integrated
>>>    into existing solutions, could provide for a new method to quickly
>>>    enable support for new data formats.
>>>
>>>    == Initial Goals ==
>>>
>>>    * Relicense the existing code from the University of Illinois/NCSA Open
>>>    Source License to the Apache License version 2.0, working with Apache
>>>    Legal to ensure correctness, and with Daffodil contributors to get
>>>    their permission.
>>>    * Move the existing codebase, documentation, bugs, and mailing lists to
>>>    the Apache hosted infrastructure
>>>    * Establish a formal release process and schedule, allowing for
>>>    dependable release cycles in a manner consistent with the Apache
>>>    development process.
>>>    * Build relationships with ASF projects to add Daffodil support where
>>>    appropriate
>>>    * Grow the community to establish a diversity of background and expertise.
>>>
>>>    == Current Status ==
>>>
>>>    === Meritocracy ===
>>>
>>>    All initial committers are familiar with the principles of meritocracy.
>>>    The Daffodil project has followed the model of meritocracy in the past,
>>>    providing multiple outside entities commit access based on the quality
>>>    of their contributions. In order to grow the Daffodil user base and
>>>    development community, we are dedicated to continuing to operate
>>>    Daffodil as a meritocracy.
>>>
>>>    A key ingredient in a meritocracy of developers is open group code
>>>    review. The Daffodil project has operated in this mode throughout its
>>>    existence and this provides a forum to improve the code, verify code
>>>    quality, and educate new developers on the code base.
>>>
>>>    === Community ===
>>>
>>>    Daffodil has a small community of users and developers. Although primary
>>>    Daffodil development is done by Tresys Technology, a handful of other
>>>    contributions have come from other entities including the Navy Research
>>>    Lab, the Air Force Research Lab, MITRE, and Booz Allen Hamilton. In
>>>    addition to developers, multiple users of Daffodil have created DFDL
>>>    schemas, including entities such as MITRE, IBM, Raytheon, Quark
>>>    Security, and Tresys Technology. The DFDL Schemas GitHub community has
>>>    been created as a place for DFDL schemas to be published. The Daffodil
>>>    project also makes use of mailing lists, !HipChat, and Confluence
>>>    Questions to build a community of users and a system for support.
>>>
>>>    === Core Developers ===
>>>
>>>    The core developers of Daffodil are employed by Tresys Technology. We
>>>    will work to grow the community among a more diverse set of developers
>>>    and industries.
>>>
>>>    === Alignment ===
>>>
>>>    Daffodil was created as an open source project with a philosophy
>>>    consistent with The Apache Way. A strong belief in meritocracy,
>>>    community involvement in decisions, openness, and ensuring a high level
>>>    of quality in code, documentation, and testing are some of our shared
>>>    core beliefs.
>>>
>>>    Further, as mentioned in the Rationale section, Daffodil fills a gap
>>>    that exists in many ASF projects, including !NiFi, Spark, Storm, Hadoop,
>>>    Tika, and others. In order for tools like these to consume new types of
>>>    data, custom extensions are usually required. Rather than create such
>>>    extensions, Daffodil provides an easy and standards-compliant way to
>>>    transform data to XML or JSON, which many of these tools already
>>>    natively support.
>>>
>>>    == Known Risks ==
>>>
>>>    === Orphaned Products ===
>>>
>>>    The current core developers are the leading contributors in the space of
>>>    DFDL and wish to see it flourish. Though there is some risk that the
>>>    initial committers all come from the same company, a goal of entering
>>>    into incubation is to grow the development community to minimize the
>>>    risk of reliance on a single company.
>>>
>>>    === Inexperience with Open Source ===
>>>
>>>    The Daffodil project began as an open source project and has continued
>>>    that model throughout development. This includes public bug tracking,
>>>    git revision control, automated builds and tests, and a public wiki for
>>>    documentation.
>>>
>>>    Additionally, the current core developers and initial committers all
>>>    work for a company that relies on, believes in, promotes, and has led or
>>>    contributed to many open source software projects, including SELinux
>>>    Userspace, OpenSCAP, CLIP, refpolicy, setools, RPM, and others. As such,
>>>    there is low risk related to inexperience with open source software and
>>>    processes.
>>>
>>>    === Homogeneous Developers ===
>>>
>>>    The proposed initial committers come from a single entity, though we are
>>>    committed to growing the Daffodil development community to include a
>>>    broad group of additional committers from a wide array of industries.
>>>
>>>    === Reliance on Salaried Developers ===
>>>
>>>    The proposed initial committers are paid by their employer to contribute
>>>    to the Daffodil project. We expect that Daffodil development will
>>>    continue with salaried developers, and are committed to growing the
>>>    community to include non-salaried developers as well.
>>>
>>>    === Relationship with other Apache Projects ===
>>>
>>>    As mentioned in the Alignment section, Daffodil fills a clear gap in
>>>    numerous other ASF projects that consume and manage large amounts of data.
>>>
>>>    As a specific example, Daffodil developers have created a Daffodil
>>>    Apache !NiFi Processor, currently in use in data transfer solutions,
>>>    which allows one to ingest non-native data into an Apache !NiFi pipeline
>>>    as XML or JSON. This processor was well received by the Apache !NiFi
>>>    developers, with positive comments about the concise API and how it
>>>    could handle non-native data. Daffodil developers have also successfully
>>>    prototyped integration with Apache Spark. We believe Daffodil could
>>>    provide a strong benefit to many other ASF projects that handle fixed
>>>    format data. We anticipate working closely with such ASF projects to
>>>    include Daffodil where applicable to increase their ability to support
>>>    new data formats with minimal effort.
>>>
>>>    Daffodil also depends on existing ASF projects, including Apache Commons
>>>    and Apache Xerces.
>>>
>>>    === An Excessive Fascination with the Apache Brand ===
>>>
>>>    Although the Apache brand may certainly help to attract more
>>>    contributors, publicity is not the reason for this proposal. We believe
>>>    Daffodil could provide a great benefit to the ASF and the numerous data
>>>    focused projects that comprise it, as described in the Rationale and
>>>    Alignment sections. We hope to build a strong and vibrant community
>>>    built around The Apache Way, and not dependent on a single company.
>>>
>>>    === Documentation ===
>>>
>>>    Daffodil documentation can be found at:
>>>
>>>    * https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
>>>
>>>    Information about DFDL can be found at:
>>>
>>>    * https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
>>>    * https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/df20060_.htm
>>>
>>>    Public examples of DFDL Schemas can be found at:
>>>
>>>    * https://github.com/DFDLSchemas
>>>
>>>    == Initial Source ==
>>>
>>>    The Daffodil git repo goes back to mid-2011 with approximately 20
>>>    different contributors and feedback from many users and developers. The
>>>    core codebase is written in Scala and includes both a Scala and Java
>>>    API, along with Javadocs and Scaladocs for API usage. The initial code
>>>    will come from the git repository currently hosted by NCSA at the
>>>    University of Illinois:
>>>
>>>    https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil/
>>>
>>>    == Source and Intellectual Property Submission ==
>>>
>>>    The complete Daffodil code is licensed under the University of
>>>    Illinois/NCSA Open Source License. Much of the current codebase has been
>>>    developed by Tresys Technology, who is open to relicensing the code to
>>>    the Apache License version 2.0 and donating the source to the ASF.
>>>    Contacts at NCSA are also open to relicensing their contributions to
>>>    Apache v2. We plan to contact the other contributors and ask for
>>>    permission to relicense and donate their contributed code. Code from
>>>    contributors who decline or whom we cannot contact will be removed or
>>>    replaced. We will work closely with Apache Legal to ensure all issues
>>>    related to relicensing are acceptable.
>>>
>>>    == External Dependencies ==
>>>
>>>    We believe all current dependencies are compatible with the ASF
>>>    guidelines. Our dependency licenses come from the following license
>>>    styles: Apache v2, BSD, MIT, and ICU. The list of current Daffodil
>>>    dependencies and their licenses is documented here:
>>>
>>>    https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Dependencies+and+Licenses
>>>
>>>    == Cryptography ==
>>>
>>>    None
>>>
>>>    == Required Resources ==
>>>
>>>    === Mailing Lists ===
>>>
>>>    * commits@daffodil.incubator.apache.org
>>>    * dev@daffodil.incubator.apache.org
>>>    * private@daffodil.incubator.apache.org
>>>    * user@daffodil.incubator.apache.org
>>>
>>>    === Source Control ===
>>>
>>>    git://git.apache.org/incubator-daffodil.git
>>>
>>>    === Issue Tracking ===
>>>
>>>    JIRA Daffodil (DFDL)
>>>
>>>    === Initial Committers ===
>>>
>>>    * Beth Finnegan <efinnegan at tresys dot com>
>>>    * Dave Thompson <dthompson at tresys dot com>
>>>    * Josh Adams <jadams at tresys dot com>
>>>    * Mike Beckerle <mbeckerle at tresys dot com>
>>>    * Steve Lawrence <slawrence at tresys dot com>
>>>    * Taylor Wise <twise at tresys dot com>
>>>
>>>    === Affiliations ===
>>>
>>>    * Beth Finnegan (Tresys Technology)
>>>    * Dave Thompson (Tresys Technology)
>>>    * Josh Adams (Tresys Technology)
>>>    * Mike Beckerle (Tresys Technology)
>>>    * Steve Lawrence (Tresys Technology)
>>>    * Taylor Wise (Tresys Technology)
>>>
>>>    == Sponsors ==
>>>
>>>    === Champion ===
>>>
>>>    * TBD
>>>
>>>    === Nominated Mentors ===
>>>
>>>    * TBD
>>>
>>>    === Sponsoring Entity ===
>>>
>>>    We request the Apache Incubator to sponsor this project.
>>>
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: [DISCUSS] Daffodil Incubation Proposal

Posted by Dave Fisher <da...@comcast.net>.
Hi Steve,

It was not so much the lack of committers as it was the lack of diversity among the current ones. That is not a blocker for entry to Incubation.

I am willing to be one of the Mentors. Once there are at least two more we can push forward.

Regards,
Dave

> On Aug 1, 2017, at 5:09 AM, Steve Lawrence <st...@gmail.com> wrote:
> 
> Discussions have died down, and I think the consensus from the responses
> is that the issues are 1) the lack of committers and 2) the lack of a
> champion and mentors. We hope to address #1 and grow the community as
> part of incubation. Is anyone interested in being a champion or mentor
> and helping us with #2?
> 
> Thanks,
> - Steve
> 
> On 07/26/2017 04:06 PM, Chris Mattmann wrote:
>> This sounds like a very interesting project.
>> 
>> I don’t have the time to mentor at the moment but I will keep a close eye on it.
>> 
>> Cheers,
>> Chris Mattmann
>> 
>> 
>> 
>> 
>> On 7/25/17, 11:53 AM, "McHenry, Kenton Guadron" <mc...@illinois.edu> wrote:
>> 
>>    Hi Dave,
>> 
>>    The developers that were at NCSA have moved on to other organizations.  While we still leverage Daffodil and are very much interested in seeing it move forward, development is currently done by the Tresys team.  Agreed on the synergy with Tika.
>> 
>>    Kenton McHenry, Ph.D.
>>    Principal Research Scientist, Adjunct Assistant Professor of Computer Science
>>    Deputy Director of the Scientific Software & Applications Division
>>    National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
>> 
>>    On Jul 24, 2017, at 1:55 PM, Dave Fisher <da...@comcast.net>> wrote:
>> 
>>    Hi Kenton,
>> 
>>    Is there any reason that you and others from the NCSA are not Initial Committers? That would make this proposal stronger.
>> 
>>    Regarding Apache Tika - it relies on other projects including Apache POI and Apache PDFBox. They are pragmatic about what is used. If Daffodil works to expand then I think that there would be good synergy between the projects. I know as a POI PMC member that the POI community has significantly benefited from the Tika community some of whom are from Mitre.
>> 
>>    To date Tika has not emphasized structured data, although they do extract content from Excel and OpenOffice.
>> 
>>    I am intrigued.
>> 
>>    Regards,
>>    Dave
>> 
>>    On Jul 24, 2017, at 10:55 AM, McHenry, Kenton Guadron <mc...@illinois.edu>> wrote:
>> 
>>    Yes, DFDL and its open source implementation Daffodil are more about file formats and getting access to the entirety of a file's contents in a consistent way through machine readable specifications.  The work has implications in the area of digital preservation, allowing one to preserve these machine readable specifications rather than all the tools needed to open/save a file in order to work with it.  Imagine someone developing graphics software to work with 3D models and not having to worry about the hundreds of formats out there for 3D meshes (whether there are tools for opening the files and whether they can get access to those tools, whether the spec is available and how complex that spec is to implement, etc.), and simply building their code around the contents (e.g. vertices, faces, etc.).  One could come up with similar scenarios for other data types (documents, images, videos, audio, depth data, numeric data).  Ideally, tools built to support DFDL could someday support any format for that type without the developer having to worry about the details of how that data is represented within a file.
>> 
>>    Kenton McHenry, Ph.D.
>>    Principal Research Scientist, Adjunct Assistant Professor of Computer Science
>>    Deputy Director of the Scientific Software & Applications Division
>>    National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
>> 
>>    On Jul 24, 2017, at 10:30 AM, Steve Lawrence <st...@gmail.com>> wrote:
>> 
>>    I'll preface this by saying that I don't have a ton of experience with
>>    Apache Tika. But based on my understanding, Tika and Daffodil do have
>>    somewhat similar goals, but reach them in different ways. For example,
>>    Tika requires that one write /code/ to perform data extraction, usually
>>    relying on existing Java libraries to extract the desired metadata. The
>>    downside to this is that code can be buggy, and libraries might not even
>>    exist for formats of interest (especially common with legacy and
>>    military data).
>> 
>>    Daffodil, on the other hand, does not require one to write any code.
>>    Instead, one writes a DFDL Schema (similar to XML Schema, with DFDL
>>    annotations) that fully describes the data, which Daffodil then uses to
>>    convert the data to XML/JSON for extraction. So adding support for a new
>>    format means writing a new schema rather than new code. And less code
>>    generally means fewer bugs. Also, for secure systems that require
>>    certification, generally speaking, it is easier to certify a schema as
>>    compared to code.
>> 
>>    We certainly don't believe that Daffodil could replace Tika, but it does
>>    have the potential to add new functionality to Tika for formats that do
>>    not have existing libraries. One of our goals is to look into
>>    integrating Daffodil support into tools like Tika. We'd love to hear
>>    from Tika devs if this is something they'd be interested in.
>> 
>>    I'll also add that whereas Tika tends to focus primarily on metadata,
>>    DFDL schemas usually describe an entire file format down to the byte, so
>>    one can extract more than just metadata, including text and binary
>>    data. Further differentiating, Daffodil has support for serializing data
>>    (called unparse) from the XML/JSON representation, allowing one to
>>    transform or filter data as well. We don't believe this feature is all
>>    that applicable to Tika, but it may be useful to other technologies,
>>    such as data filtering or fuzzing tools.
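
The parse, filter, unparse flow described above can be sketched in Python as follows. This is an illustration only and does not use the Daffodil API; the message layout, the field names, and the priority cap are hypothetical.

```python
import struct

# Hypothetical description of a fixed-format message, big-endian:
# (field name, struct format code). An analogy for a schema, not real DFDL.
DESCRIPTION = [("msg_type", "B"), ("priority", "B"), ("payload", "4s")]
FORMAT = ">" + "".join(code for _, code in DESCRIPTION)


def parse(data):
    """Turn the raw bytes into a dictionary (a stand-in for the infoset)."""
    values = struct.unpack(FORMAT, data)
    return dict(zip((name for name, _ in DESCRIPTION), values))


def unparse(infoset):
    """Serialize the dictionary back to the original byte layout."""
    return struct.pack(FORMAT, *(infoset[name] for name, _ in DESCRIPTION))


original = bytes([0x02, 0x09]) + b"DATA"
infoset = parse(original)

# Filter step: cap the priority field, then write the data back out.
infoset["priority"] = min(infoset["priority"], 5)
filtered = unparse(infoset)
```

Because the same description drives both directions, an unmodified infoset unparses to exactly the bytes that were parsed.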
>> 
>>    - Steve
>> 
>> 
>>    On 07/24/2017 10:59 AM, Mike Drob wrote:
>>    What is the relationship between Daffodil and something like Apache Tika's
>>    extraction engine?
>> 
>>    On Mon, Jul 24, 2017 at 9:53 AM, Steve Lawrence
>>    <stephen.d.lawrence@gmail.com> wrote:
>> 
>>    Dear Apache Incubator Community,
>> 
>>    We would like to start a discussion around a proposal to bring Daffodil
>>    into the Apache Incubator. Daffodil is an implementation of the DFDL
>>    specification used to convert between fixed format data and XML/JSON.
>> 
>>    The draft proposal can be found in the wiki at the following URL:
>> 
>>    https://wiki.apache.org/incubator/DaffodilProposal
>> 
>>    We do not yet have a champion or mentors, but it was recommended that we
>>    create a proposal and send it to this list to potentially find those
>>    that might be interested. The text for the draft proposal is found
>>    below. We look forward to your input.
>> 
>>    Thanks,
>>    -Steve
>> 
>> 
>>    = Daffodil Proposal =
>> 
>>    == Abstract ==
>> 
>>    Daffodil is an implementation of the Data Format Description Language
>>    (DFDL) used to convert between fixed format data and XML/JSON.
>> 
>>    == Proposal ==
>> 
>>    The Data Format Description Language (DFDL) is a specification,
>>    developed by the Open Grid Forum, capable of describing many data
>>    formats, including both textual and binary, scientific and numeric,
>>    legacy and modern, commercial record-oriented, and many industry and
>>    military standards. It defines a language that is a subset of W3C XML
>>    Schema to describe the logical format of the data, and annotations
>>    within the schema to describe the physical representation.
>> 
>>    Daffodil is an open source implementation of the DFDL specification that
>>    uses these DFDL schemas to parse fixed format data into an infoset,
>>    which is most commonly represented as either XML or JSON. This allows
>>    the use of well-established XML or JSON technologies and libraries to
>>    consume, inspect, and manipulate fixed format data in existing
>>    solutions. Daffodil is also capable of the reverse by serializing or
>>    "unparsing" an XML or JSON infoset back to the original data format.
>> 
>>    == Background ==
>> 
>>    Many different software solutions need to consume and manage data,
>>    including data directed routing, databases, data analysis, data
>>    cleansing, data visualizing, and more. A key aspect of such solutions is
>>    the need to transform the data into an easily consumable format.
>>    Usually, this means that for each unique data format, one develops a
>>    tool that can read and extract the necessary information, often leading
>>    to ad-hoc and data-format-specific description systems. Such systems are
>>    often proprietary, not well tested, and incompatible, leading to vendor
>>    lock-in, flawed software, and increased training costs. DFDL is a new
>>    standard, with version 1.0 completed in October of 2016, that solves
>>    these problems by defining an open standard to describe many different
>>    data formats and how to parse and unparse between the data and XML/JSON.
>> 
>>    Two closed source implementations of DFDL currently exist. The first was
>>    created by IBM and is now part of their IBM® Integration Bus product.
>>    The second, called DFDL4S or "DFDL for Space", was created by the
>>    European Space Agency and targets the challenges of its satellite data
>>    processing.
>> 
>>    Around 2005, Pacific Northwest National Lab created Defuddle, built as
>>    an open source implementation and proof of concept of the draft DFDL
>>    specification and a test bed to feed new concepts into specification
>>    development. Primary development of Defuddle was eventually taken over
>>    by the National Center for Supercomputing Applications (NCSA). However,
>>    due to evolution of the DFDL specification and architectural and
>>    performance issues with Defuddle, around 2009, NCSA restarted the
>>    project with the new name of Daffodil, with a goal of implementing the
>>    complete DFDL specification. Daffodil development continued at NCSA
>>    until around 2012, at which point development slowed due to budget
>>    limitations. Shortly thereafter, primary development was picked up by
>>    Tresys Technology where it continues today, with contributions from
>>    other entities such as the Navy Research Lab, the Air Force Research
>>    Lab, MITRE, and Booz Allen Hamilton. In February of 2015, Daffodil
>>    version 1.0.0 was released, including support for the DFDL features
>>    needed to parse many common file formats. Daffodil version 2.0.0 is
>>    expected to be released in August of 2017 and will add unparse support
>>    at feature parity with parsing.
>> 
>>    Entities including IBM, MITRE, NATO NCI Agency, Northrop-Grumman, Quark
>>    Security, Raytheon, and Tresys Technology have developed DFDL schemas
>>    for many data formats from varying technology domains, including PNG,
>>    GIF, BMP, PCAP, HL7, EDIFACT, NACHA, vCard, iCalendar, and MIL-STD-2045,
>>    many of which are publicly available on the DFDL Schemas GitHub. There
>>    are also a number of military-application data formats, the
>>    specifications of which are not public, which have historically been
>>    very difficult and expensive to process, and for which DFDL schemas have
>>    been created or are actively in development; these include
>>    MIL-STD-6040/USMTF ATO, MIL-STD-6017/VMF, and MIL-STD-6016/NATO STANAG
>>    5516 (aka "Link16").
>> 
>>    == Rationale ==
>> 
>>    Numerous software solutions exist that consume, inspect, analyze, and
>>    transform data, many of which can be found in the Apache Software
>>    Foundation (ASF). In order for tools like these to consume new types of
>>    data, custom extensions are usually required, often with high
>>    development and testing costs. Daffodil fills a clear gap in many of
>>    these solutions, providing a simple and low cost way to transform data
>>    to XML or JSON, which many of these tools natively support already. With
>>    the upcoming 2.0.0 release, the Daffodil project will have achieved a
>>    level of functionality in both parse and unparse that, when integrated
>>    into existing solutions, could provide for a new method to quickly
>>    enable support for new data formats.
>> 
>>    == Initial Goals ==
>> 
>>    * Relicense the existing code from the University of Illinois/NCSA Open
>>    Source License to the Apache License version 2.0, working with Apache
>>    Legal to ensure correctness, and with Daffodil contributors to get
>>    their permission.
>>    * Move the existing codebase, documentation, bugs, and mailing lists to
>>    the Apache hosted infrastructure.
>>    * Establish a formal release process and schedule, allowing for
>>    dependable release cycles in a manner consistent with the Apache
>>    development process.
>>    * Build relationships with ASF projects to add Daffodil support where
>>    appropriate.
>>    * Grow the community to establish a diversity of background and expertise.
>> 
>>    == Current Status ==
>> 
>>    === Meritocracy ===
>> 
>>    All initial committers are familiar with the principles of meritocracy.
>>    The Daffodil project has followed the model of meritocracy in the past,
>>    providing multiple outside entities commit access based on the quality
>>    of their contributions. In order to grow the Daffodil user base and
>>    development community, we are dedicated to continuing to operate
>>    Daffodil as a meritocracy.
>> 
>>    A key ingredient in a meritocracy of developers is open group code
>>    review. The Daffodil project has operated in this mode throughout its
>>    existence and this provides a forum to improve the code, verify code
>>    quality, and educate new developers on the code base.
>> 
>>    === Community ===
>> 
>>    Daffodil has a small community of users and developers. Although primary
>>    Daffodil development is done by Tresys Technology, a handful of other
>>    contributions have come from other entities including the Navy Research
>>    Lab, the Air Force Research Lab, MITRE, and Booz Allen Hamilton. In
>>    addition to developers, multiple users of Daffodil have created DFDL
>>    schemas, including entities such as MITRE, IBM, Raytheon, Quark
>>    Security, and Tresys Technology. The DFDL Schemas GitHub community has
>>    been created as a place for DFDL schemas to be published. The Daffodil
>>    project also makes use of mailing lists, !HipChat, and Confluence
>>    Questions to build a community of users and system for support.
>> 
>>    === Core Developers ===
>> 
>>    The core developers of Daffodil are employed by Tresys Technology. We
>>    will work to grow the community among a more diverse set of developers
>>    and industries.
>> 
>>    === Alignment ===
>> 
>>    Daffodil was created as an open source project with a philosophy
>>    consistent with The Apache Way. A strong belief in meritocracy,
>>    community involvement in decisions, openness, and ensuring a high level
>>    of quality in code, documentation, and testing are some of our shared
>>    core beliefs.
>> 
>>    Further, as mentioned in the Rationale section, Daffodil fills a gap
>>    that exists in many ASF projects, including !NiFi, Spark, Storm, Hadoop,
>>    Tika, and others. In order for tools like these to consume new types of
>>    data, custom extensions are usually required. Rather than create such
>>    extensions, Daffodil provides an easy and standards-compliant way to
>>    transform data to XML or JSON, which many of these tools already
>>    natively support.
>> 
>>    == Known Risks ==
>> 
>>    === Orphaned Products ===
>> 
>>    The current core developers are the leading contributors in the space of
>>    DFDL and wish to see it flourish. Though there is some risk that the
>>    initial committers all come from the same company, a goal of entering
>>    into incubation is to grow the development community to minimize the
>>    risk of reliance on a single company.
>> 
>>    === Inexperience with Open Source ===
>> 
>>    The Daffodil project began as an open source project and has continued
>>    that model throughout development. This includes public bug tracking,
>>    git revision control, automated builds and tests, and a public wiki for
>>    documentation.
>> 
>>    Additionally, the current core developers and initial committers all
>>    work for a company that relies on, believes in, promotes, and has led or
>>    contributed to many open source software projects, including SELinux
>>    Userspace, OpenSCAP, CLIP, refpolicy, setools, RPM, and others. As such,
>>    there is low risk related to inexperience with open source software and
>>    processes.
>> 
>>    === Homogeneous Developers ===
>> 
>>    The proposed initial committers come from a single entity, though we are
>>    committed to growing the Daffodil development community to include a
>>    broad group of additional committers from a wide array of industries.
>> 
>>    === Reliance on Salaried Developers ===
>> 
>>    The proposed initial committers are paid by their employer to contribute
>>    to the Daffodil project. We expect that Daffodil development will
>>    continue with salaried developers, and are committed to growing the
>>    community to include non-salaried developers as well.
>> 
>>    === Relationship with other Apache Projects ===
>> 
>>    As mentioned in the Alignment section, Daffodil fills a clear gap in
>>    numerous other ASF projects that consume and manage large amounts of data.
>> 
>>    As a specific example, Daffodil developers have created a Daffodil
>>    Apache !NiFi Processor, currently in use in data transfer solutions,
>>    which allows one to ingest non-native data into an Apache !NiFi pipeline
>>    as XML or JSON. This processor was well received by the Apache !NiFi
>>    developers, with positive comments about the concise API and how it
>>    could handle non-native data. Daffodil developers have also successfully
>>    prototyped integration with Apache Spark. We believe Daffodil could
>>    provide a strong benefit to many other ASF projects that handle fixed
>>    format data. We anticipate working closely with such ASF projects to
>>    include Daffodil where applicable to increase their ability to support
>>    new data formats with minimal effort.
>> 
>>    Daffodil also depends on existing ASF projects, including Apache Commons
>>    and Apache Xerces.
>> 
>>    === An Excessive Fascination with the Apache Brand ===
>> 
>>    Although the Apache brand may certainly help to attract more
>>    contributors, publicity is not the reason for this proposal. We believe
>>    Daffodil could provide a great benefit to the ASF and the numerous data
>>    focused projects that comprise it, as described in the Rationale and
>>    Alignment sections. We hope to build a strong and vibrant community
>>    built around The Apache Way, and not dependent on a single company.
>> 
>>    === Documentation ===
>> 
>>    Daffodil documentation can be found at:
>> 
>>    * https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
>> 
>>    Information about DFDL can be found at:
>> 
>>    * https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
>>    * https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/df20060_.htm
>> 
>>    Public examples of DFDL Schemas can be found at:
>> 
>>    * https://github.com/DFDLSchemas
>> 
>>    == Initial Source ==
>> 
>>    The Daffodil git repo goes back to mid-2011 with approximately 20
>>    different contributors and feedback from many users and developers. The
>>    core codebase is written in Scala and includes both a Scala and Java
>>    API, along with Javadocs and Scaladocs for API usage. The initial code
>>    will come from the git repository currently hosted by NCSA at the
>>    University of Illinois:
>> 
>>    https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil/
>> 
>>    == Source and Intellectual Property Submission ==
>> 
>>    The complete Daffodil code is licensed under the University of
>>    Illinois/NCSA Open Source License. Much of the current codebase has been
>>    developed by Tresys Technology, who is open to relicensing the code
>>    under the Apache License version 2.0 and donating the source to the ASF.
>>    Contacts at NCSA are also open to relicensing their contributions to
>>    Apache v2. We plan to contact the other contributors and ask for
>>    permission to relicense and donate their contributed code. Code from
>>    contributors who decline, or whom we cannot contact, will be removed or
>>    replaced. We will work closely with Apache Legal to ensure all issues
>>    related to relicensing are acceptable.
>> 
>>    == External Dependencies ==
>> 
>>    We believe all current dependencies are compatible with the ASF
>>    guidelines. Our dependency licenses come from the following license
>>    styles: Apache v2, BSD, MIT, and ICU. The list of current Daffodil
>>    dependencies and their licenses is documented here:
>> 
>>    https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Dependencies+and+Licenses
>> 
>>    == Cryptography ==
>> 
>>    None
>> 
>>    == Required Resources ==
>> 
>>    === Mailing Lists ===
>> 
>>    * commits@daffodil.incubator.apache.org
>>    * dev@daffodil.incubator.apache.org
>>    * private@daffodil.incubator.apache.org
>>    * user@daffodil.incubator.apache.org
>> 
>>    === Source Control ===
>> 
>>    git://git.apache.org/incubator-daffodil.git
>> 
>>    === Issue Tracking ===
>> 
>>    JIRA Daffodil (DFDL)
>> 
>>    === Initial Committers ===
>> 
>>    * Beth Finnegan <efinnegan at tresys dot com>
>>    * Dave Thompson <dthompson at tresys dot com>
>>    * Josh Adams <jadams at tresys dot com>
>>    * Mike Beckerle <mbeckerle at tresys dot com>
>>    * Steve Lawrence <slawrence at tresys dot com>
>>    * Taylor Wise <twise at tresys dot com>
>> 
>>    === Affiliations ===
>> 
>>    * Beth Finnegan (Tresys Technology)
>>    * Dave Thompson (Tresys Technology)
>>    * Josh Adams (Tresys Technology)
>>    * Mike Beckerle (Tresys Technology)
>>    * Steve Lawrence (Tresys Technology)
>>    * Taylor Wise (Tresys Technology)
>> 
>>    == Sponsors ==
>> 
>>    === Champion ===
>> 
>>    * TBD
>> 
>>    === Nominated Mentors ===
>> 
>>    * TBD
>> 
>>    === Sponsoring Entity ===
>> 
>>    We request the Apache Incubator to sponsor this project.
>> 
>>    ---------------------------------------------------------------------
>>    To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>    For additional commands, e-mail: general-help@incubator.apache.org