Posted to dev@daffodil.apache.org by Mike Beckerle <mb...@apache.org> on 2021/12/03 22:25:14 UTC
simplified schema project layout
My experience giving DFDL training with Daffodil is that our standard schema
project layout <https://daffodil.apache.org/dfdl-layout/> is much too deep
(directory-wise) for many users to navigate and use conveniently. It gets
in the way of learning.
Our layout was designed to follow sbt conventions that enable automated
dependency management, packaging, etc. It is easy to use if you are
accustomed to using an IDE like Eclipse or IntelliJ. It is also
extraordinarily valuable (and underappreciated) that 'sbt test' runs a
built-in self-test on a schema, and that 'sbt publishLocal' creates a jar
of a DFDL schema for use as a managed dependency between schemas.
But new users are mostly coming to DFDL/Daffodil from a command-line prompt
and a text editor (e.g., VIM).
I am wondering if we can have our cake and eat it too, without too much
added sbt complexity, and without losing 'sbt test' and 'sbt publishLocal'
working their magic for us.
E.g., what if a simplified layout was:
mySchema/schema - takes the place of src/main/*, with no package-style
directory structure.
mySchema/test - takes the place of src/test/*, also with no package-style
directory structure.
Users could optionally use mySchema/test/data and mySchema/test/infosets
to separate data from infosets, or just put all those files in the same
place and use file extensions (.dat vs. .dat.xml vs. .tdml, etc.) to
distinguish the kinds of content.
Such a flattened tree structure requires schema file names that are
unlikely to conflict with names chosen by other users, so a name like
common.dfdl.xsd or main.dfdl.xsd would be no good, as there is no
package directory structure to make it unique.
But names like common-mySchema.dfdl.xsd and main-mySchema.dfdl.xsd would
still be quite convenient to use, particularly if the mySchema name is well
chosen. (Note how I've put the unique part of the name first, so that
name completion will work most easily on the command line.)
I think this would still work with sbt if we simply override the default
paths (and perhaps file patterns) used for specifying source and resources.
Thoughts?
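
For illustration, overriding the default resource paths might look roughly
like the build.sbt sketch below. It assumes sbt 1.x slash syntax and uses
the directory names "schema" and "test" from the proposed layout; it covers
resources only and has not been tried against a real schema project.

```scala
// build.sbt -- sketch only; "schema" and "test" are the directory names
// proposed in this message, not an established convention.

// Schema files (*.dfdl.xsd) are resources, so point the main resource
// directory at mySchema/schema instead of src/main/resources.
Compile / resourceDirectory := baseDirectory.value / "schema"

// Test data, infosets, and TDML files then live directly under
// mySchema/test instead of src/test/resources.
Test / resourceDirectory := baseDirectory.value / "test"
```

With only these two settings, any Scala or Java test drivers would still be
looked up under src/test/ unless the source directories are overridden as
well.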
Re: simplified schema project layout
Posted by Mike Beckerle <mb...@apache.org>.
I converted the DFDLSchemas CSV example to use the simplified layout.
I actually like this a lot better for simple examples than the
original "standard schema file system layout".
Take a look and see what you think:
https://github.com/DFDLSchemas/CSV/pull/7
On Wed, Dec 8, 2021 at 12:05 PM Mike Beckerle <mb...@apache.org> wrote:
>
> I will give this a try.
Re: simplified schema project layout
Posted by Mike Beckerle <mb...@apache.org>.
I will give this a try.
On Wed, Dec 8, 2021 at 10:39 AM Steve Lawrence <sl...@apache.org> wrote:
>
> That's fair, I agree there definitely is some redundancy. In general I'm
> not a huge fan of mixing sources and resources, but maybe it's not too
> big of a deal in this case, since sources for UDFs/Layers will be
> rare, and when they do exist there's probably only a very small number
> of them.
>
> I haven't tested this much, but based on some examples and playing
> around a bit, I think this gets you what you're after:
>
> organization := "org.example"
>
> name := "dfdl-fmt"
>
> version := "0.1.0-SNAPSHOT"
>
> lazy val root = (project in file("."))
>   .settings(
>     Project.inConfig(Compile)(flattenSettings("src")),
>     Project.inConfig(Test)(flattenSettings("test")),
>   )
>
> def flattenSettings(name: String) = Seq(
>   unmanagedSourceDirectories := Seq(baseDirectory.value / name),
>   unmanagedResourceDirectories := unmanagedSourceDirectories.value,
>   unmanagedSources / includeFilter := "*.java" | "*.scala",
>   unmanagedResources / excludeFilter := (unmanagedSources / includeFilter).value,
> )
>
> (note that we probably also want many of the existing settings in our
> current build.sbt files)
>
> All the non-test stuff goes in a "src" directory. Sources are anything
> that ends with .java or .scala. Resources are anything that isn't a source.
>
> And the "test" directory has the exact same layout, but for tests.
>
> The .class files that end up in the jar are namespaced by the package line.
>
> The resources that end up in the jar are namespaced by the directory
> structure and/or file naming convention as they are in the src/ or test/
> directory. So schema authors can namespace schemas however they want,
> whether it be directories or file names, or not at all.
Re: simplified schema project layout
Posted by Steve Lawrence <sl...@apache.org>.
That's fair, I agree there definitely is some redundancy. In general I'm
not a huge fan of mixing sources and resources, but maybe it's not too
big of a deal in this case, since sources for UDFs/Layers will be
rare, and when they do exist there's probably only a very small number
of them.
I haven't tested this much, but based on some examples and playing
around a bit, I think this gets you what you're after:

organization := "org.example"

name := "dfdl-fmt"

version := "0.1.0-SNAPSHOT"

lazy val root = (project in file("."))
  .settings(
    Project.inConfig(Compile)(flattenSettings("src")),
    Project.inConfig(Test)(flattenSettings("test")),
  )

def flattenSettings(name: String) = Seq(
  unmanagedSourceDirectories := Seq(baseDirectory.value / name),
  unmanagedResourceDirectories := unmanagedSourceDirectories.value,
  unmanagedSources / includeFilter := "*.java" | "*.scala",
  unmanagedResources / excludeFilter := (unmanagedSources / includeFilter).value,
)

(note that we probably also want many of the existing settings in our
current build.sbt files)
All the non-test stuff goes in a "src" directory. Sources are anything
that ends with .java or .scala. Resources are anything that isn't a source.
And the "test" directory has the exact same layout, but for tests.
The .class files that end up in the jar are namespaced by the package line.
The resources that end up in the jar are namespaced by the directory
structure and/or file naming convention as they are in the src/ or test/
directory. So schema authors can namespace schemas however they want,
whether it be directories or file names, or not at all.
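
As a rough illustration of the kind of "existing settings" mentioned above,
a schema project's build.sbt usually also declares its test dependencies.
The sketch below is an assumption about what that might look like, not a
copy of any particular project's file; the version numbers in particular
are illustrative and should be checked against the Daffodil release in use.

```scala
// build.sbt additions -- sketch; versions below are illustrative assumptions.
libraryDependencies ++= Seq(
  // Daffodil's TDML runner, used by JUnit test classes to run .tdml files
  "org.apache.daffodil" %% "daffodil-tdml-processor" % "3.2.1" % Test,
  // JUnit plus the sbt bridge that lets 'sbt test' discover JUnit tests
  "junit" % "junit" % "4.13.2" % Test,
  "com.novocode" % "junit-interface" % "0.11" % Test,
)

// Print each test name as it runs
testOptions += Tests.Argument(TestFrameworks.JUnit, "-v")

// For a schema-only project: do not append the Scala version to the
// generated artifacts
crossPaths := false
```

A project that also contains Scala UDF or layer code is tied to a Scala
version, so dropping the cross-version suffix may not be appropriate there.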
On 12/8/21 9:56 AM, Mike Beckerle wrote:
> I guess my concern is that all the depth associated with the sbt-based
> standard layout feels completely redundant to me.
>
> I am suggesting that, of src/main/scala, we need only main/, and of
> src/main/resources/kind, we need only main/.
>
> E.g., why are all the typed subdirs needed (xsd/, dfdl/, etc.) when
> file extensions can be used to distinguish resource types and the
> programming language compilers to be used?
>
> To me the only "real" distinction in the standard project layout is
> main vs. test which is needed to exclude test stuff when packaging.
>
> The rest is
> (a) using directories as "package names" - which can be done with
> well-chosen longer file names
> (b) using directories as redundant file typing - which can be done
> with file name extensions.
>
> To me a UDF is a META-INF/services file and some scala/java code in
> the "main" area.
> Ditto for a layer definition.
>
> I guess concretely I am wondering if there is a way to override basic
> sbt settings like this:
>
> * Instead of src/main/scala, just look for main/*.scala
> * Instead of src/main/java, just look for main/*.java
> * Instead of src/main/resources/* just look for main/* where the file
> name does not end in ".scala" nor ".java"
>
> And similarly for test things, where src/test/whatever just becomes
> test/whatever and distinctions are made using file name extensions.
Re: simplified schema project layout
Posted by Mike Beckerle <mb...@apache.org>.
I guess my concern is that all the depth associated with the sbt-based
standard layout feels completely redundant to me.
I am suggesting that, of src/main/scala, we need only main/, and of
src/main/resources/kind, we need only main/.
E.g., why are all the typed subdirs needed (xsd/, dfdl/, etc.) when
file extensions can be used to distinguish resource types and the
programming language compilers to be used?
To me the only "real" distinction in the standard project layout is
main vs. test which is needed to exclude test stuff when packaging.
The rest is
(a) using directories as "package names" - which can be done with
well-chosen longer file names
(b) using directories as redundant file typing - which can be done
with file name extensions.
To me a UDF is a META-INF/services file and some scala/java code in
the "main" area.
Ditto for a layer definition.
I guess concretely I am wondering if there is a way to override basic
sbt settings like this:
* Instead of src/main/scala, just look for main/*.scala
* Instead of src/main/java, just look for main/*.java
* Instead of src/main/resources/* just look for main/* where the file
name does not end in ".scala" nor ".java"
And similarly for test things, where src/test/whatever just becomes
test/whatever and distinctions are made using file name extensions.
On Wed, Dec 8, 2021 at 9:21 AM Steve Lawrence <sl...@apache.org> wrote:
>
> What about the scala/java/resources directories? Do those still exist or
> are they simplified somehow?
>
> We currently have an xsd/ directory to allow schematron, xslt, etc to be
> included in the same repo. Do we still have that directory?
>
> How do pluggable UDFs and Layers fit into this? Do we suggest those are
> in separate repos, or can they fit into this?
>
> Note that I believe sbt supports organizations in a single directory
> name, e.g.
>
> src/
> └── main/
>     └── resources/
>         └── org.foo.myschema/
>             └── xsd/
>                 └── common.xsd
>
> So that could be one approach to reduce the deep directory structures.
>
> Generally, I'm definitely in favor of simplifying the layout, but this
> to me feels like it might just add more confusion since it's sort of
> close to the existing layout, but not quite the same.
>
> If we are potentially going to go against the standards, and potentially
> make IDE support more difficult, I almost wonder if we should be more
> ambitious and come up with something that is completely different? I'm
> not sure what that would be, but could be more flat. For example, maybe
> something like this:
>
> dfdl-fmt/
> ├── build.sbt
> ├── dfdl/
> │   ├── format.dfdl.xsd
> │   └── main.dfdl.xsd
> ├── layer/
> │   └── MyLayer.scala
> ├── sch/
> ├── tdml/
> │   └── main.tdml
> ├── udf/
> │   └── MyUDF.scala
> └── xslt/
>
> A plugin could implicitly add organization structure so things are
> namespaced when building a jar. Or maybe we even do something like NiFi
> does with .nar files and have a custom package format, e.g. .dar
>
> It's probably a lot more work, with things to work out (e.g. how do
> dependencies work for UDFs and layers), and almost certainly needs a
> plugin to work instead of just tweaking sbt properties, but something
> like that feels more ideal to me.
>
> Note that maybe we don't even use sbt for this. Maybe there's a better
> tool for something like this.
>
> Another thing to consider that is related, with NiFi we found it
> difficult to add jars to the NiFi classpath for a specific processor,
> which means loading schemas from a jar on the classpath couldn't be
> done. Having a custom package format could make this easier, since all
> the .dar processing/lookup would be done by Daffodil rather than
> standard classpath lookups.
Re: simplified schema project layout
Posted by Steve Lawrence <sl...@apache.org>.
What about the scala/java/resources directories? Do those still exist or
are they simplified somehow?
We currently have an xsd/ directory to allow schematron, xslt, etc to be
included in the same repo. Do we still have that directory?
How do pluggable UDFs and Layers fit into this? Do we suggest those are
in separate repos, or can they fit into this?
Note that I believe sbt supports organizations in a single directory
name, e.g.
src/
└── main/
    └── resources/
        └── org.foo.myschema/
            └── xsd/
                └── common.xsd
So that could be one approach to reduce the deep directory structures.
Generally, I'm definitely in favor of simplifying the layout, but this
to me feels like it might just add more confusion since it's sort of
close to the existing layout, but not quite the same.
If we are potentially going to go against the standards, and potentially
make IDE support more difficult, I almost wonder if we should be more
ambitious and come up with something that is completely different? I'm
not sure what that would be, but could be more flat. For example, maybe
something like this:
dfdl-fmt/
├── build.sbt
├── dfdl/
│   ├── format.dfdl.xsd
│   └── main.dfdl.xsd
├── layer/
│   └── MyLayer.scala
├── sch/
├── tdml/
│   └── main.tdml
├── udf/
│   └── MyUDF.scala
└── xslt/
A plugin could implicitly add organization structure so things are
namespaced when building a jar. Or maybe we even do something like NiFi
does with .nar files and have a custom package format, e.g. .dar
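As a rough illustration of the jar namespacing idea, the sketch below rewrites jar entry paths in a plain build.sbt rather than in a plugin. It is untested against Daffodil, and the organization and project names are hypothetical. It also assumes the flat dfdl/ directory has already been registered as a resource directory, so that its files show up in the jar mappings.

```scala
// build.sbt: a sketch only, not a tested Daffodil build definition.
organization := "org.foo"   // hypothetical organization
name := "myschema"          // hypothetical project name

Compile / packageBin / mappings := {
  // Turn the organization and name into a jar path, e.g. "org/foo/myschema".
  val prefix = organization.value.replace('.', '/') + "/" + name.value
  (Compile / packageBin / mappings).value.map {
    // Move DFDL schema files under the organization path inside the jar.
    case (file, path) if path.endsWith(".dfdl.xsd") => (file, s"$prefix/$path")
    // Leave class files and all other entries where they are.
    case other => other
  }
}
```

One caveat with this approach: the paths inside the jar would no longer match the paths seen when working from the flat directory, so any schemaLocation references between schemas would need to resolve in both situations.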
It's probably a lot more work, with things to work out (e.g. how do
dependencies work for UDFs and layers), and almost certainly needs a
plugin to work instead of just tweaking sbt properties, but something
like that feels more ideal to me.
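For comparison, the directory part of the flat layout above could plausibly be expressed with ordinary sbt settings alone. The sketch below is an assumption about how that might look, using the directory names from the example tree. It does not address namespacing, dependency handling for UDFs and layers, or where the test classes that run the TDML files would live.

```scala
// build.sbt: a sketch of pointing sbt at the flat directories shown above.

// Scala sources for layers and UDFs.
Compile / unmanagedSourceDirectories := Seq(
  baseDirectory.value / "layer",
  baseDirectory.value / "udf"
)

// Schemas, Schematron, and XSLT files are packaged as resources.
Compile / unmanagedResourceDirectories := Seq(
  baseDirectory.value / "dfdl",
  baseDirectory.value / "sch",
  baseDirectory.value / "xslt"
)

// TDML files are test-only resources.
Test / unmanagedResourceDirectories := Seq(
  baseDirectory.value / "tdml"
)
```

Because every project would have to repeat these settings in its own build.sbt, a plugin that applies them by default may still be the more practical option.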
Note that maybe we don't even use sbt for this. Maybe there's a better
tool for something like this.
Another thing to consider that is related, with NiFi we found it
difficult to add jars to the NiFi classpath for a specific processor,
which means loading schemas from a jar on the classpath couldn't be
done. Having a custom package format could make this easier, since all
the .dar processing/lookup would be done by Daffodil rather than
standard classpath lookups.