Posted to dev@freemarker.apache.org by Daniel Dekany <da...@gmail.com> on 2021/12/12 12:17:45 UTC

freemarker-generator: OutputGeneratorDefinition and all related

I did not forget about checking out freemarker-generator. I played around
with it and have a lot of thoughts. Too many, really. So I'll try to go one
topic at a time.

So this one is about OutputGeneratorDefinition and everything related. I
believe this mechanism ended up being too complicated to be practical for
users, in big part because the CLI doesn't show clearly where the boundaries
of OutputGeneratorDefinition are. I learnt to use these without knowing much
beforehand, so I'm speaking from experience, and it can get quite confusing.
So it should be made more accessible. I'm also not in favor of having a bias
towards template-seeded processing, as we have it now, as it adds asymmetry,
which again makes using the tool more confusing than it could be.

I think non-@Option arguments to the CLI should be the seed files, always,
and all of them. No matter if they are templates or data files, what
matters is that they are seeds. I think it's more intuitive that way;
they are the *what* to process, and the options are about *how* to process
them. So:

   - If you want to seed from template, then instead of
   "freemarker-generator some.csv -t some.ftl", you would write
   "freemarker-generator --data-source=some.csv some.ftl" (or same with "-d
   some.csv"; I would prefer -d over -s).
   - If you want to seed from data files, then it's like
   "freemarker-generator --transformation=some.ftl *.csv" (or same with "-t
   some.ftl", and now "t" stands for "transformation", not "template", as you
   never need "template" in this approach).

Above, --transformation only applies to the seed files specified after it,
and before the next --transformation (if there's any). So that way you can
have a different transformation for different files. If no --transformation
was specified yet (or if it was --transformation=#templateOutput), the seed
file will be directly processed as a template. (By the way, then there
could be a --transformation=#copy as well.)

Secondly, we should make scoping more obvious and regular in the CLI. The
simplest I can imagine from user perspective is if the command line
arguments are "executed" left to right (as if we had an imperative
programming language, and each argument is a statement). So if you have
"freemarker-generator --foo=1 --bar=x file1 file2 --foo=2 file3 file4",
where foo and bar are just hypothetical options, then file1 and file2 will
be processed with foo=1 and bar=x, and file3 and file4 will be processed
with foo=2 and bar=x. The --transformation examples earlier did this as well.
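
To make the left-to-right idea concrete, here is a minimal sketch. It is not
freemarker-generator code: the class and method names (LeftToRightArgs,
SeedJob, parse) are hypothetical, and it assumes every option is written as
--name=value. Each option updates a running state, and each seed file takes
a snapshot of that state at the point where it appears.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of "executing" CLI arguments left to right. */
final class LeftToRightArgs {

    /** A seed file plus the option values that were in effect when it was read. */
    static final class SeedJob {
        private final String seed;
        private final Map<String, String> options;

        SeedJob(String seed, Map<String, String> options) {
            this.seed = seed;
            this.options = options;
        }

        String seed() { return seed; }
        Map<String, String> options() { return options; }
    }

    /** Walks the arguments in order, like statements in an imperative program. */
    static List<SeedJob> parse(String... args) {
        Map<String, String> current = new LinkedHashMap<>();
        List<SeedJob> jobs = new ArrayList<>();
        for (String arg : args) {
            if (arg.startsWith("--")) {
                // Assumption: options always use the --name=value form.
                int eq = arg.indexOf('=');
                if (eq < 0) {
                    throw new IllegalArgumentException("Expected --name=value but got: " + arg);
                }
                // An option overrides the earlier value for all seeds that follow.
                current.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                // A positional argument is a seed; copy the state so that
                // later options cannot change what this seed has seen.
                jobs.add(new SeedJob(arg, new LinkedHashMap<>(current)));
            }
        }
        return jobs;
    }
}
```

With the arguments from the example above, file1 and file2 end up with
foo=1 and bar=x, while file3 and file4 end up with foo=2 and bar=x.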

Now the treatment of seed files can be more uniform; we don't need
--data-source-include, --template-include, --template-exclude, and
--data-source-exclude. We just need --exclude and --include, which, again,
apply to every seed file after them, until there's an overriding --exclude
or --include.
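
A uniform filter could be as small as the sketch below. This is only an
illustration: SeedFilter and accepts are hypothetical names, and it assumes
the patterns are globs matched against the file name only, which may differ
from what the tool actually supports. It uses the JDK's own glob matching
(FileSystem.getPathMatcher).

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

/** Hypothetical sketch: one include/exclude pair for every kind of seed file. */
final class SeedFilter {

    /**
     * Returns true if the seed's file name matches the include glob (when one
     * is set) and does not match the exclude glob (when one is set).
     * A null pattern means "not specified".
     */
    static boolean accepts(String seed, String includeGlob, String excludeGlob) {
        // Assumption: patterns apply to the file name, not to the whole path.
        Path fileName = Paths.get(seed).getFileName();
        if (includeGlob != null && !glob(includeGlob).matches(fileName)) {
            return false;
        }
        return excludeGlob == null || !glob(excludeGlob).matches(fileName);
    }

    private static PathMatcher glob(String pattern) {
        return FileSystems.getDefault().getPathMatcher("glob:" + pattern);
    }
}
```

Because the filter does not care whether a seed is a template or a data
file, the same two options cover both cases.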

We kind of have a similar situation with --output and --output-map, as
these treat template seeds and data file seeds differently. I think here
again we only need --output, but there, it should be possible to specify a
more sophisticated template of the output. Like
--output="${fileNameNoExtension}.txt". If you get the same output file for
multiple seeds, that's an error. (The default would be to write to stdout,
but if multiple seeds write there, I think that's not intentional, and it
should also be an error.)
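
As a rough sketch of that rule (hypothetical names throughout; only the
${fileNameNoExtension} variable from the example above is handled, by plain
string replacement), the output name is derived per seed and a second seed
mapping to the same output is rejected:

```java
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: expand an --output template and reject collisions. */
final class OutputNames {

    /** Replaces ${fileNameNoExtension} with the seed's file name minus its extension. */
    static String expand(String template, String seed) {
        String fileName = Paths.get(seed).getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        // A leading dot (e.g. ".profile") is not treated as an extension separator.
        String noExtension = dot > 0 ? fileName.substring(0, dot) : fileName;
        return template.replace("${fileNameNoExtension}", noExtension);
    }

    /** Maps each output file to its seed; fails if two seeds produce the same output. */
    static Map<String, String> plan(String template, List<String> seeds) {
        Map<String, String> outputToSeed = new LinkedHashMap<>();
        for (String seed : seeds) {
            String output = expand(template, seed);
            String previous = outputToSeed.putIfAbsent(output, seed);
            if (previous != null) {
                throw new IllegalStateException("Seeds '" + previous + "' and '" + seed
                        + "' would both write to '" + output + "'");
            }
        }
        return outputToSeed;
    }
}
```

Doing this check before any processing starts means the error is reported
up front instead of after some output files have already been written.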

Also, with this we could get rid of the "--shared-..." options. Just put the
thing at the beginning of the argument list, and it will be seen by all
seeds. Now admittedly we lose functionality here, compared to
OutputGeneratorDefinition-s, as different groups of seed files can't have
their own data-model and data-sources now. But you can give names to
data-model-s and data sources, even groups for data-sources, so every seed
processing can just look at what's interesting for it. Not the cleanest,
but I think doing it better just isn't worth the baggage it brings.

Last but not least, when you need multiple seed file groups, chances are that
it should be just multiple calls to "freemarker-generator". So my main
focus is to make both ways of seeding look natural.

-- 
Best regards,
Daniel Dekany

Re: freemarker-generator: OutputGeneratorDefinition and all related

Posted by Daniel Dekany <da...@gmail.com>.

Or, what if you slice up the argument list before passing it to picocli?
(Although I would probably just not use it, rather than work around its
limitations.)
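
A minimal sketch of such slicing, under loud assumptions: ArgSlicer is a
hypothetical name, every option is written as a single --name=value token
(so no option arity has to be known), and options seen earlier are carried
into later slices so that they stay in effect. Each resulting slice could
then be handed to a parser on its own; that parser would have to let the
last occurrence of a repeated option win.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: cut one argument list into per-seed-group slices. */
final class ArgSlicer {

    /**
     * Starts a new slice whenever an option follows a seed file. Every slice
     * holds all options seen so far, followed by the seeds of that group.
     * Trailing options that are not followed by any seed are dropped.
     */
    static List<List<String>> slice(String... args) {
        List<String> optionsSoFar = new ArrayList<>();
        List<String> seeds = new ArrayList<>();
        List<List<String>> slices = new ArrayList<>();
        for (String arg : args) {
            if (arg.startsWith("--")) {
                if (!seeds.isEmpty()) {
                    // An option after seeds closes the current group.
                    slices.add(concat(optionsSoFar, seeds));
                    seeds.clear();
                }
                optionsSoFar.add(arg);
            } else {
                seeds.add(arg);
            }
        }
        if (!seeds.isEmpty()) {
            slices.add(concat(optionsSoFar, seeds));
        }
        return slices;
    }

    /** Returns a new list, so later changes to the inputs do not leak into a slice. */
    private static List<String> concat(List<String> first, List<String> second) {
        List<String> result = new ArrayList<>(first);
        result.addAll(second);
        return result;
    }
}
```

For "--foo=1 --bar=x file1 file2 --foo=2 file3 file4" this gives two
slices: "--foo=1 --bar=x file1 file2" and "--foo=1 --bar=x --foo=2 file3
file4".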

On Mon, Jan 3, 2022 at 11:28 AM Daniel Dekany <da...@gmail.com>
wrote:

> I think we should first decide how the CLI will be the easiest to
> understand for users, regardless of current implementation
> details. Actually, most of the code shouldn't assume a CLI at all, as many
> users need to do basically the same things via Maven. (In Maven you use
> XML to define the job, which is much less limited.) As for the need for
> multiple transformations in a single run, it's surely not needed too
> often, but sometimes shared variables can be quite important (for speed,
> but maybe even for consistency).
>
> So, will picocli limitations decide what the Generator can do, what its
> higher-level architecture looks like? That doesn't sound like a
> good tradeoff to me. Picocli is quite opinionated, and so not very
> flexible. Probably we need to use something else for something that has a
> complex CLI. (I hoped it could have positionals in "Repeating Composite
> Argument Groups", but apparently it can't.)
>
> On Sun, Jan 2, 2022 at 10:53 PM Siegfried Goeschl <
> siegfried.goeschl@gmail.com> wrote:
>
>> Hi Daniel,
>>
>> Sorry for the long delay but I was quite busy with some CVEs.
>>
>> I'm not getting all the points of your suggestions, please see my inline
>> comments below
>>
>> Thanks in advance,
>>
>> Siegfried Goeschl
>>
>>
>>
>> > On 12.12.2021, at 13:17, Daniel Dekany <da...@gmail.com> wrote:
>> >
>> > I did not forget about checking out freemarker-generator. So I played
>> > around, have a lot of thoughts. Too many really. So I try to go one
>> topic
>> > at a time.
>> >
>> > So this one is about OutputGeneratorDefinition and all related. I
>> believe
>> > this mechanism ended up being too complicated to be practical for
>> users, in
>> > big part because the CLI doesn't show clearly where the boundaries of
>> > OutputGeneratorDefinition are. I learnt to use these without knowing
>> much
>> > beforehand, so I'm speaking from experience, and it can
>> > get quite confusing. So it should be made more accessible.  I'm also
>> not in
>> > favor of having a bias towards template seeded processing, as we have it
>> > now, as it adds asymmetry, which again makes using the tool more
>> confusing
>> > than it could be.
>> >
>> > I think non-@Option arguments to the CLI should be the seed files,
>> always,
>> > and all of them. No matter if they are templates or data files, what
>> > matters is that they are seeds. I think that's more intuitive that way;
>> > they are the *what* to process, and the options are about *how* to
>> process
>> > them. So:
>> >
>> >   - If you want to seed from template, then instead of
>> >   "freemarker-generator some.csv -t some.ftl", you would write
>> >   "freemarker-generator --data-source=some.csv some.ftl" (or same with
>> "-d
>> >   some.csv"; I would prefer -d over -s).
>> >   - If you want to seed from data files, then it's like
>> >   "freemarker-generator --transformation=some.ftl *.csv" (or same with
>> "-t
>> >   some.ftl", and now "t" stands for "transformation", not "template",
>> as you
>> >   never need "template" in this approach).
>> >
>>
>> [SG] That sounds doable
>>
>> > Above, --transformation only applies to the seed files specified after
>> it,
>> > and before the next --transformation (if there's any). So that way you
>> can
>> > have different transformation for different files. If no
>> --transformation
>> > was specified yet (or if it was --transformation=#templateOutput), the
>> seed
>> > file will be directly processed as a template. (By the way, then there
>> > could be a --transformation=#copy as well.)
>>
>> [SG] When using picocli all positional parameters (you mentioned
>> non-@Options above) go into a single list so there is no good way of
>> mixing multiple seeds with positional parameters (see
>> https://picocli.info/#_mixing_options_and_positional_parameters)
>>
>> >
>> > Secondly, we should make scoping more obvious and regular in the CLI.
>> The
>> > simplest I can imagine from user perspective is if the command line
>> > arguments are "executed" left to right (as if we had an imperative
>> > programming language, and each argument is a statement). So if you have
>> > "freemarker-generator --foo=1 --bar=x file1 file2 --foo=2 file3 file4",
>> > where foo and bar are just hypothetical options, then file1 and file2
>> will
>> > be processed with foo=1 and bar=x, and file3 and file4 will be processed
>> > with foo=2 and bar=x. The --transform examples earlier did this as well.
>> >
>>
>> [SG] As mentioned before picocli collects all positional parameters into
>> a single list
>>
>> > Now the treatment of seed files can be more uniform; we don't need
>> > --data-source-include, --template-include, --template-exclude, and
>> > --data-source-exclude. We just need --exclude and --include, which,
>> > again, apply to every seed file after them, until there's an
>> > overriding --exclude or --include.
>>
>> [SG] Using a more uniform "include" and "exclude" makes sense
>>
>> >
>> > We kind of have a similar situation with --output and --output-map, as
>> > these treat template seeds and data file seeds differently. I think here
>> > again we only need --output, but there, it should be possible to
>> specify a
>> > more sophisticated template of the output. Like
>> > --output="${fileNameNoExtension}.txt". If you got the same output file
>> for
>> > multiple seeds, that's an error. (The default would be to write to
>> stdout,
>> > but if multiple seeds write to there, I think that's not intentional and
>> > this should be also an error.)
>>
>> [SG] Want to do a more sophisticated output mapping using variable
>> expansion at a later stage
>>
>> [SG] IMHO multiple seeds should be able to write to STDOUT as multiple
>> seeds should be able to write to the same output file.
>>
>> >
>> > Also with this we could get rid of "--shared-..." options. Just put the
>> > thing at the beginning of the argument, and it will be seen by all
>> seeds.
>> > Now admittedly we lose functionality here, compared to
>> > OutputGeneratorDefinition-s, as different group of seed files can't have
>> > their own data-model and data-sources now. But, you can give name to
>> > data-model-s and data sources, even groups for data-sources, so every
>> seed
>> > processing can just look at what's interesting for it. Not the cleanest,
>> > but I think doing it better just doesn't worth the baggage it brought.
>> >
>> > Last not least, when you need multiple seed file groups, chances are
>> that
>> > it should be just multiple calls to "freemarker-generator". So my main
>> > focus is to make both way of seeding look natural.
>> >
>>
>> [SG] A lot of complexity and edge cases are caused by supporting multiple
>> transformations using a single command line invocation
>>
>> * Shared data models
>> * Encodings
>> * Include and excludes
>>
>> What do you think of dropping that?
>>
>> > --
>> > Best regards,
>> > Daniel Dekany
>>
>>
>
> --
> Best regards,
> Daniel Dekany
>


-- 
Best regards,
Daniel Dekany

Re: freemarker-generator: OutputGeneratorDefinition and all related

Posted by Daniel Dekany <da...@gmail.com>.

I think we should first decide how the CLI will be the easiest to
understand for users, regardless of current implementation
details. Actually, most of the code shouldn't assume a CLI at all, as many
users need to do basically the same things via Maven. (In Maven you use XML
to define the job, which is much less limited.) As for the need for multiple
transformations in a single run, it's surely not needed too often, but
sometimes shared variables can be quite important (for speed, but maybe
even for consistency).

So, will picocli limitations decide what the Generator can do, what its
higher-level architecture looks like? That doesn't sound like a
good tradeoff to me. Picocli is quite opinionated, and so not very
flexible. Probably we need to use something else for something that has a
complex CLI. (I hoped it could have positionals in "Repeating Composite
Argument Groups", but apparently it can't.)

On Sun, Jan 2, 2022 at 10:53 PM Siegfried Goeschl <
siegfried.goeschl@gmail.com> wrote:

> Hi Daniel,
>
> Sorry for the long delay but I was quite busy with some CVEs.
>
> I'm not getting all the points of your suggestions, please see my inline
> comments below
>
> Thanks in advance,
>
> Siegfried Goeschl
>
>
>
> > On 12.12.2021, at 13:17, Daniel Dekany <da...@gmail.com> wrote:
> >
> > I did not forget about checking out freemarker-generator. So I played
> > around, have a lot of thoughts. Too many really. So I try to go one topic
> > at a time.
> >
> > So this one is about OutputGeneratorDefinition and all related. I believe
> > this mechanism ended up being too complicated to be practical for users,
> in
> > big part because the CLI doesn't show clearly where the boundaries of
> > OutputGeneratorDefinition are. I learnt to use these without knowing much
> > beforehand, so I'm speaking from experience, and it can
> > get quite confusing. So it should be made more accessible.  I'm also not
> in
> > favor of having a bias towards template seeded processing, as we have it
> > now, as it adds asymmetry, which again makes using the tool more
> confusing
> > than it could be.
> >
> > I think non-@Option arguments to the CLI should be the seed files,
> always,
> > and all of them. No matter if they are templates or data files, what
> > matters is that they are seeds. I think that's more intuitive that way;
> > they are the *what* to process, and the options are about *how* to
> process
> > them. So:
> >
> >   - If you want to seed from template, then instead of
> >   "freemarker-generator some.csv -t some.ftl", you would write
> >   "freemarker-generator --data-source=some.csv some.ftl" (or same with
> "-d
> >   some.csv"; I would prefer -d over -s).
> >   - If you want to seed from data files, then it's like
> >   "freemarker-generator --transformation=some.ftl *.csv" (or same with
> "-t
> >   some.ftl", and now "t" stands for "transformation", not "template", as
> you
> >   never need "template" in this approach).
> >
>
> [SG] That sounds doable
>
> > Above, --transformation only applies to the seed files specified after
> it,
> > and before the next --transformation (if there's any). So that way you
> can
> > have different transformation for different files. If no --transformation
> > was specified yet (or if it was --transformation=#templateOutput), the
> seed
> > file will be directly processed as a template. (By the way, then there
> > could be a --transformation=#copy as well.)
>
> [SG] When using picocli all positional parameters (you mentioned
> non-@Options above) go into a single list so there is no good way of
> mixing multiple seeds with positional parameters (see
> https://picocli.info/#_mixing_options_and_positional_parameters)
>
> >
> > Secondly, we should make scoping more obvious and regular in the CLI. The
> > simplest I can imagine from user perspective is if the command line
> > arguments are "executed" left to right (as if we had an imperative
> > programming language, and each argument is a statement). So if you have
> > "freemarker-generator --foo=1 --bar=x file1 file2 --foo=2 file3 file4",
> > where foo and bar are just hypothetical options, then file1 and file2
> will
> > be processed with foo=1 and bar=x, and file3 and file4 will be processed
> > with foo=2 and bar=x. The --transform examples earlier did this as well.
> >
>
> [SG] As mentioned before picocli collects all positional parameters into a
> single list
>
> > Now the treatment of seed files can be more uniform; we don't need
> > --data-source-include, --template-include, --template-exclude, and
> > --data-source-exclude. We just need --exclude and --include, which,
> > again, apply to every seed file after them, until there's an overriding
> > --exclude or --include.
>
> [SG] Using a more uniform "include" and "exclude" makes sense
>
> >
> > We kind of have a similar situation with --output and --output-map, as
> > these treat template seeds and data file seeds differently. I think here
> > again we only need --output, but there, it should be possible to specify
> a
> > more sophisticated template of the output. Like
> > --output="${fileNameNoExtension}.txt". If you got the same output file
> for
> > multiple seeds, that's an error. (The default would be to write to
> stdout,
> > but if multiple seeds write to there, I think that's not intentional and
> > this should be also an error.)
>
> [SG] Want to do a more sophisticated output mapping using variable
> expansion at a later stage
>
> [SG] IMHO multiple seeds should be able to write to STDOUT as multiple
> seeds should be able to write to the same output file.
>
> >
> > Also with this we could get rid of "--shared-..." options. Just put the
> > thing at the beginning of the argument, and it will be seen by all seeds.
> > Now admittedly we lose functionality here, compared to
> > OutputGeneratorDefinition-s, as different group of seed files can't have
> > their own data-model and data-sources now. But, you can give name to
> > data-model-s and data sources, even groups for data-sources, so every
> seed
> > processing can just look at what's interesting for it. Not the cleanest,
> > but I think doing it better just doesn't worth the baggage it brought.
> >
> > Last not least, when you need multiple seed file groups, chances are that
> > it should be just multiple calls to "freemarker-generator". So my main
> > focus is to make both way of seeding look natural.
> >
>
> [SG] A lot of complexity and edge cases are caused by supporting multiple
> transformations using a single command line invocation
>
> * Shared data models
> * Encodings
> * Include and excludes
>
> What do you think of dropping that?
>
> > --
> > Best regards,
> > Daniel Dekany
>
>

-- 
Best regards,
Daniel Dekany

Re: freemarker-generator: OutputGeneratorDefinition and all related

Posted by Siegfried Goeschl <si...@gmail.com>.

Hi Daniel,

Sorry for the long delay but I was quite busy with some CVEs.

I'm not getting all the points of your suggestions; please see my inline comments below.

Thanks in advance, 

Siegfried Goeschl



> On 12.12.2021, at 13:17, Daniel Dekany <da...@gmail.com> wrote:
> 
> I did not forget about checking out freemarker-generator. So I played
> around, have a lot of thoughts. Too many really. So I try to go one topic
> at a time.
> 
> So this one is about OutputGeneratorDefinition and all related. I believe
> this mechanism ended up being too complicated to be practical for users, in
> big part because the CLI doesn't show clearly where the boundaries of
> OutputGeneratorDefinition are. I learnt to use these without knowing much
> beforehand, so I'm speaking from experience, and it can
> get quite confusing. So it should be made more accessible.  I'm also not in
> favor of having a bias towards template seeded processing, as we have it
> now, as it adds asymmetry, which again makes using the tool more confusing
> than it could be.
> 
> I think non-@Option arguments to the CLI should be the seed files, always,
> and all of them. No matter if they are templates or data files, what
> matters is that they are seeds. I think that's more intuitive that way;
> they are the *what* to process, and the options are about *how* to process
> them. So:
> 
>   - If you want to seed from template, then instead of
>   "freemarker-generator some.csv -t some.ftl", you would write
>   "freemarker-generator --data-source=some.csv some.ftl" (or same with "-d
>   some.csv"; I would prefer -d over -s).
>   - If you want to seed from data files, then it's like
>   "freemarker-generator --transformation=some.ftl *.csv" (or same with "-t
>   some.ftl", and now "t" stands for "transformation", not "template", as you
>   never need "template" in this approach).
> 

[SG] That sounds doable

> Above, --transformation only applies to the seed files specified after it,
> and before the next --transformation (if there's any). So that way you can
> have different transformation for different files. If no --transformation
> was specified yet (or if it was --transformation=#templateOutput), the seed
> file will be directly processed as a template. (By the way, then there
> could be a --transformation=#copy as well.)

[SG] When using picocli all positional parameters (you mentioned non-@Options above) go into a single list so there is no good way of mixing multiple seeds with positional parameters (see https://picocli.info/#_mixing_options_and_positional_parameters)

> 
> Secondly, we should make scoping more obvious and regular in the CLI. The
> simplest I can imagine from user perspective is if the command line
> arguments are "executed" left to right (as if we had an imperative
> programming language, and each argument is a statement). So if you have
> "freemarker-generator --foo=1 --bar=x file1 file2 --foo=2 file3 file4",
> where foo and bar are just hypothetical options, then file1 and file2 will
> be processed with foo=1 and bar=x, and file3 and file4 will be processed
> with foo=2 and bar=x. The --transform examples earlier did this as well.
> 

[SG] As mentioned before picocli collects all positional parameters into a single list

> Now the treatment of seed files can be more uniform; we don't need
> --data-source-include, --template-include, --template-exclude, and
> --data-source-exclude. We just need --exclude and --include, which, again,
> apply to every seed file after them, until there's an overriding --exclude
> or --include.

[SG] Using a more uniform "include" and "exclude" makes sense 

> 
> We kind of have a similar situation with --output and --output-map, as
> these treat template seeds and data file seeds differently. I think here
> again we only need --output, but there, it should be possible to specify a
> more sophisticated template of the output. Like
> --output="${fileNameNoExtension}.txt". If you got the same output file for
> multiple seeds, that's an error. (The default would be to write to stdout,
> but if multiple seeds write to there, I think that's not intentional and
> this should be also an error.)

[SG] I want to do a more sophisticated output mapping using variable expansion at a later stage

[SG] IMHO multiple seeds should be able to write to STDOUT, just as multiple seeds should be able to write to the same output file.

> 
> Also with this we could get rid of "--shared-..." options. Just put the
> thing at the beginning of the argument, and it will be seen by all seeds.
> Now admittedly we lose functionality here, compared to
> OutputGeneratorDefinition-s, as different group of seed files can't have
> their own data-model and data-sources now. But, you can give name to
> data-model-s and data sources, even groups for data-sources, so every seed
> processing can just look at what's interesting for it. Not the cleanest,
> but I think doing it better just doesn't worth the baggage it brought.
> 
> Last not least, when you need multiple seed file groups, chances are that
> it should be just multiple calls to "freemarker-generator". So my main
> focus is to make both way of seeding look natural.
> 

[SG] A lot of complexity and edge cases are caused by supporting multiple transformations using a single command line invocation

* Shared data models
* Encodings
* Includes and excludes

What do you think of dropping that?

> -- 
> Best regards,
> Daniel Dekany