
Posted to dev@freemarker.apache.org by Daniel Dekany <dd...@apache.org> on 2020/02/23 15:37:18 UTC

freemarker-generator: Improving the input documents concept

Input documents are a fundamental concept in freemarker-generator, so we
should think about them more, and probably refine/rework how they are handled.

Currently it works like this, with the CLI at least:

    freemarker-cli
        -t access-report.ftl
        somewhere/foo-access-log.csv

Then in access-report.ftl you have to do something like this:

    <#assign doc = Documents.get(0)>
    ... process doc here

(The more idiomatic Documents[0] won't work. Actually, that led to a funny
chain of coincidences: it returned the string "D", then CSVTool.parse(...)
happily parsed that into a table with the single column "D" and 0 rows, and
as there were 0 rows, the template didn't run into an error over
row.myExpectedColumn referring to a missing column either, so the process
finished with success. (: Pretty unlucky for sure. The root cause was
unintentionally breaking a FreeMarker idiom though; eventually we will have
to work on those too, but that's a different topic.)

However, actually multiple input documents can be passed in:

    freemarker-cli
        -t access-report.ftl
        somewhere/foo-access-log.csv
        somewhere/bar-access-log.csv

The above template will still work, though it then ignores all but the first
document. So if you expect any number of input documents, you will probably
have to do this:

    <#list Documents.list as doc>
          ... process doc here
    </#list>

(The more idiomatic <#list Documents as doc> won't work; but again, those
we will work out in a different thread.)


So, here's what would be better, in my opinion. I start out from what I think
are the common use cases, in decreasing order of frequency. The goal is to
make those less error prone for the users, and simpler to express.

USE CASE 1

You have exactly 1 input document, which is therefore simply "the"
document in the mind of the user. This is probably the typical use case,
or at least the use case users typically start out from when starting the
work.

    freemarker-cli
        -t access-report.ftl
        somewhere/foo-access-log.csv

Then `Documents.get(0)` is not very fitting. Most importantly it's error
prone, because if the user passes in more than 1 document (which can even
happen totally accidentally, like if the user was lazy and used a wildcard
that the shell expanded), the template will silently ignore the rest of the
documents, and the single document processed will be picked practically at
random. The user might not notice that, and submit a bad report or such.

I think that in this use case the document should simply be referred to as
`Document` in the template. When you have multiple documents there,
referring to `Document` should be an error, saying that the template was
made to process a single document only.


USE CASE 2

You have multiple input documents, but each has a different role (different
schema, maybe different file type). Like, you pass in users.csv and
groups.csv. Each has a different schema, and so you want to access them
differently, but in the same template.

    freemarker-cli
        [...]
        --named-document users somewhere/foo-users.csv
        --named-document groups somewhere/foo-groups.csv

Then in the template you could refer to them as: `NamedDocuments.users`,
and `NamedDocuments.groups`.

Use Cases 1 and 2 can be unified into a coherent concept, where `Document`
is just a shorthand for `NamedDocuments.main`. It's called "main" because
that's "the" document the template is about, but then you have added
some helper documents too, with symbolic names representing their role.

    freemarker-cli
        -t access-report.ftl
        --document-name=main somewhere/foo-access-log.csv
        --document-name=users somewhere/foo-users.csv
        --document-name=groups somewhere/foo-groups.csv

Here, `Document` still works in the template, and it refers to
`somewhere/foo-access-log.csv`. (While omitting --document-name=main above
would be cleaner, I couldn't figure out how to do that with Picocli.
Anyway, for now the point is the concept, which is not specific to CLI.)


USE CASE 3

Here you have several of the same kind of documents. That has a more
generic sub-use-case, when you have explicitly named documents (like
"users" above), and for some you expect multiple input files.

    freemarker-cli
        -t access-report.ftl
        --document-name=main somewhere/foo-access-log.csv somewhere/bar-access-log.csv
        --document-name=users somewhere/foo-users.csv somewhere/bar-users.csv
        --document-name=groups somewhere/global-groups.csv

The template must be written with this use case in mind, as now it has to
#list some of the documents. (I think in practice you hardly ever want to
get a document by hard coded index. Either you don't know how many
documents you have, so you can't use hard coded indexes, or you do, and
each index has a specific meaning, but then you should name the documents
instead, as using indexes is error prone, and hard to read.)
Accessing that list of documents in the template could maybe be done like
this:
- For the "main" documents: `DocumentList`
- For explicitly named documents, like "users": `NamedDocumentLists.users`
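
To make the grouping concrete, here's a rough sketch in plain Java of how the
arguments above could be collected into named document lists. It's hand-rolled
on purpose and says nothing about how to do it with Picocli. The names
`NamedDocumentArgs` and `group` are made up for illustration, and treating
files given before any --document-name option as "main" is just my assumption.
The -t option is left out, as it would be handled separately.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of freemarker-generator. */
final class NamedDocumentArgs {

    private static final String OPTION = "--document-name=";

    /** Groups each file argument under the most recent --document-name=<name> option. */
    static Map<String, List<String>> group(List<String> args) {
        Map<String, List<String>> lists = new LinkedHashMap<>();
        String currentName = "main"; // assumption: unnamed files belong to "main"
        for (String arg : args) {
            if (arg.startsWith(OPTION)) {
                // Switch to a new name; create its (possibly still empty) list.
                currentName = arg.substring(OPTION.length());
                lists.computeIfAbsent(currentName, k -> new ArrayList<>());
            } else {
                // A file path: append it to the list of the current name.
                lists.computeIfAbsent(currentName, k -> new ArrayList<>()).add(arg);
            }
        }
        return lists;
    }

    public static void main(String[] args) {
        Map<String, List<String>> lists = group(List.of(
                "--document-name=main",
                "somewhere/foo-access-log.csv", "somewhere/bar-access-log.csv",
                "--document-name=users",
                "somewhere/foo-users.csv", "somewhere/bar-users.csv",
                "--document-name=groups",
                "somewhere/global-groups.csv"));
        lists.forEach((name, files) -> System.out.println(name + " -> " + files));
    }
}
```

With such a map, `DocumentList` would be the list stored under "main", and
`NamedDocumentLists.users` the list stored under "users".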


SUMMING UP

To unify all 3 use cases into a coherent concept:
- `NamedDocumentLists.<name>` is the most generic form, and while you can
achieve everything with it, using it requires your template to handle the
most generic case too. So, I think it would be rarely used.
- `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's
used if you only have one kind of document (single format and schema), but
potentially multiple of them.
- `NamedDocuments.<name>` expresses that you expect exactly 1 document of
the given name.
- `Document` is just a shorthand for `NamedDocuments.main`. This is for the
most natural/frequent use case.
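
How the four forms relate could be sketched roughly like this in Java. This is
only an illustration of the concept, not how freemarker-generator exposes
things to templates. The class `DocumentModel`, its method names, and
representing a document by its path string are all assumptions of mine.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical model for illustration; not the freemarker-generator API. */
final class DocumentModel {

    private final Map<String, List<String>> namedDocumentLists;

    DocumentModel(Map<String, List<String>> namedDocumentLists) {
        this.namedDocumentLists = namedDocumentLists;
    }

    /** NamedDocumentLists.<name>: the most generic form, any number of documents. */
    List<String> namedDocumentList(String name) {
        return namedDocumentLists.getOrDefault(name, List.of());
    }

    /** DocumentList: shorthand for NamedDocumentLists.main. */
    List<String> documentList() {
        return namedDocumentList("main");
    }

    /** NamedDocuments.<name>: the template expects exactly 1 document of this name. */
    String namedDocument(String name) {
        List<String> documents = namedDocumentList(name);
        if (documents.size() != 1) {
            // Fail loudly instead of silently picking one of the documents.
            throw new IllegalStateException(
                    "The template expects exactly 1 \"" + name + "\" document, but "
                            + documents.size() + " were given: " + documents);
        }
        return documents.get(0);
    }

    /** Document: shorthand for NamedDocuments.main. */
    String document() {
        return namedDocument("main");
    }

    public static void main(String[] args) {
        DocumentModel model = new DocumentModel(Map.of(
                "main", List.of("somewhere/foo-access-log.csv"),
                "users", List.of("somewhere/foo-users.csv", "somewhere/bar-users.csv")));
        System.out.println(model.document());
        System.out.println(model.namedDocumentList("users"));
        try {
            model.namedDocument("users"); // 2 documents where 1 is expected
        } catch (IllegalStateException e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}
```

The point of keeping the "exactly 1" check in a single place is that both
single-document forms get the same error behavior for free.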

That's 4 possible ways of accessing your documents, which is a trade-off
for the sake of these:
- Catching CLI (or Maven, etc.) input where the template output will likely
be wrong. That's only possible if the user can communicate their intent in
the template.
- Users don't need to deal with concepts that are irrelevant in their
concrete use case. Just start with the trivial, `Document`, and later if
the need arises, generalize to named documents, document lists, or both.


What do you guys think?

Re: freemarker-generator: Improving the input documents concept

Posted by Siegfried Goeschl <si...@gmail.com>.

As discussed before the name is widely used :-)

> On 29.02.2020, at 18:05, Siegfried Goeschl <si...@gmail.com> wrote:
> 
> Well, it clashes with "javax.activation.DataSource" - can do, and I have no definite opinion about it :)
> 
>> On 29.02.2020, at 18:03, Daniel Dekany <da...@gmail.com> wrote:
>> 
>> I believe that should be DataSource (with capital S), as it's two words.
>> 
>> Also, it's the name of a too widely used and known JDBC interface. So if
>> anyone can tell a similarly descriptive alternative...
>> 
>> On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
>> siegfried.goeschl@gmail.com> wrote:
>> 
>>> Hi Daniel,
>>> 
>>> I'm an enterprise developer - bad habits die hard :-)
>>> 
>>> So I closed the following tickets and merged the branches
>>> 
>>> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
>>> "freemarker-generator"
>>> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
>>> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
>>> for datasources
>>> 
>>> Thanks in advance,
>>> 
>>> Siegfried Goeschl
>>> 
>>> 
>>>> On 29.02.2020, at 12:19, Daniel Dekany <da...@gmail.com> wrote:
>>>> 
>>>> Yeah, and of course, you can merge that branch. You can even work on the
>>>> master directly after all.
>>>> 
>>>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <da...@gmail.com>
>>>> wrote:
>>>> 
>>>>> But, I do recognize the cattle use case (several "faceless" files with
>>>>> common format/schema). Only, my idea is to push that complexity onto the
>>>>> data source. The "data source" concept shields the rest of the
>>>>> application from the details of how the data is stored or retrieved. So,
>>>>> a data source might load a bunch of log files from a directory, and
>>>>> present them as a single big table, or as a list of tables, etc. So I
>>>>> want to deal with the cattle use case, but the question is what part of
>>>>> the architecture will deal with this complication, in other words, how
>>>>> you box things. The reason my initial bet is to stuff that complication
>>>>> into the "data source" implementation(s) is that data sources are
>>>>> inherently varied. Some return a table-like thing, some have multiple
>>>>> named tables (worksheets in Excel), some return a tree of nodes (XML),
>>>>> etc. So then, some might return a list-of-list-of log records, or just a
>>>>> single list of log records (put together from daily log files). That way
>>>>> cattle don't add to conceptual complexity. Now, you might be aware of
>>>>> cases where the cattle concept must be more exposed than this, and then
>>>>> we can't box things like this. But this is what I tried to express.
>>>>> 
>>>>> Regarding "output generators", and how that applies on the command
>>>>> line: I think it's important that the common core between Maven and the
>>>>> command line is as fat as possible. Ideally, they are just two syntaxes
>>>>> to set up the same thing. Mostly at least. So, if you specify a template
>>>>> file to the CLI application, in a way that causes it to process that
>>>>> template to generate a single output, then you have just defined an
>>>>> "output generator" (even if it wasn't explicitly called that on the
>>>>> command line). If you specify 3 csv files to the CLI application, in a
>>>>> way that causes it to generate 3 output files, then you have just
>>>>> defined 3 "output generators" (there's at least one template specified
>>>>> there too, but that wasn't an "output generator" itself, it was just an
>>>>> attribute of the 3 output generators). If you specify 1 template and 3
>>>>> csv files, in a way that will yield 4 output files (1 for the template,
>>>>> 3 for the csv-s), then you have defined 4 output generators. If you have
>>>>> a data source that loads a list of 3 entities (say, 3 csv files, so it's
>>>>> a list of tables then), and you have 2 templates, and you tell the CLI
>>>>> to execute each template for each item in said data source, then you
>>>>> have just defined 6 "output generators".
>>>>> 
>>>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>> 
>>>>>> Hi Daniel,
>>>>>> 
>>>>>> That all depends on your mental model and work you do, expectations,
>>>>>> experience :-)
>>>>>> 
>>>>>> 
>>>>>> __Document Handling__
>>>>>> 
>>>>>> *"But I think actually we have no good use case for list of documents
>>>>>> that's passed at once to a single template run, so, we can just ignore
>>>>>> that complication"*
>>>>>> 
>>>>>> In my case that's not a complication but my daily business - I'm
>>>>>> regularly wading through access logs - yesterday probably a couple of
>>>>>> hundreds access logs across two staging sites to help tracking some
>>>>>> strange API gateway issues :-)
>>>>>> 
>>>>>> My gut feeling is (borrowing from
>>>>>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
>>>>>> )
>>>>>> 
>>>>>> 1. You have a few lovely named documents / templates - `pets`
>>>>>> 2. You have tons of anonymous documents / templates to process -
>>>>>> `cattle`
>>>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
>>>>>> 
>>>>>> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
>>>>>> it is equally important and common.
>>>>>> 
>>>>>> 
>>>>>> __Template And Document Processing Modes__
>>>>>> 
>>>>>> IMHO it is important to answer the following question: "How many
>>>>>> outputs do you get when rendering 2 templates and 3 datasources? Two,
>>>>>> Three or Six?"
>>>>>> 
>>>>>> Your answer is influenced by your mental model / experience
>>>>>> 
>>>>>> * When wading through tons of CSV files, access logs, etc. the answer
>>>>>> is "2"
>>>>>> * When doing source code generation the obvious answer is "6"
>>>>>> * Can't imagine a use case which results in "3" but I'm pretty sure we
>>>>>> will encounter one
>>>>>> __Template and document mode probably shouldn't exist__
>>>>>> 
>>>>>> That's hard for me to fully understand - I definitely lack your
>>>>>> insights & experience writing such tools :-)
>>>>>> 
>>>>>> Defining the `Output Generator` is the underlying model for the Maven
>>>>>> plugin (and probably FMPP).
>>>>>> 
>>>>>> I'm not sure if this applies to command lines, at least not in the way
>>>>>> I use them (or would like to use them)
>>>>>> 
>>>>>> 
>>>>>> Thanks in advance,
>>>>>> 
>>>>>> Siegfried Goeschl
>>>>>> 
>>>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>>>>>> 
>>>>>> 
>>>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>>>>>> 
>>>>>>> Yeah, "data source" is surely too popular a name, but for a reason.
>>>>>>> Does anyone have other ideas?
>>>>>>> 
>>>>>>> As for naming data sources and such: one thing I was wondering about
>>>>>>> back then is how to deal with a list of documents given to a template,
>>>>>>> versus exactly 1 document given to a template. But I think actually we
>>>>>>> have no good use case for list of documents that's passed at once to a
>>>>>>> single template run, so, we can just ignore that complication. A
>>>>>>> document has a name, and that's always just a single document, not a
>>>>>>> collection, as far as the template is concerned. (We can have multiple
>>>>>>> documents per run, but those normally yield separate output generators,
>>>>>>> so it's still only one document per template.) However, we can have
>>>>>>> data source types (document types in the old terminology) that collect
>>>>>>> together multiple data files. So then that complexity is encapsulated
>>>>>>> in the data source type, and doesn't complicate the overall
>>>>>>> architecture. That's another case where a data source is not just a
>>>>>>> file. Like maybe there's a data source type that loads all the CSV-s
>>>>>>> from a directory into a single big table (I had such a case), or even
>>>>>>> into a list of tables. Or, as I mentioned already, a data source is
>>>>>>> maybe an SQL query on a JDBC data source (and we got the first term
>>>>>>> clash... JDBC also calls them data sources).
>>>>>>> 
>>>>>>> Template and document mode probably shouldn't exist from the user
>>>>>>> perspective either, at least not as a global option that must apply to
>>>>>>> everything in a run. Users could just give the files that define the
>>>>>>> "output generators", and some of them will be templates, some of them
>>>>>>> data files, in which case a template needs to be associated with them
>>>>>>> (and there can be a couple of ways of doing that). And then again,
>>>>>>> there are the cases where you want to create one output generator per
>>>>>>> entity from some data source.
>>>>>>> 
>>>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi Daniel,
>>>>>>>> 
>>>>>>>> See my comments below - and thanks for your patience and input :-)
>>>>>>>> 
>>>>>>>> *Renaming Document To DataSource*
>>>>>>>> 
>>>>>>>> Yes, makes sense. I tried to avoid it since I'm using javax.activation
>>>>>>>> and its DataSource.
>>>>>>>> 
>>>>>>>> *Template And Document Mode*
>>>>>>>> 
>>>>>>>> Agreed - I think it is a valuable abstraction for the user but it is
>>>>>>>> not an implementation concept :-)
>>>>>>>> 
>>>>>>>> *Document Without Symbolic Names*
>>>>>>>> 
>>>>>>>> Also agreed and it is going to change but I have not settled my mind
>>>>>>>> yet on what exactly to implement.
>>>>>>>> 
>>>>>>>> Thanks in advance,
>>>>>>>> 
>>>>>>>> Siegfried Goeschl
>>>>>>>> 
>>>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>>>>>>>> 
>>>>>>>> A few quick thoughts on that:
>>>>>>>> 
>>>>>>>> - We should replace the "document" term with something more
>>>>>>>> descriptive. It doesn't tell that it's some kind of input. Also, most
>>>>>>>> of these inputs aren't something that people typically call documents.
>>>>>>>> Like a csv file, or a database table, which is not even a file (OK, we
>>>>>>>> don't support such a thing at the moment). I think maybe "data source"
>>>>>>>> is a safe enough term. (It also rhymes with data model.)
>>>>>>>> - You have separate "template" and "document" "modes", which apply to a
>>>>>>>> whole run. I think such specialization won't be helpful. We could just
>>>>>>>> say, on the conceptual level at least, that we need a set of "output
>>>>>>>> generators". An output generator is an object (in the API) that
>>>>>>>> specifies a template, a data-model (where the data-model is possibly
>>>>>>>> populated with "documents"), and an output "sink" (a file path, or
>>>>>>>> stdout), and can generate the output itself. A practical way of
>>>>>>>> defining the output generators in a CLI application is via a bunch of
>>>>>>>> files, each defining an output generator. Some of those files may be
>>>>>>>> templates (which you can even detect from the file extension), others
>>>>>>>> data files that we currently call "documents". They could freely mix
>>>>>>>> inside the same run. I have also met a use case where you have a single
>>>>>>>> table (single "document"), and each record in it yields an output file.
>>>>>>>> That can also be described in some file format, or really in any other
>>>>>>>> way, like directly in command line arguments, via API, etc.
>>>>>>>> - You have multiple documents without an associated symbolic name in
>>>>>>>> some examples. Templates can't identify those in a well maintainable
>>>>>>>> way then. The actual file name is often not a good identifier: it can
>>>>>>>> change over time, and you might not even have good control over it,
>>>>>>>> like when you receive it as a parameter from somewhere else, or
>>>>>>>> someone moves/renames the files that you need to read. An index is also
>>>>>>>> not very good, but I have written about that earlier.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi folks,
>>>>>>>> 
>>>>>>>> still wrapping my head around it but assembled some thoughts here -
>>>>>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>>>>>>>> 
>>>>>>>> Thanks in advance,
>>>>>>>> 
>>>>>>>> Siegfried Goeschl
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org> wrote:
>>>>>>>> 
>>>>>>>> What you are describing is more like the angle that FMPP took
>>>>>>>> initially, where templates drive things, and they generate the output
>>>>>>>> for themselves (even multiple output files if they wish). By default
>>>>>>>> the output file name (and relative path) is deduced from the template
>>>>>>>> name. There was also a global data-model, built in a configuration file
>>>>>>>> (or equally, built via command line arguments, or both mixed), from
>>>>>>>> which templates get whatever data they are interested in. Take a look
>>>>>>>> at the figures here: http://fmpp.sourceforge.net/qtour.html. Later,
>>>>>>>> this concept was generalized a bit more, because you could add XML
>>>>>>>> files at the same place where you have the templates, and then you
>>>>>>>> could associate transform templates with the XML files (based on path
>>>>>>>> pattern and/or the XML document element). Now that's like what
>>>>>>>> freemarker-generator had initially (data files drive output, and the
>>>>>>>> template is there to transform it).
>>>>>>>> 
>>>>>>>> So I think the generic mental model would be like this:
>>>>>>>> 
>>>>>>>> 1. You have files that drive the process, let's call them *generator
>>>>>>>> files* for now. Usually, each generator file yields an output file (but
>>>>>>>> maybe even multiple output files, as you might have seen in the last
>>>>>>>> figure). These generator files can be of many types, like XML, JSON,
>>>>>>>> XLSX (as in the original freemarker-generator), and even templates (as
>>>>>>>> is the norm in FMPP). If the file is not a template, then you have a
>>>>>>>> set of transformer templates (-t CLI option) in a separate directory,
>>>>>>>> which can be associated with the generator files based on name
>>>>>>>> patterns, and even based on content (schema usually). If the generator
>>>>>>>> file is a template (so that's a positional @Parameter CLI argument that
>>>>>>>> happens to be an *.ftl, and is not a template file specified after the
>>>>>>>> "-t" option), then you just Template.process(...) it, and it prints
>>>>>>>> what the output will be.
>>>>>>>> 2. You also have a set of variables, the global data-model, that
>>>>>>>> contains commonly useful stuff, like what you now call parameters (CLI
>>>>>>>> -Pname=value), but also maybe data loaded from JSON, XML, etc. Those
>>>>>>>> data files aren't "generator files". Templates just use them if they
>>>>>>>> need them. An important thing here is to reuse the same mechanism to
>>>>>>>> read and parse those data files as is used in templates when
>>>>>>>> transforming generator files. So we need a common format for specifying
>>>>>>>> how to load data files. That's maybe just FTL that #assigns to the
>>>>>>>> variables, or maybe a more declarative format.
>>>>>>>> 
>>>>>>>> What I have described in the original post here was a less generic form
>>>>>>>> of this, as I tried to stay true to the original approach. I thought
>>>>>>>> the proposal would be drastic enough as it is... :) There, the "main"
>>>>>>>> document is the "generator file" from point 1, the "-t" template is the
>>>>>>>> transform template for the "main" document, and the other named
>>>>>>>> documents ("users", "groups") are a poor man's shared data-model from
>>>>>>>> point 2 (together with -PName=value).
>>>>>>>> 
>>>>>>>> There's a further, somewhat confusing thing to get right with the
>>>>>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing though.
>>>>>>>> In the model above, as per point 1, if you list multiple data files,
>>>>>>>> each will generate a separate output file. So, if you need to take in a
>>>>>>>> list of files to transform it into a single output file (or at least
>>>>>>>> with a single transform template execution), then you have to be
>>>>>>>> explicit about that, as that's not the default behavior anymore. But
>>>>>>>> it's still absolutely possible. Imagine that a "list of XLSX-es" is
>>>>>>>> itself like a file format. You need some CLI (and Maven config, etc.)
>>>>>>>> syntax to express that, but that shouldn't be a big deal.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi Daniel,
>>>>>>>> 
>>>>>>>> Good timing - I was looking at a similar problem from a different angle
>>>>>>>> yesterday (see below)
>>>>>>>> 
>>>>>>>> Don't have enough time to answer your email in detail now - will do
>>>>>>>> that
>>>>>>>> tomorrow evening
>>>>>>>> 
>>>>>>>> Thanks in advance,
>>>>>>>> 
>>>>>>>> Siegfried Goeschl
>>>>>>>> 
>>>>>>>> 
>>>>>>>> === START
>>>>>>>> # FreeMarker CLI Improvement
>>>>>>>> ## Support Of Multiple Template Files
>>>>>>>> Currently we support the following combinations
>>>>>>>> 
>>>>>>>> * Single template and no data files
>>>>>>>> * Single template and one or more data files
>>>>>>>> 
>>>>>>>> But we can not support the following use case which is quite typical
>>>>>>>> in the cloud
>>>>>>>> 
>>>>>>>> __Convert multiple templates with a single data file, e.g copying a
>>>>>>>> directory of configuration files using a JSON configuration file__
>>>>>>>> 
>>>>>>>> ## Implementation notes
>>>>>>>> * When we copy a directory we can remove the `ftl` extension on the fly
>>>>>>>> * We might need an `exclude` filter for the copy operation
>>>>>>>> * Initially resolve to a list of template files and process one after
>>>>>>>> another
>>>>>>>> * Need to calculate the output file location and extension
>>>>>>>> * We need to rename the existing command line parameters (see below)
>>>>>>>> * Do we need multiple include and exclude filters?
>>>>>>>> * Do we need file versus directory filters?
>>>>>>>> 
>>>>>>>> ### Command Line Options
>>>>>>>> ```
>>>>>>>> --input-encoding : Encoding of the documents
>>>>>>>> --output-encoding : Encoding of the rendered template
>>>>>>>> --template-encoding : Encoding of the template
>>>>>>>> --output : Output file or directory
>>>>>>>> --include-document : Include pattern for documents
>>>>>>>> --exclude-document : Exclude pattern for documents
>>>>>>>> --include-template : Include pattern for templates
>>>>>>>> --exclude-template : Exclude pattern for templates
>>>>>>>> ```
>>>>>>>> 
>>>>>>>> ### Command Line Examples
>>>>>>>> ```text
>>>>>>>> # Copy all FTL templates found in "ext/config" to the "/config"
>>>>>>>> # directory using the data from "config.json"
>>>>>>>> 
>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
>>>>>>>> 
>>>>>>>> # Basically the same using a named document "configuration"
>>>>>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
>>>>>>>> # It might make sense to allow URIs for loading documents
>>>>>>>> 
>>>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
>>>>>>>> 
>>>>>>>> # Basically the same using an environment variable as named document
>>>>>>>> 
>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
>>>>>>>> ```
>>>>>>>> === END
>>>>>>>> 
>>>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org> wrote:
>>>>>>>> 
>>>>>>>> Input documents is a fundamental concept in freemarker-generator, so
>>>>>>>> we
>>>>>>>> should think about that more, and probably refine/rework how it's
>>>>>>>> done.
>>>>>>>> 
>>>>>>>> Currently it works like this, with CLI at least.
>>>>>>>> 
>>>>>>>> freemarker-cli
>>>>>>>> -t access-report.ftl
>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>> 
>>>>>>>> Then in access-report.ftl you have to do something like this:
>>>>>>>> 
>>>>>>>> <#assign doc = Documents.get(0)>
>>>>>>>> ... process doc here
>>>>>>>> 
>>>>>>>> (The more idiomatic Documents[0] won't work. Actually, that lead to a
>>>>>>>> 
>>>>>>>> funny
>>>>>>>> 
>>>>>>>> chain of coincidences: It returned the string "D", then
>>>>>>>> 
>>>>>>>> CSVTool.parse(...)
>>>>>>>> 
>>>>>>>> happily parsed that to a table with the single column "D", and 0
>>>>>>>> rows,
>>>>>>>> 
>>>>>>>> and
>>>>>>>> 
>>>>>>>> as there were 0 rows, the template didn't run into an error because
>>>>>>>> row.myExpectedColumn refers to a missing column either, so the
>>>>>>>> process
>>>>>>>> finished with success. (: Pretty unlucky for sure. The root was
>>>>>>>> unintentionally breaking a FreeMarker idiom though; eventually we
>>>>>>>> will
>>>>>>>> 
>>>>>>>> have
>>>>>>>> 
>>>>>>>> to work on those too, but, different topic.)
>>>>>>>> 
>>>>>>>> However, actually multiple input documents can be passed in:
>>>>>>>> 
>>>>>>>> freemarker-cli
>>>>>>>> -t access-report.ftl
>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>> somewhere/bar-access-log.csv
>>>>>>>> 
>>>>>>>> Above template will still work, though then you ignored all but the
>>>>>>>> 
>>>>>>>> first
>>>>>>>> 
>>>>>>>> document. So if you expect any number of input documents, you
>>>>>>>> probably
>>>>>>>> 
>>>>>>>> will
>>>>>>>> 
>>>>>>>> have to do this:
>>>>>>>> 
>>>>>>>> <#list Documents.list as doc>
>>>>>>>> ... process doc here
>>>>>>>> </#list>
>>>>>>>> 
>>>>>>>> (The more idiomatic <#list Documents as doc> won't work; but again,
>>>>>>>> 
>>>>>>>> those
>>>>>>>> 
>>>>>>>> we will work out in a different thread.)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> So, what would be better, in my opinion. I start out from what I
>>>>>>>> think
>>>>>>>> 
>>>>>>>> are
>>>>>>>> 
>>>>>>>> the common uses cases, in decreasing order of frequency. Goal is to
>>>>>>>> 
>>>>>>>> make
>>>>>>>> 
>>>>>>>> those less error prone for the users, and simpler to express.
>>>>>>>> 
>>>>>>>> USE CASE 1
>>>>>>>> 
>>>>>>>> You have exactly 1 input documents, which is therefore simply "the"
>>>>>>>> document in the mind of the user. This is probably the typical use
>>>>>>>> 
>>>>>>>> case,
>>>>>>>> 
>>>>>>>> but at least the use case users typically start out from when
>>>>>>>> starting
>>>>>>>> 
>>>>>>>> the
>>>>>>>> 
>>>>>>>> work.
>>>>>>>> 
>>>>>>>> freemarker-cli
>>>>>>>> -t access-report.ftl
>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>> 
>>>>>>>> Then `Documents.get(0)` is not very fitting. Most importantly it's
>>>>>>>> 
>>>>>>>> error
>>>>>>>> 
>>>>>>>> prone, because if the user passed in more than 1 documents (can even
>>>>>>>> 
>>>>>>>> happen
>>>>>>>> 
>>>>>>>> totally accidentally, like if the user was lazy and used a wildcard
>>>>>>>> 
>>>>>>>> that
>>>>>>>> 
>>>>>>>> the shell exploded), the template will silently ignore the rest of
>>>>>>>> the
>>>>>>>> documents, and the singe document processed will be practically
>>>>>>>> picked
>>>>>>>> randomly. The user might won't notice that and submits a bad report
>>>>>>>> or
>>>>>>>> 
>>>>>>>> such.
>>>>>>>> 
>>>>>>>> I think that in this use case the document should be simply referred
>>>>>>>> as
>>>>>>>> `Document` in the template. When you have multiple documents there,
>>>>>>>> referring to `Document` should be an error, saying that the template
>>>>>>>> 
>>>>>>>> was
>>>>>>>> 
>>>>>>>> made to process a single document only.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> USE CASE 2
>>>>>>>> 
>>>>>>>> You have multiple input documents, but each has a different role
>>>>>>>> (different schema, maybe different file type). Like, you pass in
>>>>>>>> users.csv and groups.csv. Each has a different schema, and so you want to
>>>>>>>> access them differently, but in the same template.
>>>>>>>> 
>>>>>>>>     freemarker-cli
>>>>>>>>         [...]
>>>>>>>>         --named-document users somewhere/foo-users.csv
>>>>>>>>         --named-document groups somewhere/foo-groups.csv
>>>>>>>> 
>>>>>>>> Then in the template you could refer to them as `NamedDocuments.users`
>>>>>>>> and `NamedDocuments.groups`.
>>>>>>>> 
>>>>>>>> Use cases 1 and 2 can be unified into a coherent concept, where
>>>>>>>> `Document` is just a shorthand for `NamedDocuments.main`. It's called
>>>>>>>> "main" because that's "the" document the template is about, but then you
>>>>>>>> had to add some helper documents, with symbolic names representing their
>>>>>>>> role.
>>>>>>>> 
>>>>>>>>     freemarker-cli
>>>>>>>>         -t access-report.ftl
>>>>>>>>         --document-name=main somewhere/foo-access-log.csv
>>>>>>>>         --document-name=users somewhere/foo-users.csv
>>>>>>>>         --document-name=groups somewhere/foo-groups.csv
>>>>>>>> 
>>>>>>>> Here, `Document` still works in the template, and it refers to
>>>>>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
>>>>>>>> above would be cleaner, I couldn't figure out how to do that with
>>>>>>>> Picocli. Anyway, for now the point is the concept, which is not specific
>>>>>>>> to the CLI.)
>>>>>>>> 
>>>>>>>> USE CASE 3
>>>>>>>> 
>>>>>>>> Here you have several of the same kind of documents. That has a more
>>>>>>>> generic sub-use-case, when you have explicitly named documents (like
>>>>>>>> "users" above), and for some you expect multiple input files.
>>>>>>>> 
>>>>>>>>     freemarker-cli
>>>>>>>>         -t access-report.ftl
>>>>>>>>         --document-name=main somewhere/foo-access-log.csv
>>>>>>>>             somewhere/bar-access-log.csv
>>>>>>>>         --document-name=users somewhere/foo-users.csv
>>>>>>>>             somewhere/bar-users.csv
>>>>>>>>         --document-name=groups somewhere/global-groups.csv
>>>>>>>> 
>>>>>>>> The template must be written with this use case in mind, as now it has
>>>>>>>> to #list some of the documents. (I think in practice you hardly ever want
>>>>>>>> to get a document by hard-coded index. Either you don't know how many
>>>>>>>> documents you have, so you can't use hard-coded indexes, or you do, and
>>>>>>>> each index has a specific meaning, but then you should name the documents
>>>>>>>> instead, as using indexes is error-prone and hard to read.)
>>>>>>>> Accessing that list of documents in the template could maybe be done like
>>>>>>>> this:
>>>>>>>> - For the "main" documents: `DocumentList`
>>>>>>>> - For explicitly named documents, like "users": `NamedDocumentLists.users`
>>>>>>>> 
>>>>>>>> SUMMING UP
>>>>>>>> 
>>>>>>>> To unify all 3 use cases into a coherent concept:
>>>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you can
>>>>>>>>   achieve everything with it, using it requires your template to handle
>>>>>>>>   the most generic case too. So, I think it would be rarely used.
>>>>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's
>>>>>>>>   used if you only have one kind of document (single format and schema),
>>>>>>>>   but potentially multiple of them.
>>>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1 document of
>>>>>>>>   the given name.
>>>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is for
>>>>>>>>   the most natural/frequent use case.
>>>>>>>> 
>>>>>>>> That's 4 possible ways of accessing your documents, which is a trade-off
>>>>>>>> for the sake of these:
>>>>>>>> - Catching CLI (or Maven, etc.) input where the template output likely
>>>>>>>>   will be wrong. That's only possible if the user can communicate their
>>>>>>>>   intent in the template.
>>>>>>>> - Users don't need to deal with concepts that are irrelevant in their
>>>>>>>>   concrete use case. Just start with the trivial, `Document`, and later,
>>>>>>>>   if the need arises, generalize to named documents, document lists, or
>>>>>>>>   both.
>>>>>>>> 
>>>>>>>> What do you guys think?
>>>>>>>> 
>>>>>>>> 
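To make the four access forms from the summary concrete, here is a minimal Java sketch of the lookup rules only. Everything in it is an assumption made for illustration: the class `DocumentAccessSketch`, its method names, and the use of plain path strings in place of real document objects are hypothetical and are not part of freemarker-generator.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed access rules; not freemarker-generator code.
public class DocumentAccessSketch {

    // Documents (represented by their paths here) grouped by symbolic name.
    private final Map<String, List<String>> byName = new LinkedHashMap<>();

    // Registers one or more documents under a symbolic name such as "main" or "users".
    public DocumentAccessSketch add(String name, String... paths) {
        byName.computeIfAbsent(name, k -> new ArrayList<>()).addAll(List.of(paths));
        return this;
    }

    // NamedDocumentLists.<name>: the most generic form, zero or more documents.
    public List<String> namedDocumentList(String name) {
        return byName.getOrDefault(name, List.of());
    }

    // NamedDocuments.<name>: exactly one document is expected, anything else is an error.
    public String namedDocument(String name) {
        List<String> docs = namedDocumentList(name);
        if (docs.size() != 1) {
            throw new IllegalStateException("Expected exactly 1 document named \""
                    + name + "\", but got " + docs.size());
        }
        return docs.get(0);
    }

    // DocumentList: shorthand for NamedDocumentLists.main.
    public List<String> documentList() {
        return namedDocumentList("main");
    }

    // Document: shorthand for NamedDocuments.main.
    public String document() {
        return namedDocument("main");
    }

    public static void main(String[] args) {
        DocumentAccessSketch docs = new DocumentAccessSketch()
                .add("main", "somewhere/foo-access-log.csv", "somewhere/bar-access-log.csv")
                .add("groups", "somewhere/global-groups.csv");
        System.out.println(docs.namedDocument("groups"));
        System.out.println(docs.documentList());
        try {
            docs.document(); // two "main" documents were given, so this fails loudly
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is that the single-document accessors fail loudly instead of silently picking one of several inputs, which is the error-catching property argued for above.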

Re: freemarker-generator: Improving the input documents concept

Posted by Daniel Dekany <da...@gmail.com>.

But, "datasource" is just not an existing word, right? Of course if we put
spelling mistakes into class names, that will decrease the chance of name
clashes big time, but... :)

On Sat, Feb 29, 2020 at 6:06 PM Siegfried Goeschl <
siegfried.goeschl@gmail.com> wrote:

> Well, it clashes with "javax.activation.DataSource" - can do, and I have no
> definite opinion about it :)
>
> > On 29.02.2020, at 18:03, Daniel Dekany <da...@gmail.com> wrote:
> >
> > I believe that should be DataSource (with capital S), as it's two words.
> >
> > Also, it's the name of a too widely used and known JDBC interface. So if
> > anyone can tell a similarly descriptive alternative...
> >
> > On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
> > siegfried.goeschl@gmail.com> wrote:
> >
> >> Hi Daniel,
> >>
> >> I'm an enterprise developer - bad habits die hard :-)
> >>
> >> So I closed the following tickets and merged the branches
> >>
> >> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
> >> "freemarker-generator"
> >> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
> >> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
> >> for datasources
> >>
> >> Thanks in advance,
> >>
> >> Siegfried Goeschl
> >>
> >>
> >>> On 29.02.2020, at 12:19, Daniel Dekany <da...@gmail.com> wrote:
> >>>
> >>> Yeah, and of course, you can merge that branch. You can even work on the
> >>> master directly after all.
> >>>
> >>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <daniel.dekany@gmail.com>
> >>> wrote:
> >>>
> >>>> But, I do recognize the cattle use case (several "faceless" files with
> >>>> a common format/schema). Only, my idea is to push that complexity onto
> >>>> the data source. The "data source" concept shields the rest of the
> >>>> application from the details of how the data is stored or retrieved. So,
> >>>> a data source might load a bunch of log files from a directory, and
> >>>> present them as a single big table, or as a list of tables, etc. So I
> >>>> want to deal with the cattle use case, but the question is what part of
> >>>> the architecture will deal with this complication, in other words, how
> >>>> do you box things. Why my initial bet is to stuff that complication into
> >>>> the "data source" implementation(s) is that data sources are inherently
> >>>> varied. Some return a table-like thing, some have multiple named tables
> >>>> (worksheets in Excel), some return a tree of nodes (XML), etc. So then,
> >>>> some might return a list-of-lists of log records, or just a single list
> >>>> of log records (put together from daily log files). That way cattle
> >>>> don't add to conceptual complexity. Now, you might be aware of cases
> >>>> where the cattle concept must be more exposed than this, and then we
> >>>> can't box things like this. But this is what I tried to express.
> >>>>
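As an illustration of a data source that hides the "cattle" behind a single table, here is a minimal Java sketch. It assumes that each input is CSV text sharing the same header line, and it does naive line handling only (no quoting or escaping). The class `MergedCsvSketch` and its method are hypothetical names, not an existing freemarker-generator API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: several CSV inputs presented as one table.
public class MergedCsvSketch {

    // Returns the shared header line followed by the data rows of all inputs.
    // Assumes every non-empty input starts with the same header line.
    public static List<String> mergeAsSingleTable(List<String> csvContents) {
        List<String> table = new ArrayList<>();
        for (String csv : csvContents) {
            String[] lines = csv.split("\\R"); // split on any line break
            if (lines.length == 0 || lines[0].isEmpty()) {
                continue; // skip empty inputs
            }
            if (table.isEmpty()) {
                table.add(lines[0]); // keep the header once
            } else if (!table.get(0).equals(lines[0])) {
                throw new IllegalArgumentException("Header mismatch: " + lines[0]);
            }
            for (int i = 1; i < lines.length; i++) {
                if (!lines[i].isEmpty()) {
                    table.add(lines[i]);
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> merged = mergeAsSingleTable(List.of(
                "ip,status\n10.0.0.1,200\n",
                "ip,status\n10.0.0.2,404\n"));
        merged.forEach(System.out::println);
    }
}
```

With something like this, the template sees one table regardless of how many log files were given, so the list-of-documents concept does not need to surface in the template.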
> >>>> Regarding "output generators", and how that applies on the command
> >>>> line: I think it's important that the common core between Maven and
> >>>> command line is as fat as possible. Ideally, they are just two syntaxes
> >>>> to set up the same thing. Mostly at least. So, if you specify a template
> >>>> file to the CLI application, in a way so that it causes it to process
> >>>> that template to generate a single output, then there you have just
> >>>> defined an "output generator" (even if it wasn't explicitly called that
> >>>> on the command line). If you specify 3 csv files to the CLI application,
> >>>> in a way so that it causes it to generate 3 output files, then you have
> >>>> just defined 3 "output generators" there (there's at least one template
> >>>> specified there too, but that wasn't an "output generator" itself, it
> >>>> was just an attribute of the 3 output generators). If you specify 1
> >>>> template, and 3 csv files, in a way so that it will yield 4 output files
> >>>> (1 for the template, 3 for the csv-s), then you have defined 4 output
> >>>> generators there. If you have a data source that loads a list of 3
> >>>> entities (say, 3 csv files, so it's a list of tables then), and you have
> >>>> 2 templates, and you tell the CLI to execute each template for each item
> >>>> in said data source, then you have just defined 6 "output generators".
> >>>>
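The last counting example (2 templates, a data source with 3 items, 6 output generators) can be sketched as below. This is only an illustration under stated assumptions: `OutputGeneratorSketch`, the nested `OutputGenerator` type, and the output naming rule are all hypothetical, and the sketch enumerates the generators without rendering anything.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these types are not part of freemarker-generator.
public class OutputGeneratorSketch {

    // One output generator: one template applied to one data source item,
    // producing one output.
    public static final class OutputGenerator {
        public final String template;
        public final String dataSourceItem;
        public final String output;

        OutputGenerator(String template, String dataSourceItem) {
            this.template = template;
            this.dataSourceItem = dataSourceItem;
            // Assumed naming rule, for illustration only.
            this.output = stripExtension(dataSourceItem) + "-" + stripExtension(template) + ".txt";
        }
    }

    static String stripExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    // "Execute each template for each item in the data source".
    public static List<OutputGenerator> forEachTemplateAndItem(List<String> templates,
                                                               List<String> items) {
        List<OutputGenerator> generators = new ArrayList<>();
        for (String template : templates) {
            for (String item : items) {
                generators.add(new OutputGenerator(template, item));
            }
        }
        return generators;
    }

    public static void main(String[] args) {
        List<OutputGenerator> generators = forEachTemplateAndItem(
                List.of("summary.ftl", "detail.ftl"),
                List.of("a.csv", "b.csv", "c.csv"));
        System.out.println(generators.size() + " output generators");
        for (OutputGenerator g : generators) {
            System.out.println(g.template + " + " + g.dataSourceItem + " -> " + g.output);
        }
    }
}
```

The other cases in the paragraph (1, 3, or 4 generators) would be different ways of building the same kind of list, which is why the front end (CLI or Maven) only needs to decide how the list is assembled.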
> >>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
> >>>> siegfried.goeschl@gmail.com> wrote:
> >>>>
> >>>>> Hi Daniel,
> >>>>>
> >>>>> That all depends on your mental model and work you do, expectations,
> >>>>> experience :-)
> >>>>>
> >>>>>
> >>>>> __Document Handling__
> >>>>>
> >>>>> *"But I think actually we have no good use case for list of documents
> >>>>> that's passed at once to a single template run, so, we can just ignore
> >>>>> that complication"*
> >>>>>
> >>>>> In my case that's not a complication but my daily business - I'm
> >>>>> regularly wading through access logs - yesterday probably a couple of
> >>>>> hundred access logs across two staging sites to help track down some
> >>>>> strange API gateway issues :-)
> >>>>>
> >>>>> My gut feeling is (borrowing from
> >>>>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
> >>>>> )
> >>>>>
> >>>>> 1. You have a few lovely named documents / templates - `pets`
> >>>>> 2. You have tons of anonymous documents / templates to process -
> >>>>> `cattle`
> >>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
> >>>>>
> >>>>> `freemarker-cli` was built with 2) in mind and I want to cover 1), since
> >>>>> it is equally important and common.
> >>>>>
> >>>>>
> >>>>> __Template And Document Processing Modes__
> >>>>>
> >>>>> IMHO it is important to answer the following question: "How many
> >>>>> outputs do you get when rendering 2 templates and 3 datasources? Two,
> >>>>> Three or Six?"
> >>>>>
> >>>>> Your answer is influenced by your mental model / experience
> >>>>>
> >>>>> * When wading through tons of CSV files, access logs, etc. the answer is
> >>>>> "2"
> >>>>> * When doing source code generation the obvious answer is "6"
> >>>>> * Can't imagine a use case which results in "3" but I'm pretty sure we
> >>>>> will encounter one
> >>>>>
> >>>>> __Template and document mode probably shouldn't exist__
> >>>>>
> >>>>> That's hard for me to fully understand - I definitely lack your insights
> >>>>> & experience writing such tools :-)
> >>>>>
> >>>>> Defining the `Output Generator` is the underlying model for the Maven
> >>>>> plugin (and probably FMPP).
> >>>>>
> >>>>> I'm not sure if this applies to command lines, at least not in the way I
> >>>>> use them (or would like to use them)
> >>>>>
> >>>>>
> >>>>> Thanks in advance,
> >>>>>
> >>>>> Siegfried Goeschl
> >>>>>
> >>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
> >>>>>
> >>>>>
> >>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
> >>>>>
> >>>>>> Yeah, "data source" is surely a too popular name, but for a reason.
> >>>>>> Does anyone have other ideas?
> >>>>>>
> >>>>>> As for naming data sources and such: one thing I was wondering about
> >>>>>> back then is how to deal with a list of documents given to a template,
> >>>>>> versus exactly 1 document given to a template. But I think actually we
> >>>>>> have no good use case for list of documents that's passed at once to a
> >>>>>> single template run, so, we can just ignore that complication. A
> >>>>>> document has a name, and that's always just a single document, not a
> >>>>>> collection, as far as the template is concerned. (We can have multiple
> >>>>>> documents per run, but those normally yield separate output generators,
> >>>>>> so it's still only one document per template.) However, we can have
> >>>>>> data source types (document types in the old terminology) that collect
> >>>>>> together multiple data files. So then that complexity is encapsulated
> >>>>>> in the data source type, and doesn't complicate the overall
> >>>>>> architecture. That's another case where a data source is not just a
> >>>>>> file. Like maybe there's a data source type that loads all the CSV-s
> >>>>>> from a directory into a single big table (I had such a case), or even
> >>>>>> into a list of tables. Or, as I mentioned already, a data source is
> >>>>>> maybe an SQL query on a JDBC data source (and we got the first term
> >>>>>> clash... JDBC also calls them data sources).
> >>>>>>
> >>>>>> Template and document mode probably shouldn't exist from the user's
> >>>>>> perspective either, at least not as a global option that must apply to
> >>>>>> everything in a run. Users could just give the files that define the
> >>>>>> "output generators", and some of them will be templates, some of them
> >>>>>> data files, in which case a template needs to be associated with them
> >>>>>> (and there can be a couple of ways of doing that). And then again,
> >>>>>> there are the cases where you want to create one output generator per
> >>>>>> entity from some data source.
> >>>>>>
> >>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
> >>>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi Daniel,
> >>>>>>>
> >>>>>>> See my comments below - and thanks for your patience and input :-)
> >>>>>>>
> >>>>>>> *Renaming Document To DataSource*
> >>>>>>>
> >>>>>>> Yes, makes sense. I tried to avoid it since I'm using javax.activation
> >>>>>>> and its DataSource.
> >>>>>>>
> >>>>>>> *Template And Document Mode*
> >>>>>>>
> >>>>>>> Agreed - I think it is a valuable abstraction for the user but it is
> >>>>>>> not an implementation concept :-)
> >>>>>>>
> >>>>>>> *Document Without Symbolic Names*
> >>>>>>>
> >>>>>>> Also agreed, and it is going to change, but I have not made up my mind
> >>>>>>> yet about what exactly to implement.
> >>>>>>>
> >>>>>>> Thanks in advance,
> >>>>>>>
> >>>>>>> Siegfried Goeschl
> >>>>>>>
> >>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
> >>>>>>>
> >>>>>>> A few quick thoughts on that:
> >>>>>>>
> >>>>>>> - We should replace the "document" term with something more telling. It
> >>>>>>>   doesn't tell that it's some kind of input. Also, most of these inputs
> >>>>>>>   aren't something that people typically call documents. Like a csv
> >>>>>>>   file, or a database table, which is not even a file (OK, we don't
> >>>>>>>   support such a thing at the moment). I think maybe "data source" is a
> >>>>>>>   safe enough term. (It also rhymes with data model.)
> >>>>>>> - You have separate "template" and "document" "modes", which apply to a
> >>>>>>>   whole run. I think such specialization won't be helpful. We could just
> >>>>>>>   say, on the conceptual level at least, that we need a set of "output
> >>>>>>>   generators". An output generator is an object (in the API) that
> >>>>>>>   specifies a template, a data-model (where the data-model is possibly
> >>>>>>>   populated with "documents"), and an output "sink" (a file path, or
> >>>>>>>   stdout), and can generate the output itself. A practical way of
> >>>>>>>   defining the output generators in a CLI application is via a bunch of
> >>>>>>>   files, each defining an output generator. Some of those files are
> >>>>>>>   maybe templates (which you can even detect from the file extension),
> >>>>>>>   or data files that we currently call "documents". They could freely
> >>>>>>>   mix inside the same run. I have also met a use case where you have a
> >>>>>>>   single table (single "document"), and each record in it yields an
> >>>>>>>   output file. That can also be described in some file format, or really
> >>>>>>>   in any other way, like directly in a command line argument, via API,
> >>>>>>>   etc.
> >>>>>>> - You have multiple documents without an associated symbolic name in
> >>>>>>>   some examples. Templates can't identify those then in a well
> >>>>>>>   maintainable way. The actual file name is often not a good identifier:
> >>>>>>>   it can change over time, and you might not even have good control over
> >>>>>>>   it, like you already receive it as a parameter from somewhere else, or
> >>>>>>>   someone moves/renames the files that you need to read. Index is also
> >>>>>>>   not very good, but I have written about that earlier.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
> >>>>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi folks,
> >>>>>>>
> >>>>>>> still wrapping my head around it, but I assembled some thoughts here -
> >>>>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
> >>>>>>>
> >>>>>>> Thanks in advance,
> >>>>>>>
> >>>>>>> Siegfried Goeschl
> >>>>>>>
> >>>>>>>
> >>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org>
> wrote:
> >>>>>>>
> >>>>>>> What you are describing is more like the angle that FMPP took
> >>>>>>> initially, where templates drive things, and they generate the output
> >>>>>>> for themselves (even multiple output files if they wish). By default
> >>>>>>> the output file name (and relative path) is deduced from the template
> >>>>>>> name. There was also a global data-model, built in a configuration file
> >>>>>>> (or equally, built via command line arguments, or both mixed), from
> >>>>>>> which templates get whatever data they are interested in. Take a look
> >>>>>>> at the figures here: http://fmpp.sourceforge.net/qtour.html. Later,
> >>>>>>> this concept was generalized a bit more, because you could add XML
> >>>>>>> files at the same place where you have the templates, and then you
> >>>>>>> could associate transform templates to the XML files (based on path
> >>>>>>> pattern and/or the XML document element). Now that's like what
> >>>>>>> freemarker-generator had initially (data files drive output, and the
> >>>>>>> template is there to transform it).
> >>>>>>>
> >>>>>>> So I think the generic mental model would be like this:
> >>>>>>>
> >>>>>>> 1. You got files that drive the process, let's call them *generator
> >>>>>>>    files* for now. Usually, each generator file yields an output file
> >>>>>>>    (but maybe even multiple output files, as you might have seen in the
> >>>>>>>    last figure). These generator files can be of many types, like XML,
> >>>>>>>    JSON, XLSX (as in the original freemarker-generator), and even
> >>>>>>>    templates (as is the norm in FMPP). If the file is not a template,
> >>>>>>>    then you got a set of transformer templates (-t CLI option) in a
> >>>>>>>    separate directory, which can be associated with the generator files
> >>>>>>>    based on name patterns, and even based on content (schema usually).
> >>>>>>>    If the generator file is a template (so that's a positional
> >>>>>>>    @Parameter CLI argument that happens to be an *.ftl, and is not a
> >>>>>>>    template file specified after the "-t" option), then you just
> >>>>>>>    Template.process(...) it, and it prints what the output will be.
> >>>>>>> 2. You also have a set of variables, the global data-model, that
> >>>>>>>    contains commonly useful stuff, like what you now call parameters
> >>>>>>>    (CLI -Pname=value), but also maybe data loaded from JSON, XML, etc.
> >>>>>>>    Those data files aren't "generator files". Templates just use them if
> >>>>>>>    they need them. An important thing here is to reuse the same
> >>>>>>>    mechanism to read and parse those data files that was used in
> >>>>>>>    templates when transforming generator files. So we need a common
> >>>>>>>    format for specifying how to load data files. That's maybe just FTL
> >>>>>>>    that #assigns to the variables, or maybe a more declarative format.
> >>>>>>>
> >>>>>>> What I have described in the original post here was a less generic form
> >>>>>>> of this, as I tried to be true to the original approach. I thought the
> >>>>>>> proposal would be drastic enough as it is... :) There, the "main"
> >>>>>>> document is the "generator file" from point 1, the "-t" template is the
> >>>>>>> transform template for the "main" document, and the other named
> >>>>>>> documents ("users", "groups") are a poor man's shared data-model from
> >>>>>>> point 2 (together with -PName=value).
> >>>>>>>
> >>>>>>> There's a further somewhat confusing thing to get right with the
> >>>>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing though.
> >>>>>>> In the model above, as per point 1, if you list multiple data files,
> >>>>>>> each will generate a separate output file. So, if you need to take in a
> >>>>>>> list of files to transform it to a single output file (or at least with
> >>>>>>> a single transform template execution), then you have to be explicit
> >>>>>>> about that, as that's not the default behavior anymore. But it's still
> >>>>>>> absolutely possible. Imagine that a "list of XLSX-es" is itself like a
> >>>>>>> file format. You need some CLI (and Maven config, etc.) syntax to
> >>>>>>> express that, but that shouldn't be a big deal.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
> >>>>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Daniel,
> >>>>>>>
> >>>>>>> Good timing - I was looking at a similar problem from different
> angle
> >>>>>>> yesterday (see below)
> >>>>>>>
> >>>>>>> Don't have enough time to answer your email in detail now - will do
> >>>>>>> that
> >>>>>>> tomorrow evening
> >>>>>>>
> >>>>>>> Thanks in advance,
> >>>>>>>
> >>>>>>> Siegfried Goeschl
> >>>>>>>
> >>>>>>>
> >>>>>>> ===. START
> >>>>>>> # FreeMarker CLI Improvement
> >>>>>>> ## Support Of Multiple Template Files
> >>>>>>> Currently we support the following combinations
> >>>>>>>
> >>>>>>> * Single template and no data files
> >>>>>>> * Single template and one or more data files
> >>>>>>>
> >>>>>>> But we can not support the following use case which is quite typical
> >>>>>>> in the cloud
> >>>>>>>
> >>>>>>> __Convert multiple templates with a single data file, e.g copying a
> >>>>>>> directory of configuration files using a JSON configuration file__
> >>>>>>>
> >>>>>>> ## Implementation notes
> >>>>>>> * When we copy a directory we can remove the `ftl` extension on the fly
> >>>>>>> * We might need an `exclude` filter for the copy operation
> >>>>>>> * Initially resolve to a list of template files and process one after
> >>>>>>> another
> >>>>>>> * Need to calculate the output file location and extension
> >>>>>>> * We need to rename the existing command line parameters (see below)
> >>>>>>> * Do we need multiple include and exclude filters?
> >>>>>>> * Do we need file versus directory filters?
> >>>>>>>
> >>>>>>> ### Command Line Options
> >>>>>>> ```
> >>>>>>> --input-encoding : Encoding of the documents
> >>>>>>> --output-encoding : Encoding of the rendered template
> >>>>>>> --template-encoding : Encoding of the template
> >>>>>>> --output : Output file or directory
> >>>>>>> --include-document : Include pattern for documents
> >>>>>>> --exclude-document : Exclude pattern for documents
> >>>>>>> --include-template : Include pattern for templates
> >>>>>>> --exclude-template : Exclude pattern for templates
> >>>>>>> ```
> >>>>>>>
> >>>>>>> ### Command Line Examples
> >>>>>>> ```text
> >>>>>>> # Copy all FTL templates found in "ext/config" to the "/config" directory
> >>>>>>> # using the data from "config.json"
> >>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl --o /config config.json
> >>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
> >>>>>>>
> >>>>>>> # Basically the same using a named document "configuration"
> >>>>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
> >>>>>>> # It might make sense to allow URIs for loading documents
> >>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
> >>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
> >>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
> >>>>>>>
> >>>>>>> # Basically the same using an environment variable as named document
> >>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
> >>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
> >>>>>>> ```
> >>>>>>> === END
> >>>>>>>
> >>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org> wrote:
> >>>>>>>
> >>>>>>> Input documents is a fundamental concept in freemarker-generator,
> so
> >>>>>>> we
> >>>>>>> should think about that more, and probably refine/rework how it's
> >>>>>>> done.
> >>>>>>>
> >>>>>>> Currently it works like this, with CLI at least.
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> -t access-report.ftl
> >>>>>>> somewhere/foo-access-log.csv
> >>>>>>>
> >>>>>>> Then in access-report.ftl you have to do something like this:
> >>>>>>>
> >>>>>>> <#assign doc = Documents.get(0)>
> >>>>>>> ... process doc here
> >>>>>>>
> >>>>>>> (The more idiomatic Documents[0] won't work. Actually, that lead
> to a
> >>>>>>>
> >>>>>>> funny
> >>>>>>>
> >>>>>>> chain of coincidences: It returned the string "D", then
> >>>>>>>
> >>>>>>> CSVTool.parse(...)
> >>>>>>>
> >>>>>>> happily parsed that to a table with the single column "D", and 0
> >>>>>>> rows,
> >>>>>>>
> >>>>>>> and
> >>>>>>>
> >>>>>>> as there were 0 rows, the template didn't run into an error because
> >>>>>>> row.myExpectedColumn refers to a missing column either, so the
> >>>>>>> process
> >>>>>>> finished with success. (: Pretty unlucky for sure. The root was
> >>>>>>> unintentionally breaking a FreeMarker idiom though; eventually we
> >>>>>>> will
> >>>>>>>
> >>>>>>> have
> >>>>>>>
> >>>>>>> to work on those too, but, different topic.)
> >>>>>>>
> >>>>>>> However, actually multiple input documents can be passed in:
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> -t access-report.ftl
> >>>>>>> somewhere/foo-access-log.csv
> >>>>>>> somewhere/bar-access-log.csv
> >>>>>>>
> >>>>>>> Above template will still work, though then you ignored all but the
> >>>>>>>
> >>>>>>> first
> >>>>>>>
> >>>>>>> document. So if you expect any number of input documents, you
> >>>>>>> probably
> >>>>>>>
> >>>>>>> will
> >>>>>>>
> >>>>>>> have to do this:
> >>>>>>>
> >>>>>>> <#list Documents.list as doc>
> >>>>>>> ... process doc here
> >>>>>>> </#list>
> >>>>>>>
> >>>>>>> (The more idiomatic <#list Documents as doc> won't work; but again,
> >>>>>>>
> >>>>>>> those
> >>>>>>>
> >>>>>>> we will work out in a different thread.)
> >>>>>>>
> >>>>>>>
> >>>>>>> So, what would be better, in my opinion. I start out from what I
> >>>>>>> think
> >>>>>>>
> >>>>>>> are
> >>>>>>>
> >>>>>>> the common uses cases, in decreasing order of frequency. Goal is to
> >>>>>>>
> >>>>>>> make
> >>>>>>>
> >>>>>>> those less error prone for the users, and simpler to express.
> >>>>>>>
> >>>>>>> USE CASE 1
> >>>>>>>
> >>>>>>> You have exactly 1 input documents, which is therefore simply "the"
> >>>>>>> document in the mind of the user. This is probably the typical use
> >>>>>>>
> >>>>>>> case,
> >>>>>>>
> >>>>>>> but at least the use case users typically start out from when
> >>>>>>> starting
> >>>>>>>
> >>>>>>> the
> >>>>>>>
> >>>>>>> work.
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> -t access-report.ftl
> >>>>>>> somewhere/foo-access-log.csv
> >>>>>>>
> >>>>>>> Then `Documents.get(0)` is not very fitting. Most importantly it's
> >>>>>>>
> >>>>>>> error
> >>>>>>>
> >>>>>>> prone, because if the user passed in more than 1 documents (can
> even
> >>>>>>>
> >>>>>>> happen
> >>>>>>>
> >>>>>>> totally accidentally, like if the user was lazy and used a wildcard
> >>>>>>>
> >>>>>>> that
> >>>>>>>
> >>>>>>> the shell exploded), the template will silently ignore the rest of
> >>>>>>> the
> >>>>>>> documents, and the singe document processed will be practically
> >>>>>>> picked
> >>>>>>> randomly. The user might won't notice that and submits a bad report
> >>>>>>> or
> >>>>>>>
> >>>>>>> such.
> >>>>>>>
> >>>>>>> I think that in this use case the document should be simply
> referred
> >>>>>>> as
> >>>>>>> `Document` in the template. When you have multiple documents there,
> >>>>>>> referring to `Document` should be an error, saying that the
> template
> >>>>>>>
> >>>>>>> was
> >>>>>>>
> >>>>>>> made to process a single document only.
> >>>>>>>
> >>>>>>>
> >>>>>>> USE CASE 2
> >>>>>>>
> >>>>>>> You have multiple input documents, but each has different role
> >>>>>>>
> >>>>>>> (different
> >>>>>>>
> >>>>>>> schema, maybe different file type). Like, you pass in users.csv and
> >>>>>>> groups.csv. Each has difference schema, and so you want to access
> >>>>>>> them
> >>>>>>> differently, but in the same template.
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> [...]
> >>>>>>> --named-document users somewhere/foo-users.csv
> >>>>>>> --named-document groups somewhere/foo-groups.csv
> >>>>>>>
> >>>>>>> Then in the template you could refer to them as:
> >>>>>>>
> >>>>>>> `NamedDocuments.users`,
> >>>>>>>
> >>>>>>> and `NamedDocuments.groups`.
> >>>>>>>
> >>>>>>> Use Case 1, and 2 can be unified into a coherent concept, where
> >>>>>>>
> >>>>>>> `Document`
> >>>>>>>
> >>>>>>> is just a shorthand for `NamedDocuments.main`. It's called "main"
> >>>>>>>
> >>>>>>> because
> >>>>>>>
> >>>>>>> that's "the" document the template is about, but then you have to
> >>>>>>> added
> >>>>>>> some helper documents, with symbolic names representing their role.
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> -t access-report.ftl
> >>>>>>> --document-name=main somewhere/foo-access-log.csv
> >>>>>>> --document-name=users somewhere/foo-users.csv
> >>>>>>> --document-name=groups somewhere/foo-groups.csv
> >>>>>>>
> >>>>>>> Here, `Document` still works in the template, and it refers to
> >>>>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
> >>>>>>> above would be cleaner, I couldn't figure out how to do that with
> >>>>>>> Picocli. Anyway, for now the point is the concept, which is not
> >>>>>>> specific to the CLI.)
> >>>>>>>
> >>>>>>> USE CASE 3
> >>>>>>>
> >>>>>>> Here you have several documents of the same kind. That has a more
> >>>>>>> generic sub-use-case, when you have explicitly named documents (like
> >>>>>>> "users" above), and for some you expect multiple input files.
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> -t access-report.ftl
> >>>>>>> --document-name=main somewhere/foo-access-log.csv
> >>>>>>> somewhere/bar-access-log.csv
> >>>>>>> --document-name=users somewhere/foo-users.csv
> >>>>>>> somewhere/bar-users.csv
> >>>>>>> --document-name=groups somewhere/global-groups.csv
> >>>>>>>
> >>>>>>> The template must be written with this use case in mind, as now it
> >>>>>>> has to #list some of the documents. (I think in practice you hardly
> >>>>>>> ever want to get a document by hard-coded index. Either you don't
> >>>>>>> know how many documents you have, so you can't use hard-coded
> >>>>>>> indexes, or you do, and each index has a specific meaning, but then
> >>>>>>> you should name the documents instead, as using indexes is error
> >>>>>>> prone and hard to read.)
> >>>>>>> Accessing that list of documents in the template could maybe be done
> >>>>>>> like this:
> >>>>>>> - For the "main" documents: `DocumentList`
> >>>>>>> - For explicitly named documents, like "users":
> >>>>>>>   `NamedDocumentLists.users`
> >>>>>>>
> >>>>>>> SUMMING UP
> >>>>>>>
> >>>>>>> To unify all 3 use cases into a coherent concept:
> >>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you
> >>>>>>>   can achieve everything with it, using it requires your template to
> >>>>>>>   handle the most generic case too. So, I think it would be rarely
> >>>>>>>   used.
> >>>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
> >>>>>>>   It's used if you only have one kind of document (single format and
> >>>>>>>   schema), but potentially multiple of them.
> >>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1
> >>>>>>>   document of the given name.
> >>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is
> >>>>>>>   for the most natural/frequent use case.
> >>>>>>>
> >>>>>>> That's 4 possible ways of accessing your documents, which is a
> >>>>>>> trade-off for the sake of these:
> >>>>>>> - Catching CLI (or Maven, etc.) input where the template output
> >>>>>>>   likely will be wrong. That's only possible if the user can
> >>>>>>>   communicate their intent in the template.
> >>>>>>> - Users don't need to deal with concepts that are irrelevant in their
> >>>>>>>   concrete use case. Just start with the trivial, `Document`, and
> >>>>>>>   later, if the need arises, generalize to named documents, document
> >>>>>>>   lists, or both.
> >>>>>>>
> >>>>>>> What do you guys think?
> >>>>>>>
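
The four access forms summed up above can be sketched in plain Java. This is only an illustration of the proposed semantics, not freemarker-generator code: the `DocumentRegistry` class, its method names, and the use of path strings to stand in for documents are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder for input documents, grouped by symbolic name. */
final class DocumentRegistry {
    static final String MAIN = "main";

    // name -> documents registered under that name (documents are just paths here)
    private final Map<String, List<String>> byName = new LinkedHashMap<>();

    /** Registers one document under a symbolic name. */
    DocumentRegistry add(String name, String path) {
        byName.computeIfAbsent(name, k -> new ArrayList<>()).add(path);
        return this;
    }

    /** Counterpart of NamedDocumentLists.<name>: zero or more documents. */
    List<String> namedDocumentList(String name) {
        return List.copyOf(byName.getOrDefault(name, List.of()));
    }

    /** Counterpart of NamedDocuments.<name>: exactly one document, else an error. */
    String namedDocument(String name) {
        List<String> docs = namedDocumentList(name);
        if (docs.size() != 1) {
            throw new IllegalStateException("Expected exactly 1 document named \""
                    + name + "\", but got " + docs.size());
        }
        return docs.get(0);
    }

    /** Counterpart of DocumentList: shorthand for NamedDocumentLists.main. */
    List<String> documentList() {
        return namedDocumentList(MAIN);
    }

    /** Counterpart of Document: shorthand for NamedDocuments.main. */
    String document() {
        return namedDocument(MAIN);
    }

    public static void main(String[] args) {
        DocumentRegistry docs = new DocumentRegistry()
                .add(MAIN, "somewhere/foo-access-log.csv")
                .add("users", "somewhere/foo-users.csv")
                .add("users", "somewhere/bar-users.csv");
        System.out.println(docs.document());
        System.out.println(docs.namedDocumentList("users"));
        try {
            docs.namedDocument("users"); // two "users" documents: single access fails
        } catch (IllegalStateException e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that the single-document forms fail loudly when the input does not match the template author's intent, instead of silently picking one document.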
> >>>>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best regards,
> >>>> Daniel Dekany
> >>>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Daniel Dekany
> >>
> >>
> >
> > --
> > Best regards,
> > Daniel Dekany
>
>

-- 
Best regards,
Daniel Dekany

Re: freemarker-generator: Improving the input documents concept

Posted by Siegfried Goeschl <si...@gmail.com>.

Well, that clashes with "javax.activation.DataSource" - can do, and I have no definite opinion about it :)

> On 29.02.2020, at 18:03, Daniel Dekany <da...@gmail.com> wrote:
> 
> I believe that should be DataSource (with capital S), as it's two words.
> 
> Also, it's the name of a too widely used and known JDBC interface. So if
> anyone can tell a similarly descriptive alternative...
> 
> On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
> siegfried.goeschl@gmail.com> wrote:
> 
>> Hi Daniel,
>> 
>> I'm an enterprise developer - bad habits die hard :-)
>> 
>> So I closed the following tickets and merged the branches
>> 
>> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
>> "freemarker-generator"
>> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
>> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
>> for datasources
>> 
>> Thanks in advance,
>> 
>> Siegfried Goeschl
>> 
>> 
>>> On 29.02.2020, at 12:19, Daniel Dekany <da...@gmail.com> wrote:
>>> 
>>> Yeah, and of course, you can merge that branch. You can even work on the
>>> master directly after all.
>>> 
>>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <da...@gmail.com>
>>> wrote:
>>> 
>>>> But, I do recognize the cattle use case (several "faceless" files with
>>>> common format/schema). Only, my idea is to push that complexity onto the
>>>> data source. The "data source" concept shields the rest of the
>>>> application from the details of how the data is stored or retrieved. So,
>>>> a data source might load a bunch of log files from a directory, and
>>>> present them as a single big table, or as a list of tables, etc. So I
>>>> want to deal with the cattle use case, but the question is what part of
>>>> the architecture will deal with this complication, in other words, how
>>>> you box things. The reason my initial bet is to stuff that complication
>>>> into the "data source" implementation(s) is that data sources are
>>>> inherently varied. Some return a table-like thing, some have multiple
>>>> named tables (worksheets in Excel), some return a tree of nodes (XML),
>>>> etc. So then, some might return a list-of-list-of log records, or just a
>>>> single list of log records (put together from daily log files). That way
>>>> cattle don't add to conceptual complexity. Now, you might be aware of
>>>> cases where the cattle concept must be more exposed than this, and then
>>>> we can't box things like this. But this is what I tried to express.
>>>> 
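
To make the "single big table" idea above concrete, here is a small Java sketch of a data source that hides several same-schema files behind one table. It is an assumption-laden illustration: the `MergedCsvDataSource` name is made up, file contents are passed as in-memory strings, and rows are kept as raw CSV lines rather than parsed records.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical data source that presents several CSV files as one table. */
final class MergedCsvDataSource {

    /**
     * Merges CSV contents that share the same header row.
     * Returns the header followed by the data rows of all inputs, in order.
     */
    static List<String> merge(List<String> csvContents) {
        List<String> table = new ArrayList<>();
        String header = null;
        for (String csv : csvContents) {
            String[] lines = csv.split("\\R"); // split on any line break
            if (lines.length == 0 || lines[0].isEmpty()) {
                continue; // skip empty inputs
            }
            if (header == null) {
                header = lines[0];
                table.add(header);
            } else if (!header.equals(lines[0])) {
                throw new IllegalArgumentException("Header mismatch: " + lines[0]);
            }
            for (int i = 1; i < lines.length; i++) {
                if (!lines[i].isEmpty()) {
                    table.add(lines[i]);
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> merged = merge(List.of(
                "ip,status\n10.0.0.1,200\n",
                "ip,status\n10.0.0.2,404\n"));
        merged.forEach(System.out::println);
    }
}
```

With this boxing, the template only ever sees one table, so the number of underlying log files never shows up in the template.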
>>>> Regarding "output generators", and how that applies to the command
>>>> line: I think it's important that the common core between Maven and the
>>>> command line is as fat as possible. Ideally, they are just two syntaxes
>>>> to set up the same thing. Mostly at least. So, if you specify a template
>>>> file to the CLI application, in a way so that it causes it to process
>>>> that template to generate a single output, then you have just defined an
>>>> "output generator" (even if it wasn't explicitly called that on the
>>>> command line). If you specify 3 csv files to the CLI application, in a
>>>> way so that it causes it to generate 3 output files, then you have just
>>>> defined 3 "output generators" there (there's at least one template
>>>> specified there too, but that wasn't an "output generator" itself, it
>>>> was just an attribute of the 3 output generators). If you specify 1
>>>> template and 3 csv files, in a way so that it will yield 4 output files
>>>> (1 for the template, 3 for the csv-s), then you have defined 4 output
>>>> generators there. If you have a data source that loads a list of 3
>>>> entities (say, 3 csv files, so it's a list of tables then), and you have
>>>> 2 templates, and you tell the CLI to execute each template for each item
>>>> in said data source, then you have just defined 6 "output generators".
>>>> 
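
The counting in the last case above can be sketched as follows. This is a minimal illustration in Java, assuming a hypothetical `OutputGeneratorPlan` class that represents each output generator as a plain string; it shows the per-item pairing (2 templates x 3 items = 6) next to the alternative where each template receives the whole list once.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner that lists the output generators a run would define. */
final class OutputGeneratorPlan {

    /** One output generator per (template, item) pair: each template runs once per item. */
    static List<String> perItem(List<String> templates, List<String> items) {
        List<String> generators = new ArrayList<>();
        for (String template : templates) {
            for (String item : items) {
                generators.add(template + " <- " + item);
            }
        }
        return generators;
    }

    /** One output generator per template: each template sees the whole list at once. */
    static List<String> perTemplate(List<String> templates, List<String> items) {
        List<String> generators = new ArrayList<>();
        for (String template : templates) {
            generators.add(template + " <- " + items);
        }
        return generators;
    }

    public static void main(String[] args) {
        List<String> templates = List.of("summary.ftl", "details.ftl");
        List<String> items = List.of("a.csv", "b.csv", "c.csv");
        System.out.println(perItem(templates, items).size());     // 6
        System.out.println(perTemplate(templates, items).size()); // 2
    }
}
```

Either way the run is described by the same thing, a list of output generators; only the rule that builds the list differs.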
>>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
>>>> siegfried.goeschl@gmail.com> wrote:
>>>> 
>>>>> Hi Daniel,
>>>>> 
>>>>> That all depends on your mental model and work you do, expectations,
>>>>> experience :-)
>>>>> 
>>>>> 
>>>>> __Document Handling__
>>>>> 
>>>>> *"But I think actually we have no good use case for list of documents
>>>>> that's passed at once to a single template run, so, we can just ignore
>>>>> that complication"*
>>>>> 
>>>>> In my case that's not a complication but my daily business - I'm
>>>>> regularly wading through access logs - yesterday probably a couple of
>>>>> hundred access logs across two staging sites to help track down some
>>>>> strange API gateway issues :-)
>>>>> 
>>>>> My gut feeling is (borrowing from
>>>>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
>>>>> )
>>>>> 
>>>>> 1. You have a few lovely named documents / templates - `pets`
>>>>> 2. You have tons of anonymous documents / templates to process -
>>>>> `cattle`
>>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
>>>>> 
>>>>> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
>>>>> it is equally important and common.
>>>>> 
>>>>> 
>>>>> __Template And Document Processing Modes__
>>>>> 
>>>>> IMHO it is important to answer the following question: "How many
>>>>> outputs do you get when rendering 2 templates and 3 datasources? Two,
>>>>> Three or Six?"
>>>>>
>>>>> Your answer is influenced by your mental model / experience
>>>>>
>>>>> * When wading through tons of CSV files, access logs, etc. the answer
>>>>>   is "2"
>>>>> * When doing source code generation the obvious answer is "6"
>>>>> * Can't imagine a use case which results in "3" but I'm pretty sure we
>>>>>   will encounter one
>>>>> 
>>>>> __Template and document mode probably shouldn't exist__
>>>>> 
>>>>> That's hard for me to fully understand - I definitely lack your
>>>>> insights & experience writing such tools :-)
>>>>>
>>>>> Defining the `Output Generator` is the underlying model for the Maven
>>>>> plugin (and probably FMPP).
>>>>>
>>>>> I'm not sure if this applies to command lines, at least not in the way
>>>>> I use them (or would like to use them)
>>>>> 
>>>>> 
>>>>> Thanks in advance,
>>>>> 
>>>>> Siegfried Goeschl
>>>>> 
>>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>>>>> 
>>>>> 
>>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>>>>> 
>>>>>> Yeah, "data source" is surely a too popular name, but for a reason.
>>>>>> Does anyone have other ideas?
>>>>>>
>>>>>> As for naming data sources and such: one thing I was wondering about
>>>>>> back then is how to deal with a list of documents given to a template,
>>>>>> versus exactly 1 document given to a template. But I think actually we
>>>>>> have no good use case for a list of documents that's passed at once to
>>>>>> a single template run, so we can just ignore that complication. A
>>>>>> document has a name, and that's always just a single document, not a
>>>>>> collection, as far as the template is concerned. (We can have multiple
>>>>>> documents per run, but those normally yield separate output
>>>>>> generators, so it's still only one document per template.) However, we
>>>>>> can have data source types (document types in the old terminology)
>>>>>> that collect together multiple data files. So then that complexity is
>>>>>> encapsulated in the data source type, and doesn't complicate the
>>>>>> overall architecture. That's another case where a data source is not
>>>>>> just a file. Like maybe there's a data source type that loads all the
>>>>>> CSV-s from a directory into a single big table (I had such a case), or
>>>>>> even into a list of tables. Or, as I mentioned already, a data source
>>>>>> is maybe an SQL query on a JDBC data source (and we got the first term
>>>>>> clash... JDBC also calls them data sources).
>>>>>>
>>>>>> Template and document mode probably shouldn't exist from the user
>>>>>> perspective either, at least not as a global option that must apply to
>>>>>> everything in a run. They could just give the files that define the
>>>>>> "output generators", and some of them will be templates, some of them
>>>>>> data files, in which case a template needs to be associated with them
>>>>>> (and there can be a couple of ways of doing that). And then again,
>>>>>> there are the cases where you want to create one output generator per
>>>>>> entity from some data source.
>>>>>> 
>>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Daniel,
>>>>>>> 
>>>>>>> See my comments below - and thanks for your patience and input :-)
>>>>>>> 
>>>>>>> *Renaming Document To DataSource*
>>>>>>> 
>>>>>>> Yes, makes sense. I tried to avoid it since I'm using
>>>>>>> javax.activation and its DataSource.
>>>>>>>
>>>>>>> *Template And Document Mode*
>>>>>>>
>>>>>>> Agreed - I think it is a valuable abstraction for the user but it is
>>>>>>> not an implementation concept :-)
>>>>>>>
>>>>>>> *Document Without Symbolic Names*
>>>>>>>
>>>>>>> Also agreed and it is going to change but I have not made up my mind
>>>>>>> yet about what exactly to implement.
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> 
>>>>>>> Siegfried Goeschl
>>>>>>> 
>>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>>>>>>> 
>>>>>>> A few quick thoughts on that:
>>>>>>> 
>>>>>>> - We should replace the "document" term with something more
>>>>>>>   descriptive. It doesn't tell that it's some kind of input. Also,
>>>>>>>   most of these inputs aren't something that people typically call
>>>>>>>   documents. Like a csv file, or a database table, which is not even
>>>>>>>   a file (OK, we don't support such a thing at the moment). I think
>>>>>>>   maybe "data source" is a safe enough term. (It also rhymes with
>>>>>>>   data model.)
>>>>>>> - You have separate "template" and "document" "modes", which apply to
>>>>>>>   a whole run. I think such specialization won't be helpful. We could
>>>>>>>   just say, on the conceptual level at least, that we need a set of
>>>>>>>   "output generators". An output generator is an object (in the API)
>>>>>>>   that specifies a template, a data-model (where the data-model is
>>>>>>>   possibly populated with "documents"), and an output "sink" (a file
>>>>>>>   path, or stdout), and can generate the output itself. A practical
>>>>>>>   way of defining the output generators in a CLI application is via a
>>>>>>>   bunch of files, each defining an output generator. Some of those
>>>>>>>   files may be templates (which you can even detect from the file
>>>>>>>   extension), or data files that we currently call "documents". They
>>>>>>>   could freely mix inside the same run. I have also met a use case
>>>>>>>   where you have a single table (single "document"), and each record
>>>>>>>   in it yields an output file. That can also be described in some
>>>>>>>   file format, or really in any other way, like directly in command
>>>>>>>   line arguments, via API, etc.
>>>>>>> - You have multiple documents without an associated symbolic name in
>>>>>>>   some examples. Templates can't identify those then in a well
>>>>>>>   maintainable way. The actual file name is often not a good
>>>>>>>   identifier, can change over time, and you might not even have good
>>>>>>>   control over it, like you already receive it as a parameter from
>>>>>>>   somewhere else, or someone moves/renames the files that you need to
>>>>>>>   read. An index is also not very good, but I have written about that
>>>>>>>   earlier.
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi folks,
>>>>>>> 
>>>>>>> still wrapping my head around it, but I assembled some thoughts here -
>>>>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> 
>>>>>>> Siegfried Goeschl
>>>>>>> 
>>>>>>> 
>>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org> wrote:
>>>>>>> 
>>>>>>> What you are describing is more like the angle that FMPP took
>>>>>>> initially, where templates drive things, they generate the output for
>>>>>>> themselves (even multiple output files if they wish). By default the
>>>>>>> output file name (and relative path) is deduced from the template
>>>>>>> name. There was also a global data-model, built in a configuration
>>>>>>> file (or equally, built via command line arguments, or both mixed),
>>>>>>> from which templates get whatever data they are interested in. Take a
>>>>>>> look at the figures here: http://fmpp.sourceforge.net/qtour.html.
>>>>>>> Later, this concept was generalized a bit more, because you could add
>>>>>>> XML files at the same place where you have the templates, and then you
>>>>>>> could associate transform templates to the XML files (based on path
>>>>>>> pattern and/or the XML document element). Now that's like what
>>>>>>> freemarker-generator had initially (data files drive output, and the
>>>>>>> template is there to transform it).
>>>>>>> 
>>>>>>> So I think the generic mental model would be like this:
>>>>>>>
>>>>>>> 1. You have files that drive the process, let's call them *generator
>>>>>>>    files* for now. Usually, each generator file yields an output file
>>>>>>>    (but maybe even multiple output files, as you might have seen in
>>>>>>>    the last figure). These generator files can be of many types, like
>>>>>>>    XML, JSON, XLSX (as in the original freemarker-generator), and even
>>>>>>>    templates (as is the norm in FMPP). If the file is not a template,
>>>>>>>    then you have a set of transformer templates (-t CLI option) in a
>>>>>>>    separate directory, which can be associated with the generator
>>>>>>>    files based on name patterns, and even based on content (schema
>>>>>>>    usually). If the generator file is a template (so that's a
>>>>>>>    positional @Parameter CLI argument that happens to be an *.ftl, and
>>>>>>>    is not a template file specified after the "-t" option), then you
>>>>>>>    just Template.process(...) it, and it prints what the output will
>>>>>>>    be.
>>>>>>> 2. You also have a set of variables, the global data-model, that
>>>>>>>    contains commonly useful stuff, like what you now call parameters
>>>>>>>    (CLI -Pname=value), but also maybe data loaded from JSON, XML, etc.
>>>>>>>    Those data files aren't "generator files". Templates just use them
>>>>>>>    if they need them. An important thing here is to reuse the same
>>>>>>>    mechanism to read and parse those data files, which was used in
>>>>>>>    templates when transforming generator files. So we need a common
>>>>>>>    format for specifying how to load data files. That's maybe just FTL
>>>>>>>    that #assigns to the variables, or maybe a more declarative format.
>>>>>>> 
>>>>>>> What I have described in the original post here was a less generic
>>>>>>> form of this, as I tried to stay true to the original approach. I
>>>>>>> thought the proposal would be drastic enough as it is... :) There, the
>>>>>>> "main" document is the "generator file" from point 1, the "-t"
>>>>>>> template is the transform template for the "main" document, and the
>>>>>>> other named documents ("users", "groups") are a poor man's shared
>>>>>>> data-model from point 2 (together with -PName=value).
>>>>>>> 
>>>>>>> There's a further, somewhat confusing thing to get right with the
>>>>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing though.
>>>>>>> In the model above, as per point 1, if you list multiple data files,
>>>>>>> each will generate a separate output file. So, if you need to take in
>>>>>>> a list of files to transform it to a single output file (or at least
>>>>>>> with a single transform template execution), then you have to be
>>>>>>> explicit about that, as that's not the default behavior anymore. But
>>>>>>> it's still absolutely possible. Imagine that a "list of XLSX-es" is
>>>>>>> itself like a file format. You need some CLI (and Maven config, etc.)
>>>>>>> syntax to express that, but that shouldn't be a big deal.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi Daniel,
>>>>>>> 
>>>>>>> Good timing - I was looking at a similar problem from different angle
>>>>>>> yesterday (see below)
>>>>>>> 
>>>>>>> Don't have enough time to answer your email in detail now - will do
>>>>>>> that
>>>>>>> tomorrow evening
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> 
>>>>>>> Siegfried Goeschl
>>>>>>> 
>>>>>>> 
>>>>>>> === START
>>>>>>> # FreeMarker CLI Improvement
>>>>>>> ## Support Of Multiple Template Files
>>>>>>> Currently we support the following combinations
>>>>>>> 
>>>>>>> * Single template and no data files
>>>>>>> * Single template and one or more data files
>>>>>>> 
>>>>>>> But we cannot support the following use case, which is quite typical
>>>>>>> in the cloud
>>>>>>> 
>>>>>>> __Convert multiple templates with a single data file, e.g copying a
>>>>>>> directory of configuration files using a JSON configuration file__
>>>>>>> 
>>>>>>> ## Implementation notes
>>>>>>> * When we copy a directory we can remove the `ftl` extension on the
>>>>>>>   fly
>>>>>>> * We might need an `exclude` filter for the copy operation
>>>>>>> * Initially resolve to a list of template files and process one after
>>>>>>>   another
>>>>>>> * Need to calculate the output file location and extension
>>>>>>> * We need to rename the existing command line parameters (see below)
>>>>>>> * Do we need multiple include and exclude filters?
>>>>>>> * Do we need file versus directory filters?
>>>>>>> 
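
The first and fourth implementation notes (dropping the `ftl` extension and calculating the output location) could look roughly like the Java sketch below. It is only an illustration: the `OutputPaths` class and `outputFor` method are hypothetical names, and the rule "keep the path relative to the template directory, strip a trailing `.ftl`" is an assumption about what the notes intend.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that maps a template file to its output file. */
final class OutputPaths {
    private static final String FTL_SUFFIX = ".ftl";

    /**
     * Keeps the template's path relative to templateRoot, strips a trailing
     * ".ftl" from the file name, and places the result under outputRoot.
     */
    static Path outputFor(Path templateRoot, Path template, Path outputRoot) {
        Path relative = templateRoot.normalize().relativize(template.normalize());
        String name = relative.getFileName().toString();
        if (name.endsWith(FTL_SUFFIX)) {
            name = name.substring(0, name.length() - FTL_SUFFIX.length());
        }
        // resolveSibling swaps only the last path element, keeping sub-directories
        return outputRoot.resolve(relative.resolveSibling(name));
    }

    public static void main(String[] args) {
        Path out = outputFor(
                Paths.get("ext/config"),
                Paths.get("ext/config/nginx/app.conf.ftl"),
                Paths.get("/config"));
        System.out.println(out);
    }
}
```

Files without the `.ftl` suffix keep their name unchanged here, which would suit a plain copy of non-template files; whether those should be copied at all is exactly what the include/exclude questions above leave open.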
>>>>>>> ### Command Line Options
>>>>>>> ```
>>>>>>> --input-encoding : Encoding of the documents
>>>>>>> --output-encoding : Encoding of the rendered template
>>>>>>> --template-encoding : Encoding of the template
>>>>>>> --output : Output file or directory
>>>>>>> --include-document : Include pattern for documents
>>>>>>> --exclude-document : Exclude pattern for documents
>>>>>>> --include-template : Include pattern for templates
>>>>>>> --exclude-template : Exclude pattern for templates
>>>>>>> ```
>>>>>>> 
>>>>>>> ### Command Line Examples
>>>>>>> ```text
>>>>>>> # Copy all FTL templates found in "ext/config" to the "/config"
>>>>>>> # directory using the data from "config.json"
>>>>>>>
>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
>>>>>>>
>>>>>>> # Basically the same using a named document "configuration"
>>>>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
>>>>>>> # It might make sense to allow URIs for loading documents
>>>>>>>
>>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
>>>>>>>
>>>>>>> # Basically the same using an environment variable as named document
>>>>>>>
>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
>>>>>>> ```
>>>>>>> === END
>>>>>>> 
>>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org> wrote:
>>>>>>> 
>>>>>>> Input documents is a fundamental concept in freemarker-generator, so
>>>>>>> we should think about that more, and probably refine/rework how it's
>>>>>>> done.
>>>>>>> 
>>>>>>> Currently it works like this, with CLI at least.
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> -t access-report.ftl
>>>>>>> somewhere/foo-access-log.csv
>>>>>>> 
>>>>>>> Then in access-report.ftl you have to do something like this:
>>>>>>> 
>>>>>>> <#assign doc = Documents.get(0)>
>>>>>>> ... process doc here
>>>>>>> 
>>>>>>> (The more idiomatic Documents[0] won't work. Actually, that led to a
>>>>>>> funny chain of coincidences: It returned the string "D", then
>>>>>>> CSVTool.parse(...) happily parsed that to a table with the single
>>>>>>> column "D", and 0 rows, and as there were 0 rows, the template didn't
>>>>>>> run into an error over row.myExpectedColumn referring to a missing
>>>>>>> column either, so the process finished with success. (: Pretty unlucky
>>>>>>> for sure. The root cause was unintentionally breaking a FreeMarker
>>>>>>> idiom though; eventually we will have to work on those too, but,
>>>>>>> different topic.)
>>>>>>> 
>>>>>>> However, actually multiple input documents can be passed in:
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> -t access-report.ftl
>>>>>>> somewhere/foo-access-log.csv
>>>>>>> somewhere/bar-access-log.csv
>>>>>>> 
>>>>>>> The above template will still work, though then you ignore all but
>>>>>>> the first document. So if you expect any number of input documents,
>>>>>>> you will probably have to do this:
>>>>>>>
>>>>>>> <#list Documents.list as doc>
>>>>>>> ... process doc here
>>>>>>> </#list>
>>>>>>>
>>>>>>> (The more idiomatic <#list Documents as doc> won't work; but again,
>>>>>>> those we will work out in a different thread.)
>>>>>>>
>>>>>>>
>>>>>>> So, here is what would be better, in my opinion. I start out from what
>>>>>>> I think are the common use cases, in decreasing order of frequency.
>>>>>>> The goal is to make those less error prone for the users, and simpler
>>>>>>> to express.
>>>>>>> 
>>>>>>> USE CASE 1
>>>>>>> 
>>>>>>> You have exactly 1 input document, which is therefore simply "the"
>>>>>>> document in the mind of the user. This is probably the typical use
>>>>>>> case, or at least the use case users typically start out from when
>>>>>>> starting the work.
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> -t access-report.ftl
>>>>>>> somewhere/foo-access-log.csv
>>>>>>> 
>>>>>>> Then `Documents.get(0)` is not very fitting. Most importantly it's
>>>>>>> error prone, because if the user passed in more than 1 document (which
>>>>>>> can even happen totally accidentally, like if the user was lazy and
>>>>>>> used a wildcard that the shell expanded), the template will silently
>>>>>>> ignore the rest of the documents, and the single document processed
>>>>>>> will be picked practically at random. The user might not notice that
>>>>>>> and submit a bad report or such.
>>>>>>>
>>>>>>> I think that in this use case the document should simply be referred
>>>>>>> to as `Document` in the template. When you have multiple documents
>>>>>>> there, referring to `Document` should be an error, saying that the
>>>>>>> template was made to process a single document only.
>>>>>>> 
>>>>>>> 
>>>>>>> USE CASE 2
>>>>>>> 
>>>>>>> You have multiple input documents, but each has a different role
>>>>>>> (different schema, maybe different file type). Like, you pass in
>>>>>>> users.csv and groups.csv. Each has a different schema, and so you
>>>>>>> want to access them differently, but in the same template.
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> [...]
>>>>>>> --named-document users somewhere/foo-users.csv
>>>>>>> --named-document groups somewhere/foo-groups.csv
>>>>>>> 
>>>>>>> Then in the template you could refer to them as
>>>>>>> `NamedDocuments.users` and `NamedDocuments.groups`.
>>>>>>>
>>>>>>> Use cases 1 and 2 can be unified into a coherent concept, where
>>>>>>> `Document` is just a shorthand for `NamedDocuments.main`. It's called
>>>>>>> "main" because that's "the" document the template is about, but then
>>>>>>> you had to add some helper documents, with symbolic names
>>>>>>> representing their role.
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> -t access-report.ftl
>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>> --document-name=groups somewhere/foo-groups.csv
>>>>>>> 
>>>>>>> Here, `Document` still works in the template, and it refers to
>>>>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
>>>>>>> above would be cleaner, I couldn't figure out how to do that with
>>>>>>> Picocli. Anyway, for now the point is the concept, which is not
>>>>>>> specific to the CLI.)
>>>>>>> 
>>>>>>> USE CASE 3
>>>>>>> 
>>>>>>> Here you have several documents of the same kind. That has a more
>>>>>>> generic sub-use-case, when you have explicitly named documents (like
>>>>>>> "users" above), and for some you expect multiple input files.
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> -t access-report.ftl
>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>> somewhere/bar-access-log.csv
>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>> somewhere/bar-users.csv
>>>>>>> --document-name=groups somewhere/global-groups.csv
>>>>>>> 
>>>>>>> The template must be written with this use case in mind, as now it
>>>>>>> has to #list some of the documents. (I think in practice you hardly
>>>>>>> ever want to get a document by hard-coded index. Either you don't
>>>>>>> know how many documents you have, so you can't use hard-coded
>>>>>>> indexes, or you do, and each index has a specific meaning, but then
>>>>>>> you should name the documents instead, as using indexes is error
>>>>>>> prone and hard to read.)
>>>>>>> Accessing that list of documents in the template could maybe be done
>>>>>>> like this:
>>>>>>> - For the "main" documents: `DocumentList`
>>>>>>> - For explicitly named documents, like "users":
>>>>>>>   `NamedDocumentLists.users`
>>>>>>> 
>>>>>>> SUMMING UP
>>>>>>> 
>>>>>>> To unify all 3 use cases into a coherent concept:
>>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you
>>>>>>>   can achieve everything with it, using it requires your template to
>>>>>>>   handle the most generic case too. So, I think it would be rarely
>>>>>>>   used.
>>>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
>>>>>>>   It's used if you only have one kind of document (single format and
>>>>>>>   schema), but potentially multiple of them.
>>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1
>>>>>>>   document of the given name.
>>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is
>>>>>>>   for the most natural/frequent use case.
>>>>>>>
>>>>>>> That's 4 possible ways of accessing your documents, which is a
>>>>>>> trade-off for the sake of these:
>>>>>>> - Catching CLI (or Maven, etc.) input where the template output
>>>>>>>   likely will be wrong. That's only possible if the user can
>>>>>>>   communicate their intent in the template.
>>>>>>> - Users don't need to deal with concepts that are irrelevant in their
>>>>>>>   concrete use case. Just start with the trivial, `Document`, and
>>>>>>>   later, if the need arises, generalize to named documents, document
>>>>>>>   lists, or both.
>>>>>>>
>>>>>>> What do you guys think?
>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Best regards,
>>>> Daniel Dekany
>>>> 
>>> 
>>> 
>>> --
>>> Best regards,
>>> Daniel Dekany
>> 
>> 
> 
> -- 
> Best regards,
> Daniel Dekany

Re: freemarker-generator: Improving the input documents concept

Posted by Daniel Dekany <da...@gmail.com>.

I believe that should be DataSource (with capital S), as it's two words.

Also, it's the name of a JDBC interface that's too widely used and known. So
if anyone can suggest a similarly descriptive alternative...

On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
siegfried.goeschl@gmail.com> wrote:

> Hi Daniel,
>
> I'm an enterprise developer - bad habits die hard :-)
>
> So I closed the following tickets and merged the branches
>
> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
> "freemarker-generator"
> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
> for datasources
>
> Thanks in advance,
>
> Siegfried Goeschl
>
>
> > On 29.02.2020, at 12:19, Daniel Dekany <da...@gmail.com> wrote:
> >
> > Yeah, and of course, you can merge that branch. You can even work on the
> > master directly after all.
> >
> > On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <da...@gmail.com>
> > wrote:
> >
> >> But, I do recognize the cattle use case (several "faceless" files with
> >> common format/schema). Only, my idea is to push that complexity onto the
> >> data source. The "data source" concept shields the rest of the
> >> application from the details of how the data is stored or retrieved. So,
> >> a data source might load a bunch of log files from a directory, and
> >> present them as a single big table, or as a list of tables, etc. So I
> >> want to deal with the cattle use case, but the question is what part of
> >> the architecture will deal with this complication, in other words, how
> >> do you box things. Why my initial bet is to stuff that complication into
> >> the "data source" implementation(s) is that data sources are inherently
> >> varied. Some return a table-like thing, some have multiple named tables
> >> (worksheets in Excel), some return a tree of nodes (XML), etc. So then,
> >> some might return a list-of-list-of log records, or just a single list
> >> of log records (put together from daily log files). That way cattle
> >> don't add to conceptual complexity. Now, you might be aware of cases
> >> where the cattle concept must be more exposed than this, and then we
> >> can't box things like this. But this is what I tried to express.
> >>
> >> Regarding "output generators", and how that applies on the command
> >> line. I think it's important that the common core between Maven and
> >> command-line is as fat as possible. Ideally, they are just two syntaxes
> >> to set up the same thing. Mostly at least. So, if you specify a template
> >> file to the CLI application, in a way so that it causes it to process
> >> that template to generate a single output, then there you have just
> >> defined an "output generator" (even if it wasn't explicitly called like
> >> that in the command line). If you specify 3 csv files to the CLI
> >> application, in a way so that it causes it to generate 3 output files,
> >> then you have just defined 3 "output generators" there (there's at least
> >> one template specified there too, but that wasn't an "output generator"
> >> itself, it was just an attribute of the 3 output generators). If you
> >> specify 1 template, and 3 csv files, in a way so that it will yield 4
> >> output files (1 for the template, 3 for the csv-s), then you have
> >> defined 4 output generators there. If you have a data source that loads
> >> a list of 3 entities (say, 3 csv files, so it's a list of tables then),
> >> and you have 2 templates, and you tell the CLI to execute each template
> >> for each item in said data source, then you have just defined 6 "output
> >> generators".
> >>
> >> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
> >> siegfried.goeschl@gmail.com> wrote:
> >>
> >>> Hi Daniel,
> >>>
> >>> That all depends on your mental model and work you do, expectations,
> >>> experience :-)
> >>>
> >>>
> >>> __Document Handling__
> >>>
> >>> *"But I think actually we have no good use case for list of documents
> >>> that's passed at once to a single template run, so, we can just ignore
> >>> that complication"*
> >>>
> >>> In my case that's not a complication but my daily business - I'm
> >>> regularly wading through access logs - yesterday probably a couple of
> >>> hundreds access logs across two staging sites to help tracking some
> >>> strange API gateway issues :-)
> >>>
> >>> My gut feeling is (borrowing from
> >>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
> >>> )
> >>>
> >>> 1. You have a few lovely named documents / templates - `pets`
> >>> 2. You have tons of anonymous documents / templates to process -
> >>> `cattle`
> >>> 3. The "grey area" comes into play when mixing `pets & cattle`
> >>>
> >>> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
> >>> it is equally important and common.
> >>>
> >>>
> >>> __Template And Document Processing Modes__
> >>>
> >>> IMHO it is important to answer the following question: "How many
> >>> outputs do you get when rendering 2 templates and 3 datasources? Two,
> >>> Three or Six?"
> >>>
> >>> Your answer is influenced by your mental model / experience
> >>>
> >>> * When wading through tons of CSV files, access logs, etc. the answer
> >>> is "2"
> >>> * When doing source code generation the obvious answer is "6"
> >>> * Can't imagine a use case which results in "3" but I'm pretty sure we
> >>> will encounter one
> >>>
> >>> __Template and document mode probably shouldn't exist__
> >>>
> >>> That's hard for me to fully understand - I definitely lack your
> >>> insights & experience writing such tools :-)
> >>>
> >>> Defining the `Output Generator` is the underlying model for the Maven
> >>> plugin (and probably FMPP).
> >>>
> >>> I'm not sure if this applies for command lines at least not in the way
> >>> I use them (or would like to use them)
> >>>
> >>>
> >>> Thanks in advance,
> >>>
> >>> Siegfried Goeschl
> >>>
> >>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
> >>>
> >>>
> >>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
> >>>
> >>>> Yeah, "data source" is surely a too popular name, but for a reason.
> >>>> Anyone has other ideas?
> >>>>
> >>>> As of naming data sources and such. One thing I was wondering about
> >>>> back then is how to deal with list of documents given to a template,
> >>>> versus exactly 1 document given to a template. But I think actually we
> >>>> have no good use case for list of documents that's passed at once to a
> >>>> single template run, so, we can just ignore that complication. A
> >>>> document has a name, and that's always just a single document, not a
> >>>> collection, as far as the template is concerned. (We can have multiple
> >>>> documents per run, but those normally yield separate output
> >>>> generators, so it's still only one document per template.) However, we
> >>>> can have data source types (document types with old terminology) that
> >>>> collect together multiple data files. So then that complexity is
> >>>> encapsulated into the data source type, and doesn't complicate the
> >>>> overall architecture. That's another case when a data source is not
> >>>> just a file. Like maybe there's a data source type that loads all the
> >>>> CSV-s from a directory, into a single big table (I had such case), or
> >>>> even into a list of tables. Or, as I mentioned already, a data source
> >>>> is maybe an SQL query on a JDBC data source (and we got the first term
> >>>> clash... JDBC also calls them data sources).
> >>>>
> >>>> Template and document mode probably shouldn't exist from user
> >>>> perspective either, at least not as a global option that must apply to
> >>>> everything in a run. They could just give the files that define the
> >>>> "output generators", and some of them will be templates, some of them
> >>>> are data files, in which case a template needs to be associated with
> >>>> them (and there can be a couple of ways of doing that). And then
> >>>> again, there are the cases where you want to create one output
> >>>> generator per entity from some data source.
> >>>>
> >>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
> >>>> siegfried.goeschl@gmail.com> wrote:
> >>>>
> >>>>> Hi Daniel,
> >>>>>
> >>>>> See my comments below - and thanks for your patience and input :-)
> >>>>>
> >>>>> *Renaming Document To DataSource*
> >>>>>
> >>>>> Yes, makes sense. I tried to avoid since I'm using javax.activation
> >>>>> and
> >>>>> its DataSource.
> >>>>>
> >>>>> *Template And Document Mode*
> >>>>>
> >>>>> Agreed - I think it is a valuable abstraction for the user but it is
> >>>>> not
> >>>>> an implementation concept :-)
> >>>>>
> >>>>> *Document Without Symbolic Names*
> >>>>>
> >>>>> Also agreed and it is going to change but I have not settled my mind
> >>>>> yet
> >>>>> what exactly to implement.
> >>>>>
> >>>>> Thanks in advance,
> >>>>>
> >>>>> Siegfried Goeschl
> >>>>>
> >>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
> >>>>>
> >>>>> A few quick thoughts on that:
> >>>>>
> >>>>> - We should replace the "document" term with something more
> >>>>>   descriptive. It doesn't tell that it's some kind of input. Also,
> >>>>>   most of these inputs aren't something that people typically call
> >>>>>   documents. Like a csv file, or a database table, which is not even
> >>>>>   a file (OK we don't support such thing at the moment). I think,
> >>>>>   maybe "data source" is a safe enough term. (It also rhymes with
> >>>>>   data model.)
> >>>>> - You have separate "template" and "document" "mode", that applies to
> >>>>>   a whole run. I think such specialization won't be helpful. We could
> >>>>>   just say, on the conceptual level at least, that we need a set of
> >>>>>   "output generators". An output generator is an object (in the API)
> >>>>>   that specifies a template, a data-model (where the data-model is
> >>>>>   possibly populated with "documents"), and an output "sink" (a file
> >>>>>   path, or stdout), and can generate the output itself. A practical
> >>>>>   way of defining the output generators in a CLI application is via a
> >>>>>   bunch of files, each defining an output generator. Some of those
> >>>>>   files may be a template (that you can even detect from the file
> >>>>>   extension), or a data file that we currently call a "document".
> >>>>>   They could freely mix inside the same run. I have also met a use
> >>>>>   case when you have a single table (single "document"), and each
> >>>>>   record in it yields an output file. That can also be described in
> >>>>>   some file format, or really in any other way, like directly in
> >>>>>   command line argument, via API, etc.
> >>>>> - You have multiple documents without associated symbolical name in
> >>>>>   some examples. Templates can't identify those then in a well
> >>>>>   maintainable way. The actual file name is often not a good
> >>>>>   identifier, can change over time, and you might not even have good
> >>>>>   control over it, like you already receive it as a parameter from
> >>>>>   somewhere else, or someone moves/renames the files that you need to
> >>>>>   read. Index is also not very good, but I have written about that
> >>>>>   earlier.
> >>>>>
> >>>>>
> >>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
> >>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>
> >>>>> Hi folks,
> >>>>>
> >>>>> still wrapping my side around but assembled some thoughts here -
> >>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
> >>>>>
> >>>>> Thanks in advance,
> >>>>>
> >>>>> Siegfried Goeschl
> >>>>>
> >>>>>
> >>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org> wrote:
> >>>>>
> >>>>> What you are describing is more like the angle that FMPP took
> >>>>> initially, where templates drive things, they generate the output for
> >>>>> themselves (even multiple output files if they wish). By default the
> >>>>> output file name (and relative path) is deduced from the template
> >>>>> name. There was also a global data-model, built in a configuration
> >>>>> file (or equally, built via command line arguments, or both mixed),
> >>>>> from which templates get whatever data they are interested in. Take a
> >>>>> look at the figures here: http://fmpp.sourceforge.net/qtour.html.
> >>>>> Later, this concept was generalized a bit more, because you could add
> >>>>> XML files at the same place where you have the templates, and then
> >>>>> you could associate transform templates to the XML files (based on
> >>>>> path pattern and/or the XML document element). Now that's like what
> >>>>> freemarker-generator had initially (data files drive output, and the
> >>>>> template is there to transform it).
> >>>>>
> >>>>> So I think the generic mental model would look like this:
> >>>>>
> >>>>> 1. You got files that drive the process, let's call them *generator
> >>>>>    files* for now. Usually, each generator file yields an output file
> >>>>>    (but maybe even multiple output files, as you might have seen in
> >>>>>    the last figure). These generator files can be of many types, like
> >>>>>    XML, JSON, XLSX (as in the original freemarker-generator), and even
> >>>>>    templates (as is the norm in FMPP). If the file is not a template,
> >>>>>    then you got a set of transformer templates (-t CLI option) in a
> >>>>>    separate directory, which can be associated with the generator
> >>>>>    files based on name patterns, and even based on content (schema
> >>>>>    usually). If the generator file is a template (so that's a
> >>>>>    positional @Parameter CLI argument that happens to be an *.ftl, and
> >>>>>    is not a template file specified after the "-t" option), then you
> >>>>>    just Template.process(...) it, and it prints what the output will
> >>>>>    be.
> >>>>> 2. You also have a set of variables, the global data-model, that
> >>>>>    contains commonly useful stuff, like what you now call parameters
> >>>>>    (CLI -Pname=value), but also maybe data loaded from JSON, XML, etc.
> >>>>>    Those data files aren't "generator files". Templates just use them
> >>>>>    if they need them. An important thing here is to reuse the same
> >>>>>    mechanism to read and parse those data files, which was used in
> >>>>>    templates when transforming generator files. So we need a common
> >>>>>    format for specifying how to load data files. That's maybe just FTL
> >>>>>    that #assigns to the variables, or maybe a more declarative format.
> >>>>>
> >>>>> What I have described in the original post here was a less generic
> >>>>> form of this, as I tried to be true to the original approach. I
> >>>>> thought the proposal would be drastic enough as it is... :) There,
> >>>>> the "main" document is the "generator file" from point 1, the "-t"
> >>>>> template is the transform template for the "main" document, and the
> >>>>> other named documents ("users", "groups") are a poor man's shared
> >>>>> data-model from point 2 (together with -PName=value).
> >>>>>
> >>>>> There's a further somewhat confusing thing to get right with the
> >>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing
> >>>>> though. In the model above, as per point 1, if you list multiple data
> >>>>> files, each will generate a separate output file. So, if you need to
> >>>>> take in a list of files to transform it to a single output file (or
> >>>>> at least with a single transform template execution), then you have
> >>>>> to be explicit about that, as that's not the default behavior
> >>>>> anymore. But it's still absolutely possible. Imagine it as a "list of
> >>>>> XLSX-es" is itself like a file format. You need some CLI (and Maven
> >>>>> config, etc.) syntax to express that, but that shouldn't be a big
> >>>>> deal.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
> >>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>
> >>>>> Hi Daniel,
> >>>>>
> >>>>> Good timing - I was looking at a similar problem from different angle
> >>>>> yesterday (see below)
> >>>>>
> >>>>> Don't have enough time to answer your email in detail now - will do
> >>>>> that
> >>>>> tomorrow evening
> >>>>>
> >>>>> Thanks in advance,
> >>>>>
> >>>>> Siegfried Goeschl
> >>>>>
> >>>>>
> >>>>> === START
> >>>>> # FreeMarker CLI Improvement
> >>>>> ## Support Of Multiple Template Files
> >>>>> Currently we support the following combinations
> >>>>>
> >>>>> * Single template and no data files
> >>>>> * Single template and one or more data files
> >>>>>
> >>>>> But we can not support the following use case which is quite typical
> >>>>> in the cloud
> >>>>>
> >>>>> __Convert multiple templates with a single data file, e.g. copying a
> >>>>> directory of configuration files using a JSON configuration file__
> >>>>>
> >>>>> ## Implementation notes
> >>>>> * When we copy a directory we can remove the `ftl` extension on the fly
> >>>>> * We might need an `exclude` filter for the copy operation
> >>>>> * Initially resolve to a list of template files and process one after
> >>>>>   another
> >>>>> * Need to calculate the output file location and extension
> >>>>> * We need to rename the existing command line parameters (see below)
> >>>>> * Do we need multiple include and exclude filters?
> >>>>> * Do we need file versus directory filters?
> >>>>>
> >>>>> ### Command Line Options
> >>>>> ```
> >>>>> --input-encoding : Encoding of the documents
> >>>>> --output-encoding : Encoding of the rendered template
> >>>>> --template-encoding : Encoding of the template
> >>>>> --output : Output file or directory
> >>>>> --include-document : Include pattern for documents
> >>>>> --exclude-document : Exclude pattern for documents
> >>>>> --include-template : Include pattern for templates
> >>>>> --exclude-template : Exclude pattern for templates
> >>>>> ```
> >>>>>
> >>>>> ### Command Line Examples
> >>>>> ```text
> >>>>> # Copy all FTL templates found in "ext/config" to the "/config"
> >>>>> # directory using the data from "config.json"
> >>>>> freemarker-cli -t ./ext/config --include-template *.ftl --o /config config.json
> >>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
> >>>>>
> >>>>> # Basically the same using a named document "configuration"
> >>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
> >>>>> # It might make sense to allow URIs for loading documents
> >>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
> >>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
> >>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
> >>>>>
> >>>>> # Basically the same using an environment variable as named document
> >>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
> >>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
> >>>>> ```
> >>>>> === END
> >>>>>
> >>>>> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org> wrote:
> >>>>>
> >>>>> Input documents is a fundamental concept in freemarker-generator, so
> >>>>> we
> >>>>> should think about that more, and probably refine/rework how it's
> >>>>> done.
> >>>>>
> >>>>> Currently it works like this, with CLI at least.
> >>>>>
> >>>>> freemarker-cli
> >>>>> -t access-report.ftl
> >>>>> somewhere/foo-access-log.csv
> >>>>>
> >>>>> Then in access-report.ftl you have to do something like this:
> >>>>>
> >>>>> <#assign doc = Documents.get(0)>
> >>>>> ... process doc here
> >>>>>
> >>>>> (The more idiomatic Documents[0] won't work. Actually, that lead to a
> >>>>>
> >>>>> funny
> >>>>>
> >>>>> chain of coincidences: It returned the string "D", then
> >>>>>
> >>>>> CSVTool.parse(...)
> >>>>>
> >>>>> happily parsed that to a table with the single column "D", and 0
> >>>>> rows,
> >>>>>
> >>>>> and
> >>>>>
> >>>>> as there were 0 rows, the template didn't run into an error because
> >>>>> row.myExpectedColumn refers to a missing column either, so the
> >>>>> process
> >>>>> finished with success. (: Pretty unlucky for sure. The root was
> >>>>> unintentionally breaking a FreeMarker idiom though; eventually we
> >>>>> will
> >>>>>
> >>>>> have
> >>>>>
> >>>>> to work on those too, but, different topic.)
> >>>>>
> >>>>> However, actually multiple input documents can be passed in:
> >>>>>
> >>>>> freemarker-cli
> >>>>> -t access-report.ftl
> >>>>> somewhere/foo-access-log.csv
> >>>>> somewhere/bar-access-log.csv
> >>>>>
> >>>>> Above template will still work, though then you ignored all but the
> >>>>>
> >>>>> first
> >>>>>
> >>>>> document. So if you expect any number of input documents, you
> >>>>> probably
> >>>>>
> >>>>> will
> >>>>>
> >>>>> have to do this:
> >>>>>
> >>>>> <#list Documents.list as doc>
> >>>>> ... process doc here
> >>>>> </#list>
> >>>>>
> >>>>> (The more idiomatic <#list Documents as doc> won't work; but again,
> >>>>>
> >>>>> those
> >>>>>
> >>>>> we will work out in a different thread.)
> >>>>>
> >>>>>
> >>>>> So, what would be better, in my opinion. I start out from what I
> >>>>> think
> >>>>>
> >>>>> are
> >>>>>
> >>>>> the common uses cases, in decreasing order of frequency. Goal is to
> >>>>>
> >>>>> make
> >>>>>
> >>>>> those less error prone for the users, and simpler to express.
> >>>>>
> >>>>> USE CASE 1
> >>>>>
> >>>>> You have exactly 1 input documents, which is therefore simply "the"
> >>>>> document in the mind of the user. This is probably the typical use
> >>>>>
> >>>>> case,
> >>>>>
> >>>>> but at least the use case users typically start out from when
> >>>>> starting
> >>>>>
> >>>>> the
> >>>>>
> >>>>> work.
> >>>>>
> >>>>> freemarker-cli
> >>>>> -t access-report.ftl
> >>>>> somewhere/foo-access-log.csv
> >>>>>
> >>>>> Then `Documents.get(0)` is not very fitting. Most importantly it's
> >>>>>
> >>>>> error
> >>>>>
> >>>>> prone, because if the user passed in more than 1 documents (can even
> >>>>>
> >>>>> happen
> >>>>>
> >>>>> totally accidentally, like if the user was lazy and used a wildcard
> >>>>>
> >>>>> that
> >>>>>
> >>>>> the shell exploded), the template will silently ignore the rest of
> >>>>> the
> >>>>> documents, and the singe document processed will be practically
> >>>>> picked
> >>>>> randomly. The user might won't notice that and submits a bad report
> >>>>> or
> >>>>>
> >>>>> such.
> >>>>>
> >>>>> I think that in this use case the document should be simply referred
> >>>>> as
> >>>>> `Document` in the template. When you have multiple documents there,
> >>>>> referring to `Document` should be an error, saying that the template
> >>>>>
> >>>>> was
> >>>>>
> >>>>> made to process a single document only.
> >>>>>
> >>>>>
> >>>>> USE CASE 2
> >>>>>
> >>>>> You have multiple input documents, but each has different role
> >>>>>
> >>>>> (different
> >>>>>
> >>>>> schema, maybe different file type). Like, you pass in users.csv and
> >>>>> groups.csv. Each has difference schema, and so you want to access
> >>>>> them
> >>>>> differently, but in the same template.
> >>>>>
> >>>>> freemarker-cli
> >>>>> [...]
> >>>>> --named-document users somewhere/foo-users.csv
> >>>>> --named-document groups somewhere/foo-groups.csv
> >>>>>
> >>>>> Then in the template you could refer to them as:
> >>>>>
> >>>>> `NamedDocuments.users`,
> >>>>>
> >>>>> and `NamedDocuments.groups`.
> >>>>>
> >>>>> Use Case 1, and 2 can be unified into a coherent concept, where
> >>>>>
> >>>>> `Document`
> >>>>>
> >>>>> is just a shorthand for `NamedDocuments.main`. It's called "main"
> >>>>>
> >>>>> because
> >>>>>
> >>>>> that's "the" document the template is about, but then you have to
> >>>>> added
> >>>>> some helper documents, with symbolic names representing their role.
> >>>>>
> >>>>> freemarker-cli
> >>>>> -t access-report.ftl
> >>>>> --document-name=main somewhere/foo-access-log.csv
> >>>>> --document-name=users somewhere/foo-users.csv
> >>>>> --document-name=groups somewhere/foo-groups.csv
> >>>>>
> >>>>> Here, `Document` still works in the template, and it refers to
> >>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
> >>>>>
> >>>>> above
> >>>>>
> >>>>> would be cleaner, I couldn't figure out how to do that with Picocli.
> >>>>> Anyway, for now the point is the concept, which is not specific to
> >>>>>
> >>>>> CLI.)
> >>>>>
> >>>>> USE CASE 3
> >>>>>
> >>>>> Here you have several of the same kind of documents. That has a more
> >>>>> generic sub-use-case, when you have explicitly named documents (like
> >>>>> "users" above), and for some you expect multiple input files.
> >>>>>
> >>>>> freemarker-cli
> >>>>> -t access-report.ftl
> >>>>> --document-name=main somewhere/foo-access-log.csv
> >>>>> somewhere/bar-access-log.csv
> >>>>> --document-name=users somewhere/foo-users.csv
> >>>>> somewhere/bar-users.csv
> >>>>> --document-name=groups somewhere/global-groups.csv
> >>>>>
> >>>>> The template must to be written with this use case in mind, as now it
> >>>>>
> >>>>> has
> >>>>>
> >>>>> #list some of the documents. (I think in practice you hardly ever
> >>>>> want
> >>>>>
> >>>>> to
> >>>>>
> >>>>> get a document by hard coded index. Either you don't know how many
> >>>>> documents you have, so you can't use hard coded indexes, or you do,
> >>>>> and
> >>>>> each index has a specific meaning, but then you should name the
> >>>>>
> >>>>> documents
> >>>>>
> >>>>> instead, as using indexes is error prone, and hard to read.)
> >>>>> Accessing that list of documents in the template, maybe could be done
> >>>>>
> >>>>> like
> >>>>>
> >>>>> this:
> >>>>> - For the "main" documents: `DocumentList`
> >>>>> - For explicitly named documents, like "users":
> >>>>>
> >>>>> `NamedDocumentLists.users`
> >>>>>
> >>>>> SUMMING UP
> >>>>>
> >>>>> To unify all 3 use cases into a coherent concept:
> >>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you
> >>>>>
> >>>>> can
> >>>>>
> >>>>> achieve everything with it, using it requires your template to handle
> >>>>>
> >>>>> the
> >>>>>
> >>>>> most generic case too. So, I think it would be rarely used.
> >>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
> >>>>>
> >>>>> It's
> >>>>>
> >>>>> used if you only have one kind of documents (single format and
> >>>>> schema),
> >>>>>
> >>>>> but
> >>>>>
> >>>>> potentially multiple of them.
> >>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1
> >>>>> document
> >>>>>
> >>>>> of
> >>>>>
> >>>>> the given name.
> >>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is
> >>>>> for
> >>>>>
> >>>>> the
> >>>>>
> >>>>> most natural/frequent use case.
> >>>>>
> >>>>> That's 4 possible ways of accessing your documents, which is a
> >>>>>
> >>>>> trade-off
> >>>>>
> >>>>> for the sake of these:
> >>>>> - Catching CLI (or Maven, etc.) input where the template output
> >>>>> likely
> >>>>>
> >>>>> will
> >>>>>
> >>>>> be wrong. That's only possible if the user can communicate its intent
> >>>>>
> >>>>> in
> >>>>>
> >>>>> the template.
> >>>>> - Users don't need to deal with concepts that are irrelevant in their
> >>>>> concrete use case. Just start with the trivial, `Document`, and later
> >>>>>
> >>>>> if
> >>>>>
> >>>>> the need arises, generalize to named documents, document lists, or
> >>>>>
> >>>>> both.
> >>>>>
> >>>>> What do guys think?
> >>>>>
> >>>>>
> >>>
> >>
> >>
> >> --
> >> Best regards,
> >> Daniel Dekany
> >>
> >
> >
> > --
> > Best regards,
> > Daniel Dekany
>
>

-- 
Best regards,
Daniel Dekany

Re: freemarker-generator: Improving the input documents concept

Posted by Daniel Dekany <da...@gmail.com>.

I was thinking about what we should call the files that cause the generation
of an output file (a "transformation", as you called it in that gist),
taking the CLI into account as well. Maybe "seeds", as those are the
things output generation starts out from. That is, for a
freemarker-generator run, you may want to specify:

   - *Seed templates*: That's what you have called "aggregation" in the
   gist. So, one output file per template. Normally you also want to specify a
   data source for this, but that's not a "seed data source" (see next point).
   The data source is not strictly necessary though, as the template can just
   run by itself.
   - *Seed data sources*: This is when you generate one output per data
   source file. You must also specify a *transform template* for this, and
   note that that's not called a "seed template".
   - *Seed data source entries*: When you generate one output file per entry
   (like per table row) inside the data source. (We would have to introduce
   data loaders for this to work, but I try to focus on one thing per mail,
   and for now I assume that we just load and parse data sources inside the
   template, like it's done currently. But I also wanted to show that there
   are more than 2 kinds of seeds.)
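
To make the difference between the three seed kinds concrete, here is a rough
Java sketch of how many output files each kind would yield. It is only an
illustration of the idea: `SeedKind` and `outputCount` are names made up for
this mail, nothing like them exists in freemarker-generator.

```java
import java.util.Arrays;

// Illustrative model only: SeedKind and outputCount are hypothetical names,
// not part of freemarker-generator.
final class SeedKindSketch {

    /** What drives the generation of output files in one group of seeds. */
    enum SeedKind { TEMPLATE, DATA_SOURCE, DATA_SOURCE_ENTRY }

    /**
     * Returns how many output files a group of seeds would yield.
     *
     * @param templateCount        number of seed templates in the group
     * @param entriesPerDataSource one element per seed data source file,
     *                             holding its number of entries (e.g. rows)
     */
    static int outputCount(SeedKind kind, int templateCount, int[] entriesPerDataSource) {
        switch (kind) {
            case TEMPLATE:
                // One output per seed template.
                return templateCount;
            case DATA_SOURCE:
                // One output per seed data source file.
                return entriesPerDataSource.length;
            case DATA_SOURCE_ENTRY:
                // One output per entry inside the seed data sources.
                return Arrays.stream(entriesPerDataSource).sum();
            default:
                throw new IllegalArgumentException("Unknown seed kind: " + kind);
        }
    }

    public static void main(String[] args) {
        int[] rowsPerCsv = {10, 20, 30}; // three CSV files with 10, 20 and 30 rows
        for (SeedKind kind : SeedKind.values()) {
            System.out.println(kind + " -> " + outputCount(kind, 2, rowsPerCsv) + " output(s)");
        }
    }
}
```

With 2 templates and 3 CSV files of 10, 20 and 30 rows, that gives 2, 3 and
60 outputs respectively.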

So in a CLI call, that might look like this:

One output per template (this is a single command, I just use line breaks
for readability):

freemarker-generator
    --seed-template *.ftl
    --data-source main.csv
    --shared-data-source:helper helper.csv

I imagine that each --seed-... starts a section, so that everything up
until the next --seed-... or --shared-... configures that group of
seeds:


freemarker-generator
    --seed-template foo/*.ftl
    --data-source foo/main.csv

    --seed-template bar/*.ftl
    --data-source bar/main.csv

    --shared-data-source:helper common/helper.csv
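
To make the "section" idea concrete, here is a rough Python sketch of how such an argument list could be grouped. This is only an illustration under my own assumptions: the function name group_sections and the dict layout are hypothetical, nothing like this exists in freemarker-generator, and the shell is assumed to have expanded the wildcards already.

```python
def group_sections(argv):
    """Group options into sections; every --seed-... or --shared-... opens a new one."""
    sections = []
    option = None  # the option that is currently collecting values
    for arg in argv:
        if arg.startswith("--"):
            option = arg
            if arg.startswith(("--seed-", "--shared-")):
                # Section opener: start a fresh group of settings.
                sections.append({option: []})
            elif not sections:
                raise ValueError(f"{arg} appears before any --seed-... or --shared-... option")
            else:
                # Ordinary option: it configures the section opened most recently.
                sections[-1].setdefault(option, [])
        elif option is None:
            raise ValueError(f"value {arg!r} has no preceding option")
        else:
            sections[-1][option].append(arg)
    return sections


sections = group_sections([
    "--seed-template", "foo/a.ftl", "foo/b.ftl",
    "--data-source", "foo/main.csv",
    "--seed-template", "bar/c.ftl",
    "--data-source", "bar/main.csv",
    "--shared-data-source:helper", "common/helper.csv",
])
for section in sections:
    print(section)
```

Each --data-source ends up attached to the --seed-template group that precedes it, while the shared helper forms its own section.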

One output per data file:

freemarker-generator
    --seed-data-source logs/*.csv
    --transform-template report.ftl
    --shared-data-source:helper bar/helper.csv


Based on above, naturally you can also mix one-output-per-template, and
one-output-per-data-source (again, this is a single command line command,
line breaks are only for readability):

freemarker-generator
    --seed-template foo/*.ftl
    --data-source foo/main.csv

    --seed-template bar/*.ftl
    --data-source bar/main.csv

    --seed-data-source logs/*.csv
    --transform-template report.ftl

    --shared-data-source:helper common/helper.csv


So there you generate output based on each ftl file in foo/, then you also
generate output based on each ftl file in bar/, and then you generate
output based on each csv file in logs/. All these will use
common/helper.csv, which therefore has to be loaded only once. (That again
will be more helpful with data loaders, but that's for another time.)
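
The "loaded only once" point can be sketched like this in Python. Everything here is hypothetical (plan_outputs, the dict keys, and the fake loader are made up for the example); the only thing it is meant to show is that the shared data is loaded before the seeds are walked, and the same loaded object is handed to every output job.

```python
def plan_outputs(seed_groups, shared_sources, load):
    """Create one output job per seed; shared data sources are loaded a single time."""
    # Load the shared data up front, once per run, not once per output.
    shared = {name: load(path) for name, path in shared_sources.items()}
    jobs = []
    for group in seed_groups:
        for seed in group["seeds"]:
            jobs.append({"seed_kind": group["kind"], "seed": seed, "shared": shared})
    return jobs


loaded_paths = []

def fake_load(path):
    # Stand-in for real parsing; it only records that a load happened.
    loaded_paths.append(path)
    return f"<parsed {path}>"


jobs = plan_outputs(
    [
        {"kind": "template", "seeds": ["foo/a.ftl", "foo/b.ftl"]},
        {"kind": "template", "seeds": ["bar/c.ftl"]},
        {"kind": "data-source", "seeds": ["logs/1.csv", "logs/2.csv"]},
    ],
    {"helper": "common/helper.csv"},
    fake_load,
)
print(len(jobs), "jobs,", len(loaded_paths), "load of shared data")
```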

Of course, this can be extended with specifying output file name per
--seed-..., and so on.

For the most common uses we could allow a shorthand in the CLI, by assuming an
implicit --seed-template before the first argument. So if you just want to try
something in FreeMarker, you can write this (as --data-source is optional
for --seed-template):

freemarker-generator adhoc.ftl

Or if you just need to transform a single data file (note that it doesn't
really matter then if it's the template that's the "seed", or the data
source):

freemarker-generator adhoc.ftl --data-source some.csv
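
As a sketch of that shorthand rule (Python, with a hypothetical function name, not an existing part of the tool): if the first argument is not an option, an implicit --seed-template is put in front of it, and otherwise the arguments are left alone.

```python
def normalize_args(argv):
    """Insert the implicit --seed-template when the first argument is a bare value."""
    if argv and not argv[0].startswith("--"):
        return ["--seed-template"] + list(argv)
    return list(argv)


print(normalize_args(["adhoc.ftl"]))
print(normalize_args(["adhoc.ftl", "--data-source", "some.csv"]))
print(normalize_args(["--seed-data-source", "logs/1.csv", "--transform-template", "report.ftl"]))
```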


This is just my idea anyway. What do you think?


On Fri, Mar 6, 2020 at 12:11 AM Daniel Dekany <da...@gmail.com>
wrote:

> About the word choice of "aggregation" and "generation" on
> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449. We
> are "generating" files even if we are "aggregating". The project
> itself is called freemarker-generator for that reason actually. (Also
> "aggregation" is not really aggregating in the typical IT sense, I
> think.) I think to present the concept in an easy to understand way,
> we should express that you will have either one "transformation" per
> template file (which you called "aggregation"), or one "transformation"
> per data-source (which you called "generation").
>
> But the concept is even more important. Again, at the very least because
> of shared data (see mail before the previous), you will need to be
> able to use both "aggregation" and "generation" in the same
> freemarker-cli run. Also there's the use case when you run one
> "transformation" per row in a table. So, *for each* something you run
> a "transformation", where that "something" can be a template, or a
> data file, or an entry (a row) in a data file.
>
> The tricky question is how to put all that in a sane way into CLI... I
> will think about that. But we really shouldn't cripple the core
> concept just because it's hard to express in a primitive interface
> like CLI. Anyway, pretty certainly only the simplest use-cases will
> look good on CLI, and for others, you will want to use a configuration
> file, or Maven, or Gradle.
>
> BTW, the "freemarker-cli" command should be really called
> "freemarker-generator", which is the project name. The Maven task
> isn't called "freemarker-maven" either. :) You are calling
> freemarker-generator via CLI, or via Maven.
>
> Also, you mentioned in your last mail this idea of adding query parameters
> to configure that data source. That I think will be difficult to do in
> practice. Both the front-ends (CLI and Maven) have the concept of
> collecting file paths for you. But they won't attach query parameters
> to them for you. Also, data source configs can be long-winded and
> have nested structures, so you would often end up with ?setup=<some
> JSON here>, which is quite horrible to URL encode and all.
>


-- 
Best regards,
Daniel Dekany

Re: freemarker-generator: Improving the input documents concept

Posted by Daniel Dekany <da...@gmail.com>.

About the word choice of "aggregation" and "generation" on
https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449. We
are "generating" files even if we are "aggregating". The project
itself is called freemarker-generator for that reason actually. (Also
"aggregation" is not really aggregating in the typical IT sense, I
think.) I think to present the concept in an easy to understand way,
we should express that you will have either one "transformation" per
template file (which you called "aggregation"), or one "transformation"
per data-source (which you called "generation").

But the concept is even more important. Again, at the very least because
of shared data (see mail before the previous), you will need to be
able to use both "aggregation" and "generation" in the same
freemarker-cli run. Also there's the use case when you run one
"transformation" per row in a table. So, *for each* something you run
a "transformation", where that "something" can be a template, or a
data file, or an entry (a row) in a data file.

The tricky question is how to put all that in a sane way into CLI... I
will think about that. But we really shouldn't cripple the core
concept just because it's hard to express in a primitive interface
like CLI. Anyway, pretty certainly only the simplest use-cases will
look good on CLI, and for others, you will want to use a configuration
file, or Maven, or Gradle.

BTW, the "freemarker-cli" command should be really called
"freemarker-generator", which is the project name. The Maven task
isn't called "freemarker-maven" either. :) You are calling
freemarker-generator via CLI, or via Maven.

Also, you mentioned in your last mail this idea of adding query parameters
to configure that data source. That I think will be difficult to do in
practice. Both the front-ends (CLI and Maven) have the concept of
collecting file paths for you. But they won't attach query parameters
to them for you. Also, data source configs can be long-winded and
have nested structures, so you would often end up with ?setup=<some
JSON here>, which is quite horrible to URL encode and all.
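
To illustrate how unwieldy that gets, here is a small Python sketch using the standard library only. The setup structure, its keys, and the logs/access.csv path are invented for the example; they are not a proposed config format.

```python
import json
from urllib.parse import quote, unquote

# Hypothetical nested data source config; the keys are made up for illustration.
setup = {
    "format": "csv",
    "delimiter": ";",
    "columns": {"date": {"type": "date", "pattern": "yyyy-MM-dd"}},
}

# Compact JSON, then percent-encode every reserved character.
encoded = quote(json.dumps(setup, separators=(",", ":")), safe="")
uri = "logs/access.csv?setup=" + encoded
print(uri)

# Decoding gets the original structure back, but nobody wants to type the URI.
assert json.loads(unquote(encoded)) == setup
```

Every brace, quote mark, and colon turns into a %XX escape, so even this small config becomes hard to read or edit by hand on a command line.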

Re: freemarker-generator: Improving the input documents concept

Posted by Siegfried Goeschl <si...@gmail.com>.

Hi Daniel,

Please see my comment below

Thanks in advance, 

Siegfried Goeschl


> On 05.03.2020, at 22:36, Daniel Dekany <da...@gmail.com> wrote:
> 
>> 
>> Regarding the "global mode" and "output generators files" - I'm sorry, but
>> I'm not getting it
> 
> 
> I'm not getting what doesn't go through. Can you explain?  The CLI suggested
> that you got "global mode" (a single --mode switch per run).

[SG] I think the confusion stems from different levels of abstractions (see next chapter) - while I try to get the command line invocation right you seem to think along a more technical implementation level.

Please have a look at https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449 - I think such a "mode" might be needed but is not strictly relevant in the beginning. But it is an important concept for the implementation ...

> 
> Do you think of defining explicit "output generator file" containing
>> `datasources`, `templates` and `outputs` - yes that could be done but
>> does not feel like an interactive command line tool any longer
> 
> 
> I think what the CLI exposes and how should be a secondary detail at this
> phase, as the CLI is (or should be) just a front end, that wraps the common
> core (generator.base). The CLI, the Maven task, Gradle task, etc. should
> probably just be thin wrappers around the common core. Do we agree on that?
> So, these concepts are "core" concepts, and probably govern the API of
> generator.base. That was my intent here, to hammer out these core
> concepts.
> 
> Also the "output generator file" is usually just a data file, or just a
> template. It's just the file that causes some output to be generated. So,
> usually it doesn't *explicitly* contain all that information (though you
> might as well introduce a file type that does). But it still defines an
> output generator, because, you will have a template, a data-model, and an
> output file name.

[SG] If you are thinking about the internal representation, I fully agree - I personally see something like a list of "Transformation" objects being executed, each containing the template, datasources and output
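
A minimal Python sketch of that internal representation, purely as an assumption of how it could look (the Transformation fields, the helper function, and the out/ naming scheme are all hypothetical, and the real code base is Java):

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Transformation:
    """One unit of work: a template plus its data sources yields one output."""
    template: str
    output: str
    datasources: tuple = ()


def one_per_datasource(template, datasources, output_dir):
    """Build one Transformation per data source file, all using the same template."""
    return [
        Transformation(
            template=template,
            output=f"{output_dir}/{PurePosixPath(ds).stem}.txt",
            datasources=(ds,),
        )
        for ds in datasources
    ]


plan = one_per_datasource("report.ftl", ["logs/a.csv", "logs/b.csv"], "out")
for transformation in plan:
    print(transformation)
```

A front end (CLI, Maven, Gradle) would then only have to produce such a list, and the common core would execute it.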

> 
> I think you are leaning towards a 1.0 release while I favour 0.x.y to
>> have room to make mistakes / experiments
> 
> 
> The version number doesn't tell much to me, so what's your intent/strategy
> with these 0.x.y releases you plan to do? Like, if you release 0.1.0, then
> will you feel uncomfortable changing things *radically* after that? That
> can be a problem, if the goal is iterating without bounds. On the other
> hand, if you don't feel uncomfortable about that at all, I don't really see
> why a user would use it. But, if it's clearly indicated that everything can
> change, and you think it's useful to release that way, I don't want to be
> in your way.

[SG] What represents backward compatibility of CLI or Maven plugin? What can change?

* I don't want to change the command line parameters (CLI) and generator file layout (Maven) in a breaking way
* I want to avoid releasing things like "name:group" versus "group:name" when we have not settled on a decision
* What I still want to do in the near future is to change the public implementation classes since I do not assume that someone is using them for the time being

> 
> perfect is the enemy of good
> 
> 
> I just think the overall concept/architecture should be iterated out first.
> Polish, and adding all kind of bells, even fixing bugs, is different matter.

[SG] +1

> 
> On Thu, Mar 5, 2020 at 9:36 PM Siegfried Goeschl <
> siegfried.goeschl@gmail.com> wrote:
> 
>> Hi Daniel,
>> 
>> The introduction of named `Datasource` allows to simplify / streamline a
>> few things
>> 
>> * I have a meaningful user-supplied name
>> * I can pass additional configuration information as already implemented
>> with `charset` and `contenttype` and this would also allow configuring a
>> `CSV Datasource`, e.g.
>> `users=./data/users.csv#format=default&header=true&delimeter=TAB` which
>> can be readily parsed
>> * Currently the names of datasources are taken from their relative
>> file name - might make sense to drop that but I need to contemplate :-)
>> 
>> Regarding the "global mode" and "output generators files" - I'm sorry,
>> but I'm not getting it
>> 
>> * I refined the
>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449 to
>> make my points more clearly
>> * Do you think of defining explicit "output generator file" containing
>> `datasources`, `templates` and `outputs` - yes that could be done but
>> does not feel like an interactive command line tool any longer
>> 
>> 
>> Regarding "more idiomatic FTL usage"
>> 
>> * Yes, I need to dive into custom template models or whatever it is
>> called :-)
>> 
>> 
>> Something we need to iron out is a release policy
>> 
>> * Currently we have little agreement on how the CLI should look or
>> behave
>> * I think you are leaning towards a 1.0 release while I favour 0.x.y to
>> have room to make mistakes / experiments
>> * I personally see the possibility that we don't get a release out -
>> "perfect is the enemy of good"
>> 
>> How would you like to handle the problem - can we agree on a minimal
>> feature set worthy of a release?
>> 
>> Thanks in advance,
>> 
>> Siegfried Goeschl
>> 
>> 
>> On 1 Mar 2020, at 11:33, Daniel Dekany wrote:
>> 
>>>> 
>>>> Actually not recommended but we have named data sources for less than
>>>> 24 hours
>>> 
>>> 
>>> Sorry, not sure what that means. Anyway, my "vote" is let's not give
>>> automatic names if that's not recommended to utilize. I mean, in case we
>>> happen to agree on that, why leave it there. Especially if automatically
>>> chosen names can clash with explicitly given ones, that would be
>>> trouble. (I'm not sure right now if they can... the path we use as the
>>> name can be relative? Then it realistically can.)
>>> 
>>>> This is a command line tool where we have little idea what the user
>>>> will do or abuse
>>> 
>>> 
>>> No matter how much/little we know, we firmly put our bets by releasing
>>> something. So if some feature is certainly not right, that's enough to
>>> not have it, I think.
>>> 
>>>> How does a "data loader" know that it is responsible to load a file
>>>>
>>>> What should a "CSV data loader" do - parse it into a list of
>>>> records or stream one by one?
>>> 
>>> 
>>> I think I was misunderstood here. It's not about some kind of auto-magic.
>>> It's about where you specify what to load and how, and in what format
>>> you specify that. Of course, you must specify the data source (basically an
>>> URI for now as I saw), the rough format (CSV), and the format options
>>> (separator character, etc.), and other freemarker-generator loading options
>>> (like which CSV columns are numbers, which are dates, with what format,
>>> what counts as null, etc.).
>>> 
>>> What was confusing in what I said much earlier is probably that you don't
>>> need a global "--mode". That just means that you can have multiple "modes"
>>> in the same run, not that you need some big auto-magic. And that they
>>> aren't really "modes" then... I think it's just natural that you can have
>>> different kinds of "output generator" files in the same run. Why force the
>>> assumption that you don't, especially considering that they might want
>>> to access common data (which you don't want to load again and again, for
>>> each run of the different --mode-s you need). Of course, as you might
>>> select files with wildcards (or by specifying a whole directory, or with
>>> some Maven matcher), you just can't directly associate the data loader
>>> options to the individual data sources. Instead you can say elsewhere that
>>> *.csv inside this explicit "group", or with this file name pattern, is to
>>> be loaded like this. That's what you might have perceived as auto-magic.
>>> It's just mass-producing data loaders for "cattle" files.
>>> 
>>>> How to handle the case if you have multiple potential data loaders for a
>>>> single file?
>>> 
>>> 
>>> As per above, that's just two data loaders referring to the same data
>>> source, so, nothing special.
>>> 
>>> As of the current state of things, this is how I'm supposed to load a CSV,
>>> in the template itself (if I'm not outdated/mistaken):
>>> 
>>> <#assign cvsFormat = CSVTool.formats.DEFAULT.withHeader()>
>>> <#assign foos = CSVTool.parse(Datasources.get("foos"), cvsFormat).records>
>>> <#assign bars = CSVTool.parse(Datasources.get("barb"), cvsFormat).records>
>>> 
>>> It will be worth exploring how to make these look more "idiomatic" FTL
>>> (given this is an "official" FM product now, I think, we should show how
>>> it's done), and nicer in general. Point for now is, that's basically two
>>> data-loaders interwoven with the template there. Because they are
>>> interwoven like that, you can't reuse what they loaded for another template
>>> execution.
>>> 
>>>> That comes down to personal preferences, e.g. chown uses
>>>> "owner[:group]"
>>> 
>>> 
>>> Yeah, but XML namespaces, Java, C, etc. all use <parent><operator><child>,
>>> so, I think, that clicks for more of our potential users. So let's bet on
>>> what clicks for more users.
>>> 
>>> Besides, I challenged the very idea that we need both groups and names. :)
>>> Saying that it's simpler and less opinionated (more flexible) to have just
>>> multiple names (like tags). What's the end of that?
>>> 
>>> On Sun, Mar 1, 2020 at 9:47 AM Siegfried Goeschl <
>>> siegfried.goeschl@gmail.com> wrote:
>>> 
>>>> HI Daniel,
>>>> 
>>>> Please see my comments below
>>>> 
>>>> Thanks in advance,
>>>> 
>>>> Siegfried Goeschl
>>>> 
>>>> 
>>>>> On 29.02.2020, at 21:02, Daniel Dekany <da...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> 
>>>>>> I try to provide a useful name even when the content is coming from
>>>>>> an
>>>>>> URL
>>>>> 
>>>>> 
>>>>> When is it recommended to rely on that though? Because utilizing
>>>>> that
>>>> means
>>>>> that renaming a data source file can break the process, even if you
>>>>> call
>>>>> freemarker-cli with the up to date file name. And if that happens
>>>>> depends
>>>>> on what you (or an other random colleague!) have dug inside the
>>>> templates.
>>>>> So I guess we better just don't support this. Less code and less
>>>>> things
>>>> to
>>>>> document too.
>>>>> 
>>>> 
>>>> Actually not recommended but we have named data sources for less than
>>>> 24
>>>> hours
>>>> 
>>>>> 
>>>>>> I think we have a different understanding what a "Document" /
>>>> "Datasource
>>>>>> / DataSource" should do
>>>>> 
>>>>> 
>>>>> Thing is, eventually (most certainly pre-1.0, as it influences
>>>>> architecture), certain needs will have to addressed, somehow. Then
>>>>> we
>>>> will
>>>>> see what "things" we really need. For now I though we need "things"
>>>>> that
>>>>> are much more than paths, and encapsulate the "how to load the data"
>>>>> aspect. I called them data sources, but maybe we should called them
>>>>> "data
>>>>> loaders" to free up data sources for the more primitive thing. Some
>>>>> needs/doubts to address, *later*: Is it really the best approach for
>>>> users
>>>>> to load/parse data sources programmatically (that coded is written
>>>>> in
>>>> FTL,
>>>>> inside the templates)? Also, is the template the right place for
>>>>> doing
>>>>> that, because, when multiple templates (or just multiple template
>>>>> *runs*
>>>> of
>>>>> the same template, each generating a different output file) needs
>>>>> common
>>>>> data, they shouldn't load it again and again. Also, different topic,
>>>>> can
>>>> we
>>>>> handle the case "transparently" enough when the data is not coming
>>>>> from a
>>>>> file?
>>>> 
>>>> This is a command line tool where we have little idea what the user will
>>>> do or abuse
>>>>
>>>> * How does a "data loader" know that it is responsible to load a file
>>>> * What should a "CSV data loader" do - parse it into a list of
>>>> records or stream one by one?
>>>> * How to handle the case if you have multiple potential data loaders for a
>>>> single file?
>>>>
>>>> I'm leaning towards building blocks where the user controls the work to be
>>>> done even if it requires one to two extra lines of FTL code
>>>> 
>>>> 
>>>>> 
>>>>> The joy of programming - I did not intend to use "name:group"
>>>>> together
>>>> with
>>>>>> wildcards :-)
>>>>> 
>>>>> 
>>>>> For a CLI tool, I guess we agree that it should work. So maybe, like
>>>>> this
>>>>> (here logs and foos meant to be "groups"):
>>>>> --data-source logs file1.log file2.log fileN.log
>>>>> --data-source foos foo1.csv foo2.csv fooN.csv
>>>>> --data-source bar bar.xlsx
>>>>> 
>>>>> It so happens that here you don't really have a good control about
>>>>> the
>>>>> number of files associated to the name, so, maybe yet another reason
>>>>> to
>>>> not
>>>>> differentiate names and groups.
>>>>> 
>>>>> I Disagree here - I think using a name would be used more often. I
>>>>> added
>>>>>> the "group" as an afterthought since some grouping could be useful
>>>>> 
>>>>> 
>>>>> We do agree in that. What I said is that the *syntax* should be so
>>>>> that
>>>> the
>>>>> group comes first. It's still optional. Like this:
>>>>> --data-source group:name /somewhere
>>>>> --data-source name /somewhere
>>>> 
>>>> That comes down to personal preferences, e.g. chown uses
>>>> "owner[:group]"
>>>> 
>>>>> 
>>>>> On Sat, Feb 29, 2020 at 7:34 PM Siegfried Goeschl <
>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>> 
>>>>>> HI Daniel,
>>>>>> 
>>>>>> See my comments below
>>>>>> 
>>>>>> Thanks in advance,
>>>>>> 
>>>>>> Siegfried Goeschl
>>>>>> 
>>>>>> 
>>>>>>> On 29.02.2020, at 19:08, Daniel Dekany <da...@gmail.com>
>>>> wrote:
>>>>>>> 
>>>>>>> FREEMARKER-135 freemarker-generator-cli: Support user-supplied
>>>>>>> names
>>>> for
>>>>>>> datasources
>>>>>>> 
>>>>>>> So, I can do this to have both a name and a group associated to a data
>>>>>>> source:
>>>>>>> --datasource someName:someGroup=somewhere/something
>>>>>> 
>>>>>> Correct
>>>>>> 
>>>>>>> Or if I only want a name, but not a group (or an ""  group
>>>>>>> actually -
>>>>>>> bug?), then:
>>>>>>> --datasource someName=somewhere/something
>>>>>> 
>>>>>> Correct
>>>>>> 
>>>>>>> 
>>>>>>> Or if only a group but not a name (or a "" name actually) then:
>>>>>>> --datasource :someGroup=somewhere/something
>>>>>> 
>>>>>> Mhmm, that would be unintended functionality from my side - current
>>>>>> approach is that every "Document" / "Datasource / DataSource" is
>>>>>> named
>>>>>> 
>>>>>>> 
>>>>>>> A name must identify exactly 1 data source, while a group
>>>>>>> identifies a
>>>>>> list
>>>>>>> of data sources.
>>>>>> 
>>>>>> No, every "Document" / "Datasource / DataSource" has a name currently but
>>>>>> uniqueness is not enforced. Only if you want to get a "Document" /
>>>>>> "Datasource / DataSource" by its exact name do I check for exactly one
>>>>>> search hit and throw an exception otherwise. I try to provide a useful
>>>>>> name even when the content is coming from an URL or STDIN (and I will
>>>>>> probably add environment variables as "Document" / "Datasource /
>>>>>> DataSource", e.g. configuration in the cloud as JSON content passed as
>>>>>> environment variable)
>>>>>> 
>>>>>>> 
>>>>>>> Is that this idea, that the a data source can be part of a group,
>>>>>>> and
>>>>>> then
>>>>>>> is also possibly identifiable with a name comes from an use case?
>>>>>>> I
>>>> mean,
>>>>>>> it's possibly important somewhere, but if so, then it's strange
>>>>>>> that
>>>> you
>>>>>>> can put something into only a single group. If we need this kind
>>>>>>> of
>>>>>> thing,
>>>>>>> then perhaps you should be just allowed to associate the data
>>>>>>> source
>>>>>> with a
>>>>>>> list of names (kind of like tagging), and then when the template
>>>>>>> wants
>>>> to
>>>>>>> get something by name, it will tell there if it expects exactly
>>>>>>> one or
>>>> a
>>>>>>> list of data sources. Then you don't need to introduce two terms
>>>>>>> in the
>>>>>>> documentation either (names and groups). Again, if we want this at
>>>>>>> all,
>>>>>>> instead of just going with a data source that itself gives a list.
>>>>>>> (And
>>>>>> if
>>>>>>> not, how will we handle a data source that loads from a non-file
>>>> source?)
>>>>>> 
>>>>>> I actually thought of implementing tagging but considered a "group"
>>>>>> sufficient.
>>>>>> 
>>>>>> * If you don't define anything everything goes into the "default"
>>>>>> group
>>>>>> * For individual documents you can define a name and an optional
>>>>>> group
>>>>>> 
>>>>>> I think we have a different understanding what a "Document" /
>>>> "Datasource
>>>>>> / DataSource" should do
>>>>>> 
>>>>>> * It is a dumb
>>>>>> * It is lazy since data is only loaded on demand
>>>>>> * There is no automagic like "oh, this is a JSON file, so let's go
>>>>>> to
>>>> the
>>>>>> JSON tool and create a map readily accessible in the data model"
>>>>>> 
>>>>>>> 
>>>>>>> Note that the current command line syntax doesn't work well with
>>>>>>> shell
>>>>>>> wildcard expansion. Like this:
>>>>>>> --datasource :someGroup=logs/*.log
>>>>>>> will try to expand ":someGroup=logs/*.log", and because it finds
>>>> nothing
>>>>>>> (and because the rules of sh and the like is a mess), you will get
>>>>>>> the
>>>>>>> parameter value as is, without * expanded.
>>>>>> 
>>>>>> The joy of programming - I did not intend to use "name:group"
>>>>>> together
>>>>>> with wildcards :-)
>>>>>> 
>>>>>>> 
>>>>>>> Also,  I think the syntax with colon should be flipped, because on
>>>> other
>>>>>>> places foo:bar usually means that foo is the bigger unit (the
>>>> container),
>>>>>>> and bar is the smaller unit (the child).
>>>>>> 
>>>>>> I Disagree here - I think using a name would be used more often. I
>>>>>> added
>>>>>> the "group" as an afterthought since some grouping could be useful
>>>>>> 
>>>>>>> 
>>>>>>> On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi Daniel,
>>>>>>>> 
>>>>>>>> I'm an enterprise developer - bad habits die hard :-)
>>>>>>>> 
>>>>>>>> So I closed the following tickets and merged the branches
>>>>>>>> 
>>>>>>>> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli"
>>>>>>>> into
>>>>>>>> "freemarker-generator"
>>>>>>>> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to
>>>>>> "Datasource"
>>>>>>>> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied
>>>> names
>>>>>>>> for datasources
>>>>>>>> 
>>>>>>>> Thanks in advance,
>>>>>>>> 
>>>>>>>> Siegfried Goeschl
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 29.02.2020, at 12:19, Daniel Dekany <da...@gmail.com>
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Yeah, and of course, you can merge that branch. You can even
>>>>>>>>> work on
>>>>>> the
>>>>>>>>> master directly after all.
>>>>>>>>> 
>>>>>>>>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <
>>>>>> daniel.dekany@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> But, I do recognize the cattle use case (several "faceless"
>>>>>>>>>> files
>>>> with
>>>>>>>>>> common format/schema). Only, my idea is to push that complexity
>>>>>>>>>> on
>>>> the
>>>>>>>> data
>>>>>>>>>> source. The "data source" concept shields the rest of the
>>>> application
>>>>>>>> from
>>>>>>>>>> the details of how the data is stored or retrieved. So, a data
>>>> source
>>>>>>>> might
>>>>>>>>>> loads a bunch of log files from a directory, and present them
>>>>>>>>>> as a
>>>>>>>> single
>>>>>>>>>> big table, or like a list of tables, etc. So I want to deal
>>>>>>>>>> with the
>>>>>>>> cattle
>>>>>>>>>> use case, but the question is what part of the of architecture
>>>>>>>>>> will
>>>>>> deal
>>>>>>>>>> with this complication, with other words, how do you box
>>>>>>>>>> things. Why
>>>>>> my
>>>>>>>>>> initial bet is to stuff that complication into the "data
>>>>>>>>>> source"
>>>>>>>>>> implementation(s) is that data sources are inherently varied.
>>>>>>>>>> Some
>>>>>>>> returns
>>>>>>>>>> a table-like thing, some have multiple named tables (worksheets
>>>>>>>>>> in
>>>>>>>> Excel),
>>>>>>>>>> some returns tree of nodes (XML), etc. So then, some might
>>>>>>>>>> returns a
>>>>>>>>>> list-of-list-of log records, or just a single list of
>>>>>>>>>> log-records
>>>> (put
>>>>>>>>>> together from daily log files). That way cattles don't add to
>>>>>> conceptual
>>>>>>>>>> complexity. Now, you might be aware of cases where the cattle
>>>> concept
>>>>>>>> must
>>>>>>>>>> be more exposed than this, and the we can't box things like
>>>>>>>>>> this.
>>>> But
>>>>>>>> this
>>>>>>>>>> is what I tried to express.
>>>>>>>>>> 
>>>>>>>>>> Regarding "output generators", and how that applies on the
>>>>>>>>>> command
>>>>>>>> line. I
>>>>>>>>>> think it's important that the common core between Maven and
>>>>>>>> command-line is
>>>>>>>>>> as fat as possible. Ideally, they are just two syntax to set up
>>>>>>>>>> the
>>>>>> same
>>>>>>>>>> thing. Mostly at least. So, if you specify a template file to
>>>>>>>>>> the
>>>> CLI
>>>>>>>>>> application, in a way so that it causes it to process that
>>>>>>>>>> template
>>>> to
>>>>>>>>>> generate a single output, then there you have just defined an
>>>> "output
>>>>>>>>>> generator" (even if it wasn't explicitly called like that in
>>>>>>>>>> the
>>>>>> command
>>>>>>>>>> line). If you specify 3 csv files to the CLI application, in a
>>>>>>>>>> way
>>>> so
>>>>>>>> that
>>>>>>>>>> it causes it to generate 3 output files, then you have just
>>>>>>>>>> defined
>>>> 3
>>>>>>>>>> "output generators" there (there's at least one template
>>>>>>>>>> specified
>>>>>> there
>>>>>>>>>> too, but that wasn't an "output generator" itself, it was just
>>>>>>>>>> an
>>>>>>>> attribute
>>>>>>>>>> of the 3 output generators). If you specify 1 template, and 3
>>>>>>>>>> csv
>>>>>>>> files, in
>>>>>>>>>> a way so that it will yield 4 output files (1 for the template,
>>>>>>>>>> 3
>>>> for
>>>>>>>> the
>>>>>>>>>> csv-s), then you have defined 4 output generators there. If you
>>>> have a
>>>>>>>> data
>>>>>>>>>> source that loads a list of 3 entities (say, 3 csv files, so
>>>>>>>>>> it's a
>>>>>>>> list of
>>>>>>>>>> tables then), and you have 2 templates, and you tell the CLI to
>>>>>> execute
>>>>>>>>>> each template for each item in said data source, then you have
>>>>>>>>>> just
>>>>>>>> defined
>>>>>>>>>> 6 "output generators".
>>>>>>>>>> 
>>>>>>>>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
>>>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>> 
>>>>>>>>>>> That all depends on your mental model and work you do,
>>>> expectations,
>>>>>>>>>>> experience :-)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> __Document Handling__
>>>>>>>>>>> 
>>>>>>>>>>> *"But I think actually we have no good use case for list of
>>>> documents
>>>>>>>>>>> that's passed at once to a single template run, so, we can
>>>>>>>>>>> just
>>>>>> ignore
>>>>>>>>>>> that complication"*
>>>>>>>>>>> 
>>>>>>>>>>> In my case that's not a complication but my daily business -
>>>>>>>>>>> I'm
>>>>>>>>>>> regularly wading through access logs - yesterday probably a
>>>>>>>>>>> couple
>>>> of
>>>>>>>>>>> hundreds access logs across two staging sites to help tracking
>>>>>>>>>>> some
>>>>>>>>>>> strange API gateway issues :-)
>>>>>>>>>>> 
>>>>>>>>>>> My gut feeling is (borrowing from
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
>>>>>>>>>>> )
>>>>>>>>>>> 
>>>>>>>>>>> 1. You have a few lovely named documents / templates - `pets`
>>>>>>>>>>> 2. You have tons of anonymous documents / templates to process
>>>>>>>>>>> -
>>>>>>>>>>> `cattle`
>>>>>>>>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
>>>>>>>>>>> 
>>>>>>>>>>> `freemarker-cli` was built with 2) in mind and I want to cover
>>>>>>>>>>> 1)
>>>>>> since
>>>>>>>>>>> it is equally important and common.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> __Template And Document Processing Modes__
>>>>>>>>>>> 
>>>>>>>>>>> IMHO it is important to answer the following question : "How
>>>>>>>>>>> many
>>>>>>>>>>> outputs do you get when rendering 2 template and 3
>>>>>>>>>>> datasources?
>>>> Two,
>>>>>>>>>>> Three or Six?"
>>>>>>>>>>> 
>>>>>>>>>>> Your answer is influenced by your mental model / experience
>>>>>>>>>>> 
>>>>>>>>>>> * When wading through tons of CSV files, access logs, etc. the
>>>> answer
>>>>>>>> is
>>>>>>>>>>> "2"
>>>>>>>>>>> * When doing source code generation the obvious answer is "6"
>>>>>>>>>>> * Can't image a use case which results in "3" but I'm pretty
>>>>>>>>>>> sure
>>>> we
>>>>>>>>>>> will encounter one
>>>>>>>>>>> 
>>>>>>>>>>> __Template and document mode probably shouldn't exist__
>>>>>>>>>>> 
>>>>>>>>>>> That's hard for me to fully understand - I definitely lack
>>>>>>>>>>> your
>>>>>>>> insights
>>>>>>>>>>> & experience writing such tools :-)
>>>>>>>>>>> 
>>>>>>>>>>> Defining the `Output Generator` is the underlying model for
>>>>>>>>>>> the
>>>> Maven
>>>>>>>>>>> plugin (and probably FMPP).
>>>>>>>>>>> 
>>>>>>>>>>> I'm not sure if this applies for command lines at least not in
>>>>>>>>>>> the
>>>>>> way
>>>>>>>> I
>>>>>>>>>>> use them (or would like to use them)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>> 
>>>>>>>>>>> Siegfried Goeschl
>>>>>>>>>>> 
>>>>>>>>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Yeah, "data source" is surely a too popular name, but for
>>>>>>>>>>>> reason.
>>>>>>>>>>>> Anyone
>>>>>>>>>>>> has other ideas?
>>>>>>>>>>>> 
>>>>>>>>>>>> As of naming data sources and such. One thing I was wondering
>>>> about
>>>>>>>>>>>> back
>>>>>>>>>>>> then is how to deal with list of documents given to a
>>>>>>>>>>>> template,
>>>>>> versus
>>>>>>>>>>>> exactly 1 document given to a template. But I think actually
>>>>>>>>>>>> we
>>>> have
>>>>>>>>>>>> no
>>>>>>>>>>>> good use case for list of documents that's passed at once to
>>>>>>>>>>>> a
>>>>>> single
>>>>>>>>>>>> template run, so, we can just ignore that complication. A
>>>>>>>>>>>> document
>>>>>> has
>>>>>>>>>>>> a
>>>>>>>>>>>> name, and that's always just a single document, not a
>>>>>>>>>>>> collection,
>>>> as
>>>>>>>>>>>> far as
>>>>>>>>>>>> the template is concerned. (We can have multiple documents
>>>>>>>>>>>> per
>>>> run,
>>>>>>>>>>>> but
>>>>>>>>>>>> those normally yield separate output generators, so it's
>>>>>>>>>>>> still
>>>> only
>>>>>>>>>>>> one
>>>>>>>>>>>> document per template.) However, we can have data source
>>>>>>>>>>>> types
>>>>>>>>>>>> (document
>>>>>>>>>>>> types with old terminology) that collect together multiple
>>>>>>>>>>>> data
>>>>>> files.
>>>>>>>>>>>> So
>>>>>>>>>>>> then that complexity is encapsulated into the data source
>>>>>>>>>>>> type,
>>>> and
>>>>>>>>>>>> doesn't
>>>>>>>>>>>> complicate the overall architecture. That's another case when
>>>>>>>>>>>> a
>>>> data
>>>>>>>>>>>> source
>>>>>>>>>>>> is not just a file. Like maybe there's a data source type
>>>>>>>>>>>> that
>>>> loads
>>>>>>>>>>>> all
>>>>>>>>>>>> the CSV-s from a directory, into a single big table (I had
>>>>>>>>>>>> such
>>>>>> case),
>>>>>>>>>>>> or
>>>>>>>>>>>> even into a list of tables. Or, as I mentioned already, a
>>>>>>>>>>>> data
>>>>>> source
>>>>>>>>>>>> is
>>>>>>>>>>>> maybe an SQL query on a JDBC data source (and we got the
>>>>>>>>>>>> first
>>>> term
>>>>>>>>>>>> clash... JDBC also call them data sources).
>>>>>>>>>>>> 
>>>>>>>>>>>> Template and document mode probably shouldn't exist from user
>>>>>>>>>>>> perspective
>>>>>>>>>>>> either, at least not as a global option that must apply to
>>>>>> everything
>>>>>>>>>>>> in a
>>>>>>>>>>>> run. They could just give the files that define the "output
>>>>>>>>>>>> generators",
>>>>>>>>>>>> and some of them will be templates, some of them are data
>>>>>>>>>>>> files,
>>>> in
>>>>>>>>>>>> which
>>>>>>>>>>>> case a template need to be associated with them (and there
>>>>>>>>>>>> can be
>>>> a
>>>>>>>>>>>> couple
>>>>>>>>>>>> of ways of doing that). And then again, there are the cases
>>>>>>>>>>>> where
>>>>>> you
>>>>>>>>>>>> want
>>>>>>>>>>>> to create one output generator per entity from some data
>>>>>>>>>>>> source.
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
>>>>>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> See my comments below - and thanks for your patience and
>>>>>>>>>>>>> input
>>>> :-)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *Renaming Document To DataSource*
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes, makes sense. I tried to avoid since I'm using
>>>> javax.activation
>>>>>>>>>>>>> and
>>>>>>>>>>>>> its DataSource.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *Template And Document Mode*
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Agreed - I think it is a valuable abstraction for the user
>>>>>>>>>>>>> but it
>>>>>> is
>>>>>>>>>>>>> not
>>>>>>>>>>>>> an implementation concept :-)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *Document Without Symbolic Names*
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Also agreed and it is going to change but I have not settled
>>>>>>>>>>>>> my
>>>>>> mind
>>>>>>>>>>>>> yet
>>>>>>>>>>>>> what exactly to implement.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Siegfried Goeschl
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> A few quick thoughts on that:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - We should replace the "document" term with something more
>>>>>> speaking.
>>>>>>>>>>>>> It
>>>>>>>>>>>>> doesn't tell that it's some kind of input. Also, most of
>>>>>>>>>>>>> these
>>>>>> inputs
>>>>>>>>>>>>> aren't something that people typically call documents. Like
>>>>>>>>>>>>> a csv
>>>>>>>>>>>>> file, or
>>>>>>>>>>>>> a database table, which is not even a file (OK we don't
>>>>>>>>>>>>> support
>>>>>> such
>>>>>>>>>>>>> thing
>>>>>>>>>>>>> at the moment). I think, maybe "data source" is a safe
>>>>>>>>>>>>> enough
>>>> term.
>>>>>>>>>>>>> (It
>>>>>>>>>>>>> also rhymes with data model.)
>>>>>>>>>>>>> - You have separate "template" and "document" "mode", that
>>>> applies
>>>>>> to
>>>>>>>>>>>>> a
>>>>>>>>>>>>> whole run. I think such specialization won't be helpful. We
>>>>>>>>>>>>> could
>>>>>>>>>>>>> just say,
>>>>>>>>>>>>> on the conceptual level at lest, that we need a set of
>>>>>>>>>>>>> "outputs
>>>>>>>>>>>>> generators". An output generator is an object (in the API)
>>>>>>>>>>>>> that
>>>>>>>>>>>>> specifies a
>>>>>>>>>>>>> template, a data-model (where the data-model is possibly
>>>> populated
>>>>>>>>>>>>> with
>>>>>>>>>>>>> "documents"), and an output "sink" (a file path, or stdout),
>>>>>>>>>>>>> and
>>>>>> can
>>>>>>>>>>>>> generate the output itself. A practical way of defining the
>>>> output
>>>>>>>>>>>>> generators in a CLI application is via a bunch of files,
>>>>>>>>>>>>> each
>>>>>>>>>>>>> defining an
>>>>>>>>>>>>> output generator. Some of those files is maybe a template
>>>>>>>>>>>>> (that
>>>> you
>>>>>>>>>>>>> can
>>>>>>>>>>>>> even detect from the file extension), or a data file that we
>>>>>>>>>>>>> currently call
>>>>>>>>>>>>> a "document". They could freely mix inside the same run. I
>>>>>>>>>>>>> have
>>>>>> also
>>>>>>>>>>>>> met
>>>>>>>>>>>>> use case when you have a single table (single "document"),
>>>>>>>>>>>>> and
>>>> each
>>>>>>>>>>>>> record
>>>>>>>>>>>>> in it yields an output file. That can also be described in
>>>>>>>>>>>>> some
>>>>>> file
>>>>>>>>>>>>> format, or really in any other way, like directly in command
>>>>>>>>>>>>> line
>>>>>>>>>>>>> argument,
>>>>>>>>>>>>> via API, etc.
>>>>>>>>>>>>> - You have multiple documents without associated symbolical
>>>>>>>>>>>>> name
>>>> in
>>>>>>>>>>>>> some
>>>>>>>>>>>>> examples. Templates can't identify those then in a well
>>>>>> maintainable
>>>>>>>>>>>>> way.
>>>>>>>>>>>>> The actual file name is often not a good identifier, can
>>>>>>>>>>>>> change
>>>>>> over
>>>>>>>>>>>>> time,
>>>>>>>>>>>>> and you might don't even have good control over it, like you
>>>>>> already
>>>>>>>>>>>>> receive it as a parameter from somewhere else, or someone
>>>>>>>>>>>>> moves/renames
>>>>>>>>>>>>> that files that you need to read. Index is also not very
>>>>>>>>>>>>> good,
>>>> but
>>>>>> I
>>>>>>>>>>>>> have
>>>>>>>>>>>>> written about that earlier.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
>>>>>>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> still wrapping my side around but assembled some thoughts
>>>>>>>>>>>>> here -
>>>>>>>>>>>>> 
>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Siegfried Goeschl
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org>
>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What you are describing is more like the angle that FMPP
>>>>>>>>>>>>> took
>>>>>>>>>>>>> initially,
>>>>>>>>>>>>> where templates drive things, they generate the output for
>>>>>> themselves
>>>>>>>>>>>>> 
>>>>>>>>>>>>> (even
>>>>>>>>>>>>> 
>>>>>>>>>>>>> multiple output files if they wish). By default output files
>>>>>>>>>>>>> name
>>>>>>>>>>>>> (and
>>>>>>>>>>>>> relative path) is deduced from template name. There was also
>>>>>>>>>>>>> a
>>>>>> global
>>>>>>>>>>>>> data-model, built in a configuration file (or equally, built
>>>>>>>>>>>>> via
>>>>>>>>>>>>> command
>>>>>>>>>>>>> line arguments, or both mixed), from which templates get
>>>>>>>>>>>>> whatever
>>>>>>>>>>>>> data
>>>>>>>>>>>>> 
>>>>>>>>>>>>> they
>>>>>>>>>>>>> 
>>>>>>>>>>>>> are interested in. Take a look at the figures here:
>>>>>>>>>>>>> http://fmpp.sourceforge.net/qtour.html. Later, this concept
>>>>>>>>>>>>> was
>>>>>>>>>>>>> 
>>>>>>>>>>>>> generalized
>>>>>>>>>>>>> 
>>>>>>>>>>>>> a bit more, because you could add XML files at the same
>>>>>>>>>>>>> place
>>>> where
>>>>>>>>>>>>> you
>>>>>>>>>>>>> have the templates, and then you could associate transform
>>>>>> templates
>>>>>>>>>>>>> to
>>>>>>>>>>>>> 
>>>>>>>>>>>>> the
>>>>>>>>>>>>> 
>>>>>>>>>>>>> XML files (based on path pattern and/or the XML document
>>>> element).
>>>>>>>>>>>>> Now
>>>>>>>>>>>>> that's like what freemarker-generator had initially (data
>>>>>>>>>>>>> files
>>>>>> drive
>>>>>>>>>>>>> output, and the template is there to transform it).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> So I think the generic mental model would like this:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1. You got files that drive the process, let's call them
>>>> *generator
>>>>>>>>>>>>> files* for now. Usually, each generator file yields an
>>>>>>>>>>>>> output
>>>> file
>>>>>>>>>>>>> (but
>>>>>>>>>>>>> maybe even multiple output files, as you might saw in the
>>>>>>>>>>>>> last
>>>>>>>>>>>>> figure).
>>>>>>>>>>>>> These generator files can be of many types, like XML, JSON,
>>>>>>>>>>>>> XLSX
>>>>>> (as
>>>>>>>>>>>>> 
>>>>>>>>>>>>> in the
>>>>>>>>>>>>> 
>>>>>>>>>>>>> original freemarker-generator), and even templates (as is
>>>>>>>>>>>>> the
>>>> norm
>>>>>> in
>>>>>>>>>>>>> FMPP). If the file is not a template, then you got a set of
>>>>>>>>>>>>> transformer
>>>>>>>>>>>>> templates (-t CLI option) in a separate directory, which can
>>>>>>>>>>>>> be
>>>>>>>>>>>>> 
>>>>>>>>>>>>> associated
>>>>>>>>>>>>> 
>>>>>>>>>>>>> with the generator files base on name patterns, and even
>>>>>>>>>>>>> based on
>>>>>>>>>>>>> 
>>>>>>>>>>>>> content
>>>>>>>>>>>>> 
>>>>>>>>>>>>> (schema usually). If the generator file is a template (so
>>>>>>>>>>>>> that's
>>>> a
>>>>>>>>>>>>> positional @Parameter CLI argument that happens to be an
>>>>>>>>>>>>> *.ftl,
>>>> and
>>>>>>>>>>>>> is
>>>>>>>>>>>>> 
>>>>>>>>>>>>> not
>>>>>>>>>>>>> 
>>>>>>>>>>>>> a template file specified after the "-t" option), then you
>>>>>>>>>>>>> just
>>>>>>>>>>>>> Template.process(...) it, and it prints what the output will
>>>>>>>>>>>>> be.
>>>>>>>>>>>>> 2. You also have a set of variables, the global data-model,
>>>>>>>>>>>>> that
>>>>>>>>>>>>> contains commonly useful stuff, like what you now call
>>>>>>>>>>>>> parameters
>>>>>>>>>>>>> (CLI
>>>>>>>>>>>>> -Pname=value), but also maybe data loaded from JSON, XML,
>>>>>>>>>>>>> etc..
>>>>>> Those
>>>>>>>>>>>>> 
>>>>>>>>>>>>> data
>>>>>>>>>>>>> 
>>>>>>>>>>>>> files aren't "generator files". Templates just use them if
>>>>>>>>>>>>> they
>>>>>> need
>>>>>>>>>>>>> 
>>>>>>>>>>>>> them.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> An important thing here is to reuse the same mechanism to
>>>>>>>>>>>>> read
>>>> and
>>>>>>>>>>>>> 
>>>>>>>>>>>>> parse
>>>>>>>>>>>>> 
>>>>>>>>>>>>> those data files, which was used in templates when
>>>>>>>>>>>>> transforming
>>>>>>>>>>>>> 
>>>>>>>>>>>>> generator
>>>>>>>>>>>>> 
>>>>>>>>>>>>> files. So we need a common format for specifying how to load
>>>>>>>>>>>>> data
>>>>>>>>>>>>> 
>>>>>>>>>>>>> files.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> That's maybe just FTL that #assigns to the variables, or
>>>>>>>>>>>>> maybe
>>>> more
>>>>>>>>>>>>> declarative format.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What I have described in the original post here was a less
>>>> generic
>>>>>>>>>>>>> form
>>>>>>>>>>>>> 
>>>>>>>>>>>>> of
>>>>>>>>>>>>> 
>>>>>>>>>>>>> this, as I tried to be true with the original approach. I
>>>>>>>>>>>>> though
>>>>>> the
>>>>>>>>>>>>> proposal will be drastic enough as it is... :) There, the
>>>>>>>>>>>>> "main"
>>>>>>>>>>>>> document
>>>>>>>>>>>>> is the "generator file" from point 1, the "-t" template is
>>>>>>>>>>>>> the
>>>>>>>>>>>>> transform
>>>>>>>>>>>>> template for the "main" document, and the other named
>>>>>>>>>>>>> documents
>>>>>>>>>>>>> ("users",
>>>>>>>>>>>>> "groups") is a poor man's shared data-model from point 2
>>>> (together
>>>>>>>>>>>>> with
>>>>>>>>>>>>> with -PName=value).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> There's further somewhat confusing thing to get right with
>>>>>>>>>>>>> the
>>>>>>>>>>>>> list-of-documents (`DocuentList`, `NamedDocumentLists`)
>>>>>>>>>>>>> thing
>>>>>> though.
>>>>>>>>>>>>> In
>>>>>>>>>>>>> the model above, as per point 1, if you list multiple data
>>>>>>>>>>>>> files,
>>>>>>>>>>>>> each
>>>>>>>>>>>>> 
>>>>>>>>>>>>> will
>>>>>>>>>>>>> 
>>>>>>>>>>>>> generate a separate output file. So, if you need take in a
>>>>>>>>>>>>> list
>>>> of
>>>>>>>>>>>>> files
>>>>>>>>>>>>> 
>>>>>>>>>>>>> to
>>>>>>>>>>>>> 
>>>>>>>>>>>>> transform it to a single output file (or at least with a
>>>>>>>>>>>>> single
>>>>>>>>>>>>> transform
>>>>>>>>>>>>> template execution), then you have to be explicit about
>>>>>>>>>>>>> that, as
>>>>>>>>>>>>> that's
>>>>>>>>>>>>> 
>>>>>>>>>>>>> not
>>>>>>>>>>>>> 
>>>>>>>>>>>>> the default behavior anymore. But it's still absolutely
>>>>>>>>>>>>> possible.
>>>>>>>>>>>>> Imagine
>>>>>>>>>>>>> it as a "list of XLSX-es" is itself like a file format. You
>>>>>>>>>>>>> need
>>>>>> some
>>>>>>>>>>>>> CLI
>>>>>>>>>>>>> (and Maven config, etc.) syntax to express that, but that
>>>> shouldn't
>>>>>>>>>>>>> be a
>>>>>>>>>>>>> big deal.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
>>>>>>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Good timing - I was looking at a similar problem from
>>>>>>>>>>>>> different
>>>>>> angle
>>>>>>>>>>>>> yesterday (see below)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Don't have enough time to answer your email in detail now -
>>>>>>>>>>>>> will
>>>> do
>>>>>>>>>>>>> that
>>>>>>>>>>>>> tomorrow evening
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Siegfried Goeschl
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ===. START
>>>>>>>>>>>>> # FreeMarker CLI Improvement
>>>>>>>>>>>>> ## Support Of Multiple Template Files
>>>>>>>>>>>>> Currently we support the following combinations
>>>>>>>>>>>>> 
>>>>>>>>>>>>> * Single template and no data files
>>>>>>>>>>>>> * Single template and one or more data files
>>>>>>>>>>>>> 
>>>>>>>>>>>>> But we can not support the following use case which is quite
>>>>>> typical
>>>>>>>>>>>>> in
>>>>>>>>>>>>> the cloud
>>>>>>>>>>>>> 
>>>>>>>>>>>>> __Convert multiple templates with a single data file, e.g
>>>> copying a
>>>>>>>>>>>>> directory of configuration files using a JSON configuration
>>>> file__
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ## Implementation notes
>>>>>>>>>>>>> * When we copy a directory we can remove the `ftl`extension
>>>>>>>>>>>>> on
>>>> the
>>>>>>>>>>>>> fly
>>>>>>>>>>>>> * We might need an `exclude` filter for the copy operation
>>>>>>>>>>>>> * Initially resolve to a list of template files and process
>>>>>>>>>>>>> one
>>>>>> after
>>>>>>>>>>>>> another
>>>>>>>>>>>>> * Need to calculate the output file location and extension
>>>>>>>>>>>>> * We need to rename the existing command line parameters
>>>>>>>>>>>>> (see
>>>>>> below)
>>>>>>>>>>>>> * Do we need multiple include and exclude filter?
>>>>>>>>>>>>> * Do we need file versus directory filters?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ### Command Line Options
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> --input-encoding : Encoding of the documents
>>>>>>>>>>>>> --output-encoding : Encoding of the rendered template
>>>>>>>>>>>>> --template-encoding : Encoding of the template
>>>>>>>>>>>>> --output : Output file or directory
>>>>>>>>>>>>> --include-document : Include pattern for documents
>>>>>>>>>>>>> --exclude-document : Exclude pattern for documents
>>>>>>>>>>>>> --include-template: Include pattern for templates
>>>>>>>>>>>>> --exclude-template : Exclude pattern for templates
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ### Command Line Examples
>>>>>>>>>>>>> ```text
>>>>>>>>>>>>> # Copy all FTL templates found in "ext/config" to the
>>>>>>>>>>>>> "/config"
>>>>>>>>>>>>> 
>>>>>>>>>>>>> directory
>>>>>>>>>>>>> 
>>>>>>>>>>>>> using the data from "config.json"
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl --o
>>>> /config
>>>>>>>>>>>>> 
>>>>>>>>>>>>> config.json
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli --template ./ext/config --include-template
>>>>>>>>>>>>> *.ftl
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --output
>>>>>>>>>>>>> 
>>>>>>>>>>>>> /config config.json
>>>>>>>>>>>>> 
>>>>>>>>>>>>> # Bascically the same using a named document "configuration"
>>>>>>>>>>>>> # It might make sense to expose "conf" directly in the
>>>>>>>>>>>>> FreeMarker
>>>>>>>>>>>>> data
>>>>>>>>>>>>> model
>>>>>>>>>>>>> # It might make sens to allow URIs for loading documents
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d
>>>>>>>>>>>>> 
>>>>>>>>>>>>> configuration=config.json
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli --template ./ext/config --include-template
>>>>>>>>>>>>> *.ftl
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --output
>>>>>>>>>>>>> 
>>>>>>>>>>>>> /config --document configuration=config.json
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli --template ./ext/config --include-template
>>>>>>>>>>>>> *.ftl
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --output
>>>>>>>>>>>>> 
>>>>>>>>>>>>> /config --document configuration=file:///config.json
>>>>>>>>>>>>> 
>>>>>>>>>>>>> # Bascically the same using an environment variable as named
>>>>>> document
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o
>>>> /config
>>>>>> -d
>>>>>>>>>>>>> 
>>>>>>>>>>>>> configuration=env:///CONFIGURATION
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli --template ./ext/config --include-template
>>>>>>>>>>>>> *.ftl
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --output
>>>>>>>>>>>>> 
>>>>>>>>>>>>> /config --document configuration=env:///CONFIGURATION
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> === END
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org>
>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Input documents is a fundamental concept in
>>>>>>>>>>>>> freemarker-generator,
>>>>>> so
>>>>>>>>>>>>> we
>>>>>>>>>>>>> should think about that more, and probably refine/rework how
>>>>>>>>>>>>> it's
>>>>>>>>>>>>> done.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Currently it works like this, with CLI at least.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli
>>>>>>>>>>>>> -t access-report.ftl
>>>>>>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Then in access-report.ftl you have to do something like
>>>>>>>>>>>>> this:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> <#assign doc = Documents.get(0)>
>>>>>>>>>>>>> ... process doc here
>>>>>>>>>>>>> 
>>>>>>>>>>>>> (The more idiomatic Documents[0] won't work. Actually, that
>>>>>>>>>>>>> lead
>>>>>> to a
>>>>>>>>>>>>> 
>>>>>>>>>>>>> funny
>>>>>>>>>>>>> 
>>>>>>>>>>>>> chain of coincidences: It returned the string "D", then
>>>>>>>>>>>>> 
>>>>>>>>>>>>> CSVTool.parse(...)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> happily parsed that to a table with the single column "D",
>>>>>>>>>>>>> and 0
>>>>>>>>>>>>> rows,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> and
>>>>>>>>>>>>> 
>>>>>>>>>>>>> as there were 0 rows, the template didn't run into an error
>>>> because
>>>>>>>>>>>>> row.myExpectedColumn refers to a missing column either, so
>>>>>>>>>>>>> the
>>>>>>>>>>>>> process
>>>>>>>>>>>>> finished with success. (: Pretty unlucky for sure. The root
>>>>>>>>>>>>> was
>>>>>>>>>>>>> unintentionally breaking a FreeMarker idiom though;
>>>>>>>>>>>>> eventually we
>>>>>>>>>>>>> will
>>>>>>>>>>>>> 
>>>>>>>>>>>>> have
>>>>>>>>>>>>> 
>>>>>>>>>>>>> to work on those too, but, different topic.)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> However, actually multiple input documents can be passed in:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli
>>>>>>>>>>>>> -t access-report.ftl
>>>>>>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>>>>>>> somewhere/bar-access-log.csv
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Above template will still work, though then you ignored all
>>>>>>>>>>>>> but
>>>> the
>>>>>>>>>>>>> 
>>>>>>>>>>>>> first
>>>>>>>>>>>>> 
>>>>>>>>>>>>> document. So if you expect any number of input documents,
>>>>>>>>>>>>> you
>>>>>>>>>>>>> probably
>>>>>>>>>>>>> 
>>>>>>>>>>>>> will
>>>>>>>>>>>>> 
>>>>>>>>>>>>> have to do this:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> <#list Documents.list as doc>
>>>>>>>>>>>>> ... process doc here
>>>>>>>>>>>>> </#list>
>>>>>>>>>>>>> 
>>>>>>>>>>>>> (The more idiomatic <#list Documents as doc> won't work; but
>>>> again,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> those
>>>>>>>>>>>>> 
>>>>>>>>>>>>> we will work out in a different thread.)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> So, what would be better, in my opinion. I start out from
>>>>>>>>>>>>> what I
>>>>>>>>>>>>> think
>>>>>>>>>>>>> 
>>>>>>>>>>>>> are
>>>>>>>>>>>>> 
>>>>>>>>>>>>> the common uses cases, in decreasing order of frequency.
>>>>>>>>>>>>> Goal is
>>>> to
>>>>>>>>>>>>> 
>>>>>>>>>>>>> make
>>>>>>>>>>>>> 
>>>>>>>>>>>>> those less error prone for the users, and simpler to
>>>>>>>>>>>>> express.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> USE CASE 1
>>>>>>>>>>>>> 
>>>>>>>>>>>>> You have exactly 1 input documents, which is therefore
>>>>>>>>>>>>> simply
>>>> "the"
>>>>>>>>>>>>> document in the mind of the user. This is probably the
>>>>>>>>>>>>> typical
>>>> use
>>>>>>>>>>>>> 
>>>>>>>>>>>>> case,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> but at least the use case users typically start out from
>>>>>>>>>>>>> when
>>>>>>>>>>>>> starting
>>>>>>>>>>>>> 
>>>>>>>>>>>>> the
>>>>>>>>>>>>> 
>>>>>>>>>>>>> work.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli
>>>>>>>>>>>>> -t access-report.ftl
>>>>>>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Then `Documents.get(0)` is not very fitting. Most
>>>>>>>>>>>>> importantly
>>>> it's
>>>>>>>>>>>>> 
>>>>>>>>>>>>> error
>>>>>>>>>>>>> 
>>>>>>>>>>>>> prone, because if the user passed in more than 1 documents
>>>>>>>>>>>>> (can
>>>>>> even
>>>>>>>>>>>>> 
>>>>>>>>>>>>> happen
>>>>>>>>>>>>> 
>>>>>>>>>>>>> totally accidentally, like if the user was lazy and used a
>>>> wildcard
>>>>>>>>>>>>> 
>>>>>>>>>>>>> that
>>>>>>>>>>>>> 
>>>>>>>>>>>>> the shell exploded), the template will silently ignore the
>>>>>>>>>>>>> rest
>>>> of
>>>>>>>>>>>>> the
>>>>>>>>>>>>> documents, and the singe document processed will be
>>>>>>>>>>>>> practically
>>>>>>>>>>>>> picked
>>>>>>>>>>>>> randomly. The user might won't notice that and submits a bad
>>>> report
>>>>>>>>>>>>> or
>>>>>>>>>>>>> 
>>>>>>>>>>>>> such.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I think that in this use case the document should be simply
>>>>>> referred
>>>>>>>>>>>>> as
>>>>>>>>>>>>> `Document` in the template. When you have multiple documents
>>>> there,
>>>>>>>>>>>>> referring to `Document` should be an error, saying that the
>>>>>> template
>>>>>>>>>>>>> 
>>>>>>>>>>>>> was
>>>>>>>>>>>>> 
>>>>>>>>>>>>> made to process a single document only.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> USE CASE 2
>>>>>>>>>>>>> 
>>>>>>>>>>>>> You have multiple input documents, but each has different
>>>>>>>>>>>>> role
>>>>>>>>>>>>> 
>>>>>>>>>>>>> (different
>>>>>>>>>>>>> 
>>>>>>>>>>>>> schema, maybe different file type). Like, you pass in
>>>>>>>>>>>>> users.csv
>>>> and
>>>>>>>>>>>>> groups.csv. Each has difference schema, and so you want to
>>>>>>>>>>>>> access
>>>>>>>>>>>>> them
>>>>>>>>>>>>> differently, but in the same template.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli
>>>>>>>>>>>>> [...]
>>>>>>>>>>>>> --named-document users somewhere/foo-users.csv
>>>>>>>>>>>>> --named-document groups somewhere/foo-groups.csv
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Then in the template you could refer to them as:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> `NamedDocuments.users`,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> and `NamedDocuments.groups`.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Use Case 1, and 2 can be unified into a coherent concept,
>>>>>>>>>>>>> where
>>>>>>>>>>>>> 
>>>>>>>>>>>>> `Document`
>>>>>>>>>>>>> 
>>>>>>>>>>>>> is just a shorthand for `NamedDocuments.main`. It's called
>>>>>>>>>>>>> "main"
>>>>>>>>>>>>> 
>>>>>>>>>>>>> because
>>>>>>>>>>>>> 
>>>>>>>>>>>>> that's "the" document the template is about, but then you
>>>>>>>>>>>>> have to
>>>>>>>>>>>>> added
>>>>>>>>>>>>> some helper documents, with symbolic names representing
>>>>>>>>>>>>> their
>>>> role.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli
>>>>>>>>>>>>> -t access-report.ftl
>>>>>>>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>>>>>>>> --document-name=groups somewhere/foo-groups.csv
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Here, `Document` still works in the template, and it refers
>>>>>>>>>>>>> to
>>>>>>>>>>>>> `somewhere/foo-access-log.csv`. (While omitting
>>>>>> --document-name=main
>>>>>>>>>>>>> 
>>>>>>>>>>>>> above
>>>>>>>>>>>>> 
>>>>>>>>>>>>> would be cleaner, I couldn't figure out how to do that with
>>>>>> Picocli.
>>>>>>>>>>>>> Anyway, for now the point is the concept, which is not
>>>>>>>>>>>>> specific
>>>> to
>>>>>>>>>>>>> 
>>>>>>>>>>>>> CLI.)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> USE CASE 3
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Here you have several of the same kind of documents. That
>>>>>>>>>>>>> has a
>>>>>> more
>>>>>>>>>>>>> generic sub-use-case, when you have explicitly named
>>>>>>>>>>>>> documents
>>>>>> (like
>>>>>>>>>>>>> "users" above), and for some you expect multiple input
>>>>>>>>>>>>> files.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> freemarker-cli
>>>>>>>>>>>>> -t access-report.ftl
>>>>>>>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>>>>>>>> somewhere/bar-access-log.csv
>>>>>>>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>>>>>>>> somewhere/bar-users.csv
>>>>>>>>>>>>> --document-name=groups somewhere/global-groups.csv
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The template must to be written with this use case in mind,
>>>>>>>>>>>>> as
>>>> now
>>>>>> it
>>>>>>>>>>>>> 
>>>>>>>>>>>>> has
>>>>>>>>>>>>> 
>>>>>>>>>>>>> #list some of the documents. (I think in practice you hardly
>>>>>>>>>>>>> ever
>>>>>>>>>>>>> want
>>>>>>>>>>>>> 
>>>>>>>>>>>>> to
>>>>>>>>>>>>> 
>>>>>>>>>>>>> get a document by hard coded index. Either you don't know
>>>>>>>>>>>>> how
>>>> many
>>>>>>>>>>>>> documents you have, so you can't use hard coded indexes, or
>>>>>>>>>>>>> you
>>>> do,
>>>>>>>>>>>>> and
>>>>>>>>>>>>> each index has a specific meaning, but then you should name
>>>>>>>>>>>>> the
>>>>>>>>>>>>> 
>>>>>>>>>>>>> documents
>>>>>>>>>>>>> 
>>>>>>>>>>>>> instead, as using indexes is error prone, and hard to read.)
>>>>>>>>>>>>> Accessing that list of documents in the template, maybe
>>>>>>>>>>>>> could be
>>>>>> done
>>>>>>>>>>>>> 
>>>>>>>>>>>>> like
>>>>>>>>>>>>> 
>>>>>>>>>>>>> this:
>>>>>>>>>>>>> - For the "main" documents: `DocumentList`
>>>>>>>>>>>>> - For explicitly named documents, like "users":
>>>>>>>>>>>>> 
>>>>>>>>>>>>> `NamedDocumentLists.users`
>>>>>>>>>>>>> 
>>>>>>>>>>>>> SUMMING UP
>>>>>>>>>>>>> 
>>>>>>>>>>>>> To unify all 3 use cases into a coherent concept:
>>>>>>>>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and
>>>>>>>>>>>>> while
>>>>>> you
>>>>>>>>>>>>> 
>>>>>>>>>>>>> can
>>>>>>>>>>>>> 
>>>>>>>>>>>>> achieve everything with it, using it requires your template
>>>>>>>>>>>>> to
>>>>>> handle
>>>>>>>>>>>>> 
>>>>>>>>>>>>> the
>>>>>>>>>>>>> 
>>>>>>>>>>>>> most generic case too. So, I think it would be rarely used.
>>>>>>>>>>>>> - `DocumentList` is just a shorthand for
>>>> `NamedDocumentLists.main`.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It's
>>>>>>>>>>>>> 
>>>>>>>>>>>>> used if you only have one kind of documents (single format
>>>>>>>>>>>>> and
>>>>>>>>>>>>> schema),
>>>>>>>>>>>>> 
>>>>>>>>>>>>> but
>>>>>>>>>>>>> 
>>>>>>>>>>>>> potentially multiple of them.
>>>>>>>>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly
>>>>>>>>>>>>> 1
>>>>>>>>>>>>> document
>>>>>>>>>>>>> 
>>>>>>>>>>>>> of
>>>>>>>>>>>>> 
>>>>>>>>>>>>> the given name.
>>>>>>>>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`.
>>>>>>>>>>>>> This
>>>> is
>>>>>>>>>>>>> for
>>>>>>>>>>>>> 
>>>>>>>>>>>>> the
>>>>>>>>>>>>> 
>>>>>>>>>>>>> most natural/frequent use case.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> That's 4 possible ways of accessing your documents, which is
>>>>>>>>>>>>> a
>>>>>>>>>>>>> 
>>>>>>>>>>>>> trade-off
>>>>>>>>>>>>> 
>>>>>>>>>>>>> for the sake of these:
>>>>>>>>>>>>> - Catching CLI (or Maven, etc.) input where the template
>>>>>>>>>>>>> output
>>>>>>>>>>>>> likely
>>>>>>>>>>>>> 
>>>>>>>>>>>>> will
>>>>>>>>>>>>> 
>>>>>>>>>>>>> be wrong. That's only possible if the user can communicate
>>>>>>>>>>>>> its
>>>>>> intent
>>>>>>>>>>>>> 
>>>>>>>>>>>>> in
>>>>>>>>>>>>> 
>>>>>>>>>>>>> the template.
>>>>>>>>>>>>> - Users don't need to deal with concepts that are irrelevant
>>>>>>>>>>>>> in
>>>>>> their
>>>>>>>>>>>>> concrete use case. Just start with the trivial, `Document`,
>>>>>>>>>>>>> and
>>>>>> later
>>>>>>>>>>>>> 
>>>>>>>>>>>>> if
>>>>>>>>>>>>> 
>>>>>>>>>>>>> the need arises, generalize to named documents, document
>>>>>>>>>>>>> lists,
>>>> or
>>>>>>>>>>>>> 
>>>>>>>>>>>>> both.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What do guys think?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Best regards,
>>>>>>>>>> Daniel Dekany

Re: freemarker-generator: Improving the input documents concept

Posted by Daniel Dekany <da...@gmail.com>.

>
> Regarding the "global mode" and "output generators files" - I'm sorry, but
> I'm not getting it


I'm not sure which part isn't getting through. Can you explain? The CLI suggested
that you have a "global mode" (a single --mode switch per run).

>
> Do you think of defining explicit "output generator file" containing
> `datasources, `templates` and `outputs` - yes that could be done but
> does not feel like an interactive command line tool any longer


I think what the CLI exposes, and how, should be a secondary detail at this
phase, as the CLI is (or should be) just a front end that wraps the common
core (generator.base). The CLI, the Maven task, Gradle task, etc. should
probably just be thin wrappers around the common core. Do we agree on that?
So, these concepts are "core" concepts, and probably govern the API of
generator.base. That was my intent here, to hammer out these core
concepts.

Also the "output generator file" is usually just a data file, or just a
template. It's just the file that causes some output to be generated. So, usually
it doesn't *explicitly* contain all that information (though you might as
well introduce a file type that does). But it still defines an output
generator, because you will have a template, a data-model, and an output
file name.
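
To make that a bit more concrete, here is a rough Java sketch of the idea. It
is not the actual generator.base API: the class and method names, the
"dataSource" data-model key, and the ".out" suffix are all made up for
illustration. It only shows that both kinds of "generator file" end up as the
same triple of template, data-model, and output name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class OutputGeneratorSketch {

    /** Hypothetical value object: a template, a data-model, and an output name. */
    static final class OutputGenerator {
        final String templateName;
        final Map<String, Object> dataModel;
        final String outputName;

        OutputGenerator(String templateName, Map<String, Object> dataModel, String outputName) {
            this.templateName = templateName;
            this.dataModel = Collections.unmodifiableMap(new LinkedHashMap<>(dataModel));
            this.outputName = outputName;
        }
    }

    /** The generator file is itself a template; the output name drops the ".ftl" suffix. */
    static OutputGenerator fromTemplateFile(String templatePath, Map<String, Object> sharedModel) {
        String suffix = ".ftl";
        String outputName = templatePath.endsWith(suffix)
                ? templatePath.substring(0, templatePath.length() - suffix.length())
                : templatePath;
        return new OutputGenerator(templatePath, sharedModel, outputName);
    }

    /** The generator file is a data file; a transform template must be associated with it. */
    static OutputGenerator fromDataFile(String dataPath, String transformTemplate,
                                        Map<String, Object> sharedModel) {
        Map<String, Object> model = new LinkedHashMap<>(sharedModel);
        model.put("dataSource", dataPath);  // assumed key, for illustration only
        return new OutputGenerator(transformTemplate, model, dataPath + ".out");  // assumed naming rule
    }

    public static void main(String[] args) {
        Map<String, Object> shared = Collections.singletonMap("env", "test");
        OutputGenerator a = fromTemplateFile("config/app.properties.ftl", shared);
        OutputGenerator b = fromDataFile("somewhere/foo-access-log.csv", "access-report.ftl", shared);
        System.out.println(a.templateName + " -> " + a.outputName);
        System.out.println(b.templateName + " -> " + b.outputName);
    }
}
```

A front end (CLI, Maven, Gradle) would then only have to translate its own
input into a list of such objects and hand them to the core.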

>
> I think you are leaning towards a 1.0 release why I favour 0.x.y to
> have room to make mistakes / experiments


The version number doesn't tell me much, so what's your intent/strategy
with these 0.x.y releases you plan to do? Like, if you release 0.1.0, then
will you feel uncomfortable changing things *radically* after that? That
can be a problem, if the goal is iterating without bounds. On the other
hand, if you don't feel uncomfortable about that at all, I don't really see
why a user would use it. But, if it's clearly indicated that everything can
change, and you think it's useful to release that way, I don't want to be
in your way.

>
> perfect is the enemy of good


I just think the overall concept/architecture should be iterated out first.
Polish, adding all kinds of bells and whistles, even fixing bugs, is a
different matter.

On Thu, Mar 5, 2020 at 9:36 PM Siegfried Goeschl <
siegfried.goeschl@gmail.com> wrote:

> Hi Daniel,
>
> The introduction of named `Datasource` allows to simplify / streamline a
> few things
>
> * I have a meaningful user-supplied name
> * I can pass additional configuration information as already implemented
> with `charset` and `contenttype` and this would also allow configure a
> `CSV Datasource`, e.g.
> `users=./data/users.csv#format=default&header=true&delimeter=TAB` which
> can be readily parsed
> * Currently the names of datasources are taken from their relative
> file name - might make sense to drop that but I need to contemplate :-)
>
> Regarding the "global mode" and "output generators files" - I'm sorry,
> but I'm not getting it
>
> * I refined the
> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449 to
> make my points more clearly
> * Do you think of defining explicit "output generator file" containing
> `datasources`, `templates` and `outputs` - yes that could be done but
> does not feel like an interactive command line tool any longer
>
>
> Regarding "more idiomatic FTL usage"
>
> * Yes, I need to dive into custom template models or whatever it is
> called :-)
>
>
> Something we need to iron out is a release policy
>
> * Currently we have little agreement how the CLI should look like or
> behave
> * I think you are leaning towards a 1.0 release while I favour 0.x.y to
> have room to make mistakes / experiments
> * I personally see the possibility that we don't get a release out -
> "perfect is the enemy of good"
>
> How would you like to handle the problem - can we agree on minimal
> feature set worthy a release?
>
> Thanks in advance,
>
> Siegfried Goeschl
>
>
> On 1 Mar 2020, at 11:33, Daniel Dekany wrote:
>
> >>
> >> Actually not recommended but we have named data sources for less than
> >> 24
> >> hours
> >
> >
> > Sorry, not sure what that means. Anyway, my "vote" is let's not give
> > automatic names if that's not recommended to utilize. I mean, in case
> > we
> > happen to agree on that, why leave it there. Especially if
> > automatically
> > chosen names can clash with explicitly given ones, that would be a
> > trouble.  (I'm not sure right now if they can... the path we use as
> > the
> > name can be relative? Then it realistically can.)
> >
> > This is a command line tool where we have little idea what the user
> > will do
> >> or abuse
> >
> >
> > No matter how much/little we know, we firmly put our bets by releasing
> > something. So if some feature is certainly not right, that's enough to
> > not
> > have it, I think.
> >
> > How does a "data loader" knows that it is responsible to load a file
> >
> > What should as "CSV data loader" should do - parse it into a list of
> >> records or stream one by one?
> >
> >
> > I think I was misunderstood here. It's not about some kind of
> > auto-magic.
> > It's about where do you specify what to load and how, and in what
> > format do
> > you specify that. Of course, you must specify the data source
> > (basically an
> > URI for now as I saw), the rough format (CSV), and the format options
> > (separator character, etc.), and other freemarker-generator loading
> > options
> > (like which CSV columns are numbers, which are dates, with what
> > format,
> > what counts as null, etc.).
> >
> > What was confusing in what I said much earlier is probably that you
> > don't
> > need a global "--mode". That just means that you can have multiple
> > "modes"
> > in the same run, not that you need some big auto-magic. And that they
> > aren't really "modes" then... I think it's just natural that you can
> > have
> > different kind of "output generator" files in the same run. Why force
> > the
> > assumption that you don't, especially considering that they might
> > want
> > to access common data (which you don't want to load again and again,
> > for
> > each run of the different --mode-s you need). Of course, as you might
> > select files with wildcards (or by specifying a whole directory, or
> > with
> > some Maven matcher), you just can't directly associate the data loader
> > options to the individual data sources. Instead you can say elsewhere
> > that
> > *.csv inside this explicit "group", or with this file name pattern, is
> > to
> > be loaded like this. That's what you might have perceived as auto-magic.
> > It's
> > just mass-producing data loaders for "cattle" files.
> >
> > How to handle the case if you have multiple potential data loaders for
> > a
> >> single file?
> >
> >
> > As per above, that's just two data loaders referring to the same data
> > source, so, nothing special.
> >
> > As of the current state of things, this is how I'm supposed to load a
> > CSV,
> > in the template itself (if I'm not outdated/mistaken):
> >
> > <#assign cvsFormat = CSVTool.formats.DEFAULT.withHeader()>
> > <#assign foos = CSVTool.parse(Datasources.get("foos"),
> > cvsFormat).records>
> > <#assign bars = CSVTool.parse(Datasources.get("barb"),
> > cvsFormat).records>
> >
> > It will be worth exploring how to make these look more "idiomatic" FTL
> > (given
> > this is an "official" FM product now, I think, we should show how it's
> > done), and nicer in general. Point for now is, that's basically two
> > data-loaders interwoven with the template there. Because they are
> > interwoven like that, you can't reuse what they loaded for another
> > template
> > execution.
> >
> > That's comes down to personal preferences, e.g. chown uses
> > "owner[:group] "
> >
> >
> > Yeah, but XML namespaces, Java, C, etc. all use
> > <parent><operator><child>,
> > so, I think, that clicks for more of our potential users. So let's bet
> > on
> > what clicks for more users.
> >
> > Besides, I challenged the very idea that we need both groups and
> > names. :)
> > Saying that it's simpler and less opinioned (more flexible) to have
> > just
> > multiple names (like tags). What's the end of that?
> >
> > On Sun, Mar 1, 2020 at 9:47 AM Siegfried Goeschl <
> > siegfried.goeschl@gmail.com> wrote:
> >
> >> HI Daniel,
> >>
> >> Please see my comments below
> >>
> >> Thanks in advance,
> >>
> >> Siegfried Goeschl
> >>
> >>
> >>> On 29.02.2020, at 21:02, Daniel Dekany <da...@gmail.com>
> >>> wrote:
> >>>
> >>>>
> >>>> I try to provide a useful name even when the content is coming from
> >>>> an
> >>>> URL
> >>>
> >>>
> >>> When is it recommended to rely on that though? Because utilizing
> >>> that
> >> means
> >>> that renaming a data source file can break the process, even if you
> >>> call
> >>> freemarker-cli with the up to date file name. And if that happens
> >>> depends
> >>> on what you (or an other random colleague!) have dug inside the
> >> templates.
> >>> So I guess we better just don't support this. Less code and less
> >>> things
> >> to
> >>> document too.
> >>>
> >>
> >> Actually not recommended but we have named data sources for less than
> >> 24
> >> hours
> >>
> >>>
> >>>> I think we have a different understanding what a "Document" /
> >> "Datasource
> >>>> / DataSource" should do
> >>>
> >>>
> >>> Thing is, eventually (most certainly pre-1.0, as it influences
> >>> architecture), certain needs will have to be addressed, somehow. Then
> >>> we
> >> will
> >>> see what "things" we really need. For now I thought we need "things"
> >>> that
> >>> are much more than paths, and encapsulate the "how to load the data"
> >>> aspect. I called them data sources, but maybe we should called them
> >>> "data
> >>> loaders" to free up data sources for the more primitive thing. Some
> >>> needs/doubts to address, *later*: Is it really the best approach for
> >> users
> >>> to load/parse data sources programmatically (that code is written
> >>> in
> >> FTL,
> >>> inside the templates)? Also, is the template the right place for
> >>> doing
> >>> that, because, when multiple templates (or just multiple template
> >>> *runs*
> >> of
> >>> the same template, each generating a different output file) needs
> >>> common
> >>> data, they shouldn't load it again and again. Also, different topic,
> >>> can
> >> we
> >>> handle the case "transparently" enough when the data is not coming
> >>> from a
> >>> file?
> >>
> >> This is a command line tool where we have little idea what the user
> >> will
> >> do or abuse
> >>
> >> * How does a "data loader" knows that it is responsible to load a
> >> file
> >> * What should as "CSV data loader" should do - parse it into a list
> >> of
> >> records or stream one by one?
> >> * How to handle the case if you have multiple potential data loaders
> >> for a
> >> single file?
> >>
> >> I'm leaning towards building blocks where the user controls the work
> >> to be
> >> done even it requires one to two extra lines of FTL code
> >>
> >>
> >>>
> >>> The joy of programming - I did not intend to use "name:group"
> >>> together
> >> with
> >>>> wildcards :-)
> >>>
> >>>
> >>> For a CLI tool, I guess we agree that it should work. So maybe, like
> >>> this
> >>> (here logs and foos meant to be "groups"):
> >>> --data-source logs file1.log file2.log fileN.log   --data-source
> >>> foos
> >>> foo1.csv foo2.csv fooN.csv  --data-source bar bar.xlsx
> >>>
> >>> It so happens that here you don't really have a good control about
> >>> the
> >>> number of files associated to the name, so, maybe yet another reason
> >>> to
> >> not
> >>> differentiate names and groups.
> >>>
> >>> I Disagree here - I think using a name would be used more often. I
> >>> added
> >>>> the "group" as an afterthought since some grouping could be useful
> >>>
> >>>
> >>> We do agree in that. What I said is that the *syntax* should be so
> >>> that
> >> the
> >>> group comes first. It's still optional. Like this:
> >>> --data-source group:name /somewhere
> >>> --data-source name /somewhere
> >>
> >> That's comes down to personal preferences, e.g. chown uses
> >> "owner[:group] "
> >>
> >>>
> >>> On Sat, Feb 29, 2020 at 7:34 PM Siegfried Goeschl <
> >>> siegfried.goeschl@gmail.com> wrote:
> >>>
> >>>> HI Daniel,
> >>>>
> >>>> See my comments below
> >>>>
> >>>> Thanks in advance,
> >>>>
> >>>> Siegfried Goeschl
> >>>>
> >>>>
> >>>>> On 29.02.2020, at 19:08, Daniel Dekany <da...@gmail.com>
> >> wrote:
> >>>>>
> >>>>> FREEMARKER-135 freemarker-generator-cli: Support user-supplied
> >>>>> names
> >> for
> >>>>> datasources
> >>>>>
> >>>>> So, I can do this to have both a name an a group associated to a
> >>>>> data
> >>>>> source:
> >>>>> --datasource someName:someGroup=somewhere/something
> >>>>
> >>>> Correct
> >>>>
> >>>>> Or if I only want a name, but not a group (or an ""  group
> >>>>> actually -
> >>>>> bug?), then:
> >>>>> --datasource someName=somewhere/something
> >>>>
> >>>> Correct
> >>>>
> >>>>>
> >>>>> Or if only a group but not a name (or a "" name actually) then:
> >>>>> --datasource :someGroup=somewhere/something
> >>>>
> >>>> Mhmm, that would be unintended functionality from my side - current
> >>>> approach is that every "Document" / "Datasource / DataSource" is
> >>>> named
> >>>>
> >>>>>
> >>>>> A name must identify exactly 1 data source, while a group
> >>>>> identifies a
> >>>> list
> >>>>> of data sources.
> >>>>
> >>>> No, every "Document" / "Datasource / DataSource" has a name
> >>>> currently
> >> but
> >>>> uniqueness is not enforced. Only if you want to get a "Document" /
> >>>> "Datasource / DataSource" with its exact name I checked for
> >>>> exactly one
> >>>> search hit and throw an exception. I try to provide a useful name
> >>>> even
> >> when
> >>>> the content is coming from an URL or STDIN (and I will probably add
> >>>> environment variables as "Document" / "Datasource / DataSource",
> >>>> e.g
> >>>> configuration in the cloud as JSON content passed as environment
> >> variable)
> >>>>
> >>>>>
> >>>>> Is that this idea, that the a data source can be part of a group,
> >>>>> and
> >>>> then
> >>>>> is also possibly identifiable with a name comes from an use case?
> >>>>> I
> >> mean,
> >>>>> it's possibly important somewhere, but if so, then it's strange
> >>>>> that
> >> you
> >>>>> can put something into only a single group. If we need this kind
> >>>>> of
> >>>> thing,
> >>>>> then perhaps you should be just allowed to associate the data
> >>>>> source
> >>>> with a
> >>>>> list of names (kind of like tagging), and then when the template
> >>>>> wants
> >> to
> >>>>> get something by name, it will tell there if it expects exactly
> >>>>> one or
> >> a
> >>>>> list of data sources. Then you don't need to introduce two terms
> >>>>> in the
> >>>>> documentation either (names and groups). Again, if we want this at
> >>>>> all,
> >>>>> instead of just going with a data source that itself gives a list.
> >>>>> (And
> >>>> if
> >>>>> not, how will we handle a data source that loads from a non-file
> >> source?)
> >>>>
> >>>> I actually thought of implementing tagging but considered a "group"
> >>>> sufficient.
> >>>>
> >>>> * If you don't define anything everything goes into the "default"
> >>>> group
> >>>> * For individual documents you can define a name and an optional
> >>>> group
> >>>>
> >>>> I think we have a different understanding what a "Document" /
> >> "Datasource
> >>>> / DataSource" should do
> >>>>
> >>>> * It is a dumb
> >>>> * It is lazy since data is only loaded on demand
> >>>> * There is no automagic like "oh, this is a JSON file, so let's go
> >>>> to
> >> the
> >>>> JSON tool and create a map readily accessible in the data model"
> >>>>
> >>>>>
> >>>>> Note that the current command line syntax doesn't work well with
> >>>>> shell
> >>>>> wildcard expansion. Like this:
> >>>>> --datasource :someGroup=logs/*.log
> >>>>> will try to expand ":someGroup=logs/*.log", and because it finds
> >> nothing
> >>>>> (and because the rules of sh and the like is a mess), you will get
> >>>>> the
> >>>>> parameter value as is, without * expanded.
> >>>>
> >>>> The joy of programming - I did not intend to use "name:group"
> >>>> together
> >>>> with wildcards :-)
> >>>>
> >>>>>
> >>>>> Also,  I think the syntax with colon should be flipped, because on
> >> other
> >>>>> places foo:bar usually means that foo is the bigger unit (the
> >> container),
> >>>>> and bar is the smaller unit (the child).
> >>>>
> >>>> I Disagree here - I think using a name would be used more often. I
> >>>> added
> >>>> the "group" as an afterthought since some grouping could be useful
> >>>>
> >>>>>
> >>>>> On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
> >>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Daniel,
> >>>>>>
> >>>>>> I'm an enterprise developer - bad habits die hard :-)
> >>>>>>
> >>>>>> So I closed the following tickets and merged the branches
> >>>>>>
> >>>>>> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli"
> >>>>>> into
> >>>>>> "freemarker-generator"
> >>>>>> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to
> >>>> "Datasource"
> >>>>>> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied
> >> names
> >>>>>> for datasources
> >>>>>>
> >>>>>> Thanks in advance,
> >>>>>>
> >>>>>> Siegfried Goeschl
> >>>>>>
> >>>>>>
> >>>>>>> On 29.02.2020, at 12:19, Daniel Dekany <da...@gmail.com>
> >>>> wrote:
> >>>>>>>
> >>>>>>> Yeah, and of course, you can merge that branch. You can even
> >>>>>>> work on
> >>>> the
> >>>>>>> master directly after all.
> >>>>>>>
> >>>>>>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <
> >>>> daniel.dekany@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> But, I do recognize the cattle use case (several "faceless"
> >>>>>>>> files
> >> with
> >>>>>>>> common format/schema). Only, my idea is to push that complexity
> >>>>>>>> on
> >> the
> >>>>>> data
> >>>>>>>> source. The "data source" concept shields the rest of the
> >> application
> >>>>>> from
> >>>>>>>> the details of how the data is stored or retrieved. So, a data
> >> source
> >>>>>> might
> >>>>>>>> loads a bunch of log files from a directory, and present them
> >>>>>>>> as a
> >>>>>> single
> >>>>>>>> big table, or like a list of tables, etc. So I want to deal
> >>>>>>>> with the
> >>>>>> cattle
> >>>>>>>> use case, but the question is what part of the architecture
> >>>>>>>> will
> >>>> deal
> >>>>>>>> with this complication, with other words, how do you box
> >>>>>>>> things. Why
> >>>> my
> >>>>>>>> initial bet is to stuff that complication into the "data
> >>>>>>>> source"
> >>>>>>>> implementation(s) is that data sources are inherently varied.
> >>>>>>>> Some
> >>>>>> returns
> >>>>>>>> a table-like thing, some have multiple named tables (worksheets
> >>>>>>>> in
> >>>>>> Excel),
> >>>>>>>> some returns tree of nodes (XML), etc. So then, some might
> >>>>>>>> returns a
> >>>>>>>> list-of-list-of log records, or just a single list of
> >>>>>>>> log-records
> >> (put
> >>>>>>>> together from daily log files). That way cattles don't add to
> >>>> conceptual
> >>>>>>>> complexity. Now, you might be aware of cases where the cattle
> >> concept
> >>>>>> must
> >>>>>>>> be more exposed than this, and then we can't box things like
> >>>>>>>> this.
> >> But
> >>>>>> this
> >>>>>>>> is what I tried to express.
> >>>>>>>>
> >>>>>>>> Regarding "output generators", and how that applies on the
> >>>>>>>> command
> >>>>>> line. I
> >>>>>>>> think it's important that the common core between Maven and
> >>>>>> command-line is
> >>>>>>>> as fat as possible. Ideally, they are just two syntax to set up
> >>>>>>>> the
> >>>> same
> >>>>>>>> thing. Mostly at least. So, if you specify a template file to
> >>>>>>>> the
> >> CLI
> >>>>>>>> application, in a way so that it causes it to process that
> >>>>>>>> template
> >> to
> >>>>>>>> generate a single output, then there you have just defined an
> >> "output
> >>>>>>>> generator" (even if it wasn't explicitly called like that in
> >>>>>>>> the
> >>>> command
> >>>>>>>> line). If you specify 3 csv files to the CLI application, in a
> >>>>>>>> way
> >> so
> >>>>>> that
> >>>>>>>> it causes it to generate 3 output files, then you have just
> >>>>>>>> defined
> >> 3
> >>>>>>>> "output generators" there (there's at least one template
> >>>>>>>> specified
> >>>> there
> >>>>>>>> too, but that wasn't an "output generator" itself, it was just
> >>>>>>>> an
> >>>>>> attribute
> >>>>>>>> of the 3 output generators). If you specify 1 template, and 3
> >>>>>>>> csv
> >>>>>> files, in
> >>>>>>>> a way so that it will yield 4 output files (1 for the template,
> >>>>>>>> 3
> >> for
> >>>>>> the
> >>>>>>>> csv-s), then you have defined 4 output generators there. If you
> >> have a
> >>>>>> data
> >>>>>>>> source that loads a list of 3 entities (say, 3 csv files, so
> >>>>>>>> it's a
> >>>>>> list of
> >>>>>>>> tables then), and you have 2 templates, and you tell the CLI to
> >>>> execute
> >>>>>>>> each template for each item in said data source, then you have
> >>>>>>>> just
> >>>>>> defined
> >>>>>>>> 6 "output generators".
> >>>>>>>>
> >>>>>>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
> >>>>>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Daniel,
> >>>>>>>>>
> >>>>>>>>> That all depends on your mental model and work you do,
> >> expectations,
> >>>>>>>>> experience :-)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> __Document Handling__
> >>>>>>>>>
> >>>>>>>>> *"But I think actually we have no good use case for list of
> >> documents
> >>>>>>>>> that's passed at once to a single template run, so, we can
> >>>>>>>>> just
> >>>> ignore
> >>>>>>>>> that complication"*
> >>>>>>>>>
> >>>>>>>>> In my case that's not a complication but my daily business -
> >>>>>>>>> I'm
> >>>>>>>>> regularly wading through access logs - yesterday probably a
> >>>>>>>>> couple
> >> of
> >>>>>>>>> hundreds access logs across two staging sites to help tracking
> >>>>>>>>> some
> >>>>>>>>> strange API gateway issues :-)
> >>>>>>>>>
> >>>>>>>>> My gut feeling is (borrowing from
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>
> >>>>
> >>
> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
> >>>>>>>>> )
> >>>>>>>>>
> >>>>>>>>> 1. You have a few lovely named documents / templates - `pets`
> >>>>>>>>> 2. You have tons of anonymous documents / templates to process
> >>>>>>>>> -
> >>>>>>>>> `cattle`
> >>>>>>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
> >>>>>>>>>
> >>>>>>>>> `freemarker-cli` was built with 2) in mind and I want to cover
> >>>>>>>>> 1)
> >>>> since
> >>>>>>>>> it is equally important and common.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> __Template And Document Processing Modes__
> >>>>>>>>>
> >>>>>>>>> IMHO it is important to answer the following question : "How
> >>>>>>>>> many
> >>>>>>>>> outputs do you get when rendering 2 template and 3
> >>>>>>>>> datasources?
> >> Two,
> >>>>>>>>> Three or Six?"
> >>>>>>>>>
> >>>>>>>>> Your answer is influenced by your mental model / experience
> >>>>>>>>>
> >>>>>>>>> * When wading through tons of CSV files, access logs, etc. the
> >> answer
> >>>>>> is
> >>>>>>>>> "2"
> >>>>>>>>> * When doing source code generation the obvious answer is "6"
> >>>>>>>>> * Can't imagine a use case which results in "3" but I'm pretty
> >>>>>>>>> sure
> >> we
> >>>>>>>>> will encounter one
> >>>>>>>>>
> >>>>>>>>> __Template and document mode probably shouldn't exist__
> >>>>>>>>>
> >>>>>>>>> That's hard for me to fully understand - I definitely lack
> >>>>>>>>> your
> >>>>>> insights
> >>>>>>>>> & experience writing such tools :-)
> >>>>>>>>>
> >>>>>>>>> Defining the `Output Generator` is the underlying model for
> >>>>>>>>> the
> >> Maven
> >>>>>>>>> plugin (and probably FMPP).
> >>>>>>>>>
> >>>>>>>>> I'm not sure if this applies for command lines at least not in
> >>>>>>>>> the
> >>>> way
> >>>>>> I
> >>>>>>>>> use them (or would like to use them)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks in advance,
> >>>>>>>>>
> >>>>>>>>> Siegfried Goeschl
> >>>>>>>>>
> >>>>>>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
> >>>>>>>>>
> >>>>>>>>>> Yeah, "data source" is surely a too popular name, but for
> >>>>>>>>>> reason.
> >>>>>>>>>> Anyone
> >>>>>>>>>> has other ideas?
> >>>>>>>>>>
> >>>>>>>>>> As of naming data sources and such. One thing I was wondering
> >> about
> >>>>>>>>>> back
> >>>>>>>>>> then is how to deal with list of documents given to a
> >>>>>>>>>> template,
> >>>> versus
> >>>>>>>>>> exactly 1 document given to a template. But I think actually
> >>>>>>>>>> we
> >> have
> >>>>>>>>>> no
> >>>>>>>>>> good use case for list of documents that's passed at once to
> >>>>>>>>>> a
> >>>> single
> >>>>>>>>>> template run, so, we can just ignore that complication. A
> >>>>>>>>>> document
> >>>> has
> >>>>>>>>>> a
> >>>>>>>>>> name, and that's always just a single document, not a
> >>>>>>>>>> collection,
> >> as
> >>>>>>>>>> far as
> >>>>>>>>>> the template is concerned. (We can have multiple documents
> >>>>>>>>>> per
> >> run,
> >>>>>>>>>> but
> >>>>>>>>>> those normally yield separate output generators, so it's
> >>>>>>>>>> still
> >> only
> >>>>>>>>>> one
> >>>>>>>>>> document per template.) However, we can have data source
> >>>>>>>>>> types
> >>>>>>>>>> (document
> >>>>>>>>>> types with old terminology) that collect together multiple
> >>>>>>>>>> data
> >>>> files.
> >>>>>>>>>> So
> >>>>>>>>>> then that complexity is encapsulated into the data source
> >>>>>>>>>> type,
> >> and
> >>>>>>>>>> doesn't
> >>>>>>>>>> complicate the overall architecture. That's another case when
> >>>>>>>>>> a
> >> data
> >>>>>>>>>> source
> >>>>>>>>>> is not just a file. Like maybe there's a data source type
> >>>>>>>>>> that
> >> loads
> >>>>>>>>>> all
> >>>>>>>>>> the CSV-s from a directory, into a single big table (I had
> >>>>>>>>>> such
> >>>> case),
> >>>>>>>>>> or
> >>>>>>>>>> even into a list of tables. Or, as I mentioned already, a
> >>>>>>>>>> data
> >>>> source
> >>>>>>>>>> is
> >>>>>>>>>> maybe an SQL query on a JDBC data source (and we got the
> >>>>>>>>>> first
> >> term
> >>>>>>>>>> clash... JDBC also call them data sources).
> >>>>>>>>>>
> >>>>>>>>>> Template and document mode probably shouldn't exist from user
> >>>>>>>>>> perspective
> >>>>>>>>>> either, at least not as a global option that must apply to
> >>>> everything
> >>>>>>>>>> in a
> >>>>>>>>>> run. They could just give the files that define the "output
> >>>>>>>>>> generators",
> >>>>>>>>>> and some of them will be templates, some of them are data
> >>>>>>>>>> files,
> >> in
> >>>>>>>>>> which
> >>>>>>>>>> case a template need to be associated with them (and there
> >>>>>>>>>> can be
> >> a
> >>>>>>>>>> couple
> >>>>>>>>>> of ways of doing that). And then again, there are the cases
> >>>>>>>>>> where
> >>>> you
> >>>>>>>>>> want
> >>>>>>>>>> to create one output generator per entity from some data
> >>>>>>>>>> source.
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
> >>>>>>>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Daniel,
> >>>>>>>>>>>
> >>>>>>>>>>> See my comments below - and thanks for your patience and
> >>>>>>>>>>> input
> >> :-)
> >>>>>>>>>>>
> >>>>>>>>>>> *Renaming Document To DataSource*
> >>>>>>>>>>>
> >>>>>>>>>>> Yes, makes sense. I tried to avoid since I'm using
> >> javax.activation
> >>>>>>>>>>> and
> >>>>>>>>>>> its DataSource.
> >>>>>>>>>>>
> >>>>>>>>>>> *Template And Document Mode*
> >>>>>>>>>>>
> >>>>>>>>>>> Agreed - I think it is a valuable abstraction for the user
> >>>>>>>>>>> but it
> >>>> is
> >>>>>>>>>>> not
> >>>>>>>>>>> an implementation concept :-)
> >>>>>>>>>>>
> >>>>>>>>>>> *Document Without Symbolic Names*
> >>>>>>>>>>>
> >>>>>>>>>>> Also agreed and it is going to change but I have not settled
> >>>>>>>>>>> my
> >>>> mind
> >>>>>>>>>>> yet
> >>>>>>>>>>> what exactly to implement.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks in advance,
> >>>>>>>>>>>
> >>>>>>>>>>> Siegfried Goeschl
> >>>>>>>>>>>
> >>>>>>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> A few quick thoughts on that:
> >>>>>>>>>>>
> >>>>>>>>>>> - We should replace the "document" term with something more
> >>>> speaking.
> >>>>>>>>>>> It
> >>>>>>>>>>> doesn't tell that it's some kind of input. Also, most of
> >>>>>>>>>>> these
> >>>> inputs
> >>>>>>>>>>> aren't something that people typically call documents. Like
> >>>>>>>>>>> a csv
> >>>>>>>>>>> file, or
> >>>>>>>>>>> a database table, which is not even a file (OK we don't
> >>>>>>>>>>> support
> >>>> such
> >>>>>>>>>>> thing
> >>>>>>>>>>> at the moment). I think, maybe "data source" is a safe
> >>>>>>>>>>> enough
> >> term.
> >>>>>>>>>>> (It
> >>>>>>>>>>> also rhymes with data model.)
> >>>>>>>>>>> - You have separate "template" and "document" "mode", that
> >> applies
> >>>> to
> >>>>>>>>>>> a
> >>>>>>>>>>> whole run. I think such specialization won't be helpful. We
> >>>>>>>>>>> could
> >>>>>>>>>>> just say,
> >>>>>>>>>>> on the conceptual level at least, that we need a set of
> >>>>>>>>>>> "outputs
> >>>>>>>>>>> generators". An output generator is an object (in the API)
> >>>>>>>>>>> that
> >>>>>>>>>>> specifies a
> >>>>>>>>>>> template, a data-model (where the data-model is possibly
> >> populated
> >>>>>>>>>>> with
> >>>>>>>>>>> "documents"), and an output "sink" (a file path, or stdout),
> >>>>>>>>>>> and
> >>>> can
> >>>>>>>>>>> generate the output itself. A practical way of defining the
> >> output
> >>>>>>>>>>> generators in a CLI application is via a bunch of files,
> >>>>>>>>>>> each
> >>>>>>>>>>> defining an
> >>>>>>>>>>> output generator. Some of those files is maybe a template
> >>>>>>>>>>> (that
> >> you
> >>>>>>>>>>> can
> >>>>>>>>>>> even detect from the file extension), or a data file that we
> >>>>>>>>>>> currently call
> >>>>>>>>>>> a "document". They could freely mix inside the same run. I
> >>>>>>>>>>> have
> >>>> also
> >>>>>>>>>>> met
> >>>>>>>>>>> use case when you have a single table (single "document"),
> >>>>>>>>>>> and
> >> each
> >>>>>>>>>>> record
> >>>>>>>>>>> in it yields an output file. That can also be described in
> >>>>>>>>>>> some
> >>>> file
> >>>>>>>>>>> format, or really in any other way, like directly in command
> >>>>>>>>>>> line
> >>>>>>>>>>> argument,
> >>>>>>>>>>> via API, etc.
> >>>>>>>>>>> - You have multiple documents without associated symbolical
> >>>>>>>>>>> name
> >> in
> >>>>>>>>>>> some
> >>>>>>>>>>> examples. Templates can't identify those then in a well
> >>>> maintainable
> >>>>>>>>>>> way.
> >>>>>>>>>>> The actual file name is often not a good identifier, can
> >>>>>>>>>>> change
> >>>> over
> >>>>>>>>>>> time,
> >>>>>>>>>>> and you might don't even have good control over it, like you
> >>>> already
> >>>>>>>>>>> receive it as a parameter from somewhere else, or someone
> >>>>>>>>>>> moves/renames
> >>>>>>>>>>> that files that you need to read. Index is also not very
> >>>>>>>>>>> good,
> >> but
> >>>> I
> >>>>>>>>>>> have
> >>>>>>>>>>> written about that earlier.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
> >>>>>>>>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi folks,
> >>>>>>>>>>>
> >>>>>>>>>>> still wrapping my side around but assembled some thoughts
> >>>>>>>>>>> here -
> >>>>>>>>>>>
> >> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks in advance,
> >>>>>>>>>>>
> >>>>>>>>>>> Siegfried Goeschl
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org>
> >>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> What you are describing is more like the angle that FMPP
> >>>>>>>>>>> took
> >>>>>>>>>>> initially,
> >>>>>>>>>>> where templates drive things, they generate the output for
> >>>> themselves
> >>>>>>>>>>>
> >>>>>>>>>>> (even
> >>>>>>>>>>>
> >>>>>>>>>>> multiple output files if they wish). By default output files
> >>>>>>>>>>> name
> >>>>>>>>>>> (and
> >>>>>>>>>>> relative path) is deduced from template name. There was also
> >>>>>>>>>>> a
> >>>> global
> >>>>>>>>>>> data-model, built in a configuration file (or equally, built
> >>>>>>>>>>> via
> >>>>>>>>>>> command
> >>>>>>>>>>> line arguments, or both mixed), from which templates get
> >>>>>>>>>>> whatever
> >>>>>>>>>>> data
> >>>>>>>>>>>
> >>>>>>>>>>> they
> >>>>>>>>>>>
> >>>>>>>>>>> are interested in. Take a look at the figures here:
> >>>>>>>>>>> http://fmpp.sourceforge.net/qtour.html. Later, this concept
> >>>>>>>>>>> was
> >>>>>>>>>>>
> >>>>>>>>>>> generalized
> >>>>>>>>>>>
> >>>>>>>>>>> a bit more, because you could add XML files at the same
> >>>>>>>>>>> place
> >> where
> >>>>>>>>>>> you
> >>>>>>>>>>> have the templates, and then you could associate transform
> >>>> templates
> >>>>>>>>>>> to
> >>>>>>>>>>>
> >>>>>>>>>>> the
> >>>>>>>>>>>
> >>>>>>>>>>> XML files (based on path pattern and/or the XML document
> >> element).
> >>>>>>>>>>> Now
> >>>>>>>>>>> that's like what freemarker-generator had initially (data
> >>>>>>>>>>> files
> >>>> drive
> >>>>>>>>>>> output, and the template is there to transform it).
> >>>>>>>>>>>
> >>>>>>>>>>> So I think the generic mental model would be like this:
> >>>>>>>>>>>
> >>>>>>>>>>> 1. You got files that drive the process, let's call them
> >>>>>>>>>>> *generator files* for now. Usually, each generator file yields an
> >>>>>>>>>>> output file (but maybe even multiple output files, as you might
> >>>>>>>>>>> have seen in the last figure). These generator files can be of
> >>>>>>>>>>> many types, like XML, JSON, XLSX (as in the original
> >>>>>>>>>>> freemarker-generator), and even templates (as is the norm in
> >>>>>>>>>>> FMPP). If the file is not a template, then you got a set of
> >>>>>>>>>>> transformer templates (-t CLI option) in a separate directory,
> >>>>>>>>>>> which can be associated with the generator files based on name
> >>>>>>>>>>> patterns, and even based on content (schema usually). If the
> >>>>>>>>>>> generator file is a template (so that's a positional @Parameter
> >>>>>>>>>>> CLI argument that happens to be an *.ftl, and is not a template
> >>>>>>>>>>> file specified after the "-t" option), then you just
> >>>>>>>>>>> Template.process(...) it, and it prints what the output will be.
> >>>>>>>>>>> 2. You also have a set of variables, the global data-model, that
> >>>>>>>>>>> contains commonly useful stuff, like what you now call parameters
> >>>>>>>>>>> (CLI -Pname=value), but also maybe data loaded from JSON, XML,
> >>>>>>>>>>> etc. Those data files aren't "generator files". Templates just
> >>>>>>>>>>> use them if they need them.
> >>>>>>>>>>>
> >>>>>>>>>>> An important thing here is to reuse the same mechanism to read
> >>>>>>>>>>> and parse those data files, which was used in templates when
> >>>>>>>>>>> transforming generator files. So we need a common format for
> >>>>>>>>>>> specifying how to load data files. That's maybe just FTL that
> >>>>>>>>>>> #assigns to the variables, or maybe a more declarative format.
> >>>>>>>>>>>
> >>>>>>>>>>> What I have described in the original post here was a less
> >>>>>>>>>>> generic form of this, as I tried to be true with the original
> >>>>>>>>>>> approach. I thought the proposal will be drastic enough as it
> >>>>>>>>>>> is... :) There, the "main" document is the "generator file" from
> >>>>>>>>>>> point 1, the "-t" template is the transform template for the
> >>>>>>>>>>> "main" document, and the other named documents ("users",
> >>>>>>>>>>> "groups") are a poor man's shared data-model from point 2
> >>>>>>>>>>> (together with -PName=value).
> >>>>>>>>>>>
> >>>>>>>>>>> There's a further somewhat confusing thing to get right with the
> >>>>>>>>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing
> >>>>>>>>>>> though. In the model above, as per point 1, if you list multiple
> >>>>>>>>>>> data files, each will generate a separate output file. So, if you
> >>>>>>>>>>> need to take in a list of files to transform it to a single
> >>>>>>>>>>> output file (or at least with a single transform template
> >>>>>>>>>>> execution), then you have to be explicit about that, as that's
> >>>>>>>>>>> not the default behavior anymore. But it's still absolutely
> >>>>>>>>>>> possible. Imagine it as a "list of XLSX-es" is itself like a file
> >>>>>>>>>>> format. You need some CLI (and Maven config, etc.) syntax to
> >>>>>>>>>>> express that, but that shouldn't be a big deal.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
> >>>>>>>>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Daniel,
> >>>>>>>>>>>
> >>>>>>>>>>> Good timing - I was looking at a similar problem from
> >>>>>>>>>>> different
> >>>> angle
> >>>>>>>>>>> yesterday (see below)
> >>>>>>>>>>>
> >>>>>>>>>>> Don't have enough time to answer your email in detail now -
> >>>>>>>>>>> will
> >> do
> >>>>>>>>>>> that
> >>>>>>>>>>> tomorrow evening
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks in advance,
> >>>>>>>>>>>
> >>>>>>>>>>> Siegfried Goeschl
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> === START
> >>>>>>>>>>> # FreeMarker CLI Improvement
> >>>>>>>>>>> ## Support Of Multiple Template Files
> >>>>>>>>>>> Currently we support the following combinations
> >>>>>>>>>>>
> >>>>>>>>>>> * Single template and no data files
> >>>>>>>>>>> * Single template and one or more data files
> >>>>>>>>>>>
> >>>>>>>>>>> But we can not support the following use case which is quite
> >>>> typical
> >>>>>>>>>>> in
> >>>>>>>>>>> the cloud
> >>>>>>>>>>>
> >>>>>>>>>>> __Convert multiple templates with a single data file, e.g
> >> copying a
> >>>>>>>>>>> directory of configuration files using a JSON configuration
> >> file__
> >>>>>>>>>>>
> >>>>>>>>>>> ## Implementation notes
> >>>>>>>>>>> * When we copy a directory we can remove the `ftl` extension on
> >>>>>>>>>>> the fly
> >>>>>>>>>>> * We might need an `exclude` filter for the copy operation
> >>>>>>>>>>> * Initially resolve to a list of template files and process
> >>>>>>>>>>> one
> >>>> after
> >>>>>>>>>>> another
> >>>>>>>>>>> * Need to calculate the output file location and extension
> >>>>>>>>>>> * We need to rename the existing command line parameters
> >>>>>>>>>>> (see
> >>>> below)
> >>>>>>>>>>> * Do we need multiple include and exclude filter?
> >>>>>>>>>>> * Do we need file versus directory filters?
> >>>>>>>>>>>
> >>>>>>>>>>> ### Command Line Options
> >>>>>>>>>>> ```
> >>>>>>>>>>> --input-encoding : Encoding of the documents
> >>>>>>>>>>> --output-encoding : Encoding of the rendered template
> >>>>>>>>>>> --template-encoding : Encoding of the template
> >>>>>>>>>>> --output : Output file or directory
> >>>>>>>>>>> --include-document : Include pattern for documents
> >>>>>>>>>>> --exclude-document : Exclude pattern for documents
> >>>>>>>>>>> --include-template: Include pattern for templates
> >>>>>>>>>>> --exclude-template : Exclude pattern for templates
> >>>>>>>>>>> ```
> >>>>>>>>>>>
> >>>>>>>>>>> ### Command Line Examples
> >>>>>>>>>>> ```text
> >>>>>>>>>>> # Copy all FTL templates found in "ext/config" to the "/config" directory using the data from "config.json"
> >>>>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
> >>>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
> >>>>>>>>>>>
> >>>>>>>>>>> # Basically the same using a named document "configuration"
> >>>>>>>>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
> >>>>>>>>>>> # It might make sense to allow URIs for loading documents
> >>>>>>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
> >>>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
> >>>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
> >>>>>>>>>>>
> >>>>>>>>>>> # Basically the same using an environment variable as named document
> >>>>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
> >>>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
> >>>>>>>>>>> ```
> >>>>>>>>>>> === END
> >>>>>>>>>>>
> >>>>>>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org>
> >> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Input documents is a fundamental concept in
> >>>>>>>>>>> freemarker-generator,
> >>>> so
> >>>>>>>>>>> we
> >>>>>>>>>>> should think about that more, and probably refine/rework how
> >>>>>>>>>>> it's
> >>>>>>>>>>> done.
> >>>>>>>>>>>
> >>>>>>>>>>> Currently it works like this, with CLI at least.
> >>>>>>>>>>>
> >>>>>>>>>>> freemarker-cli
> >>>>>>>>>>> -t access-report.ftl
> >>>>>>>>>>> somewhere/foo-access-log.csv
> >>>>>>>>>>>
> >>>>>>>>>>> Then in access-report.ftl you have to do something like
> >>>>>>>>>>> this:
> >>>>>>>>>>>
> >>>>>>>>>>> <#assign doc = Documents.get(0)>
> >>>>>>>>>>> ... process doc here
> >>>>>>>>>>>
> >>>>>>>>>>> (The more idiomatic Documents[0] won't work. Actually, that
> >>>>>>>>>>> lead
> >>>> to a
> >>>>>>>>>>>
> >>>>>>>>>>> funny
> >>>>>>>>>>>
> >>>>>>>>>>> chain of coincidences: It returned the string "D", then
> >>>>>>>>>>>
> >>>>>>>>>>> CSVTool.parse(...)
> >>>>>>>>>>>
> >>>>>>>>>>> happily parsed that to a table with the single column "D",
> >>>>>>>>>>> and 0
> >>>>>>>>>>> rows,
> >>>>>>>>>>>
> >>>>>>>>>>> and
> >>>>>>>>>>>
> >>>>>>>>>>> as there were 0 rows, the template didn't run into an error
> >> because
> >>>>>>>>>>> row.myExpectedColumn refers to a missing column either, so
> >>>>>>>>>>> the
> >>>>>>>>>>> process
> >>>>>>>>>>> finished with success. (: Pretty unlucky for sure. The root
> >>>>>>>>>>> was
> >>>>>>>>>>> unintentionally breaking a FreeMarker idiom though;
> >>>>>>>>>>> eventually we
> >>>>>>>>>>> will
> >>>>>>>>>>>
> >>>>>>>>>>> have
> >>>>>>>>>>>
> >>>>>>>>>>> to work on those too, but, different topic.)
> >>>>>>>>>>>
> >>>>>>>>>>> However, actually multiple input documents can be passed in:
> >>>>>>>>>>>
> >>>>>>>>>>> freemarker-cli
> >>>>>>>>>>> -t access-report.ftl
> >>>>>>>>>>> somewhere/foo-access-log.csv
> >>>>>>>>>>> somewhere/bar-access-log.csv
> >>>>>>>>>>>
> >>>>>>>>>>> Above template will still work, though then you ignored all
> >>>>>>>>>>> but
> >> the
> >>>>>>>>>>>
> >>>>>>>>>>> first
> >>>>>>>>>>>
> >>>>>>>>>>> document. So if you expect any number of input documents,
> >>>>>>>>>>> you
> >>>>>>>>>>> probably
> >>>>>>>>>>>
> >>>>>>>>>>> will
> >>>>>>>>>>>
> >>>>>>>>>>> have to do this:
> >>>>>>>>>>>
> >>>>>>>>>>> <#list Documents.list as doc>
> >>>>>>>>>>> ... process doc here
> >>>>>>>>>>> </#list>
> >>>>>>>>>>>
> >>>>>>>>>>> (The more idiomatic <#list Documents as doc> won't work; but
> >> again,
> >>>>>>>>>>>
> >>>>>>>>>>> those
> >>>>>>>>>>>
> >>>>>>>>>>> we will work out in a different thread.)
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> So, what would be better, in my opinion. I start out from
> >>>>>>>>>>> what I
> >>>>>>>>>>> think
> >>>>>>>>>>>
> >>>>>>>>>>> are
> >>>>>>>>>>>
> >>>>>>>>>>> the common uses cases, in decreasing order of frequency.
> >>>>>>>>>>> Goal is
> >> to
> >>>>>>>>>>>
> >>>>>>>>>>> make
> >>>>>>>>>>>
> >>>>>>>>>>> those less error prone for the users, and simpler to
> >>>>>>>>>>> express.
> >>>>>>>>>>>
> >>>>>>>>>>> USE CASE 1
> >>>>>>>>>>>
> >>>>>>>>>>> You have exactly 1 input documents, which is therefore
> >>>>>>>>>>> simply
> >> "the"
> >>>>>>>>>>> document in the mind of the user. This is probably the
> >>>>>>>>>>> typical
> >> use
> >>>>>>>>>>>
> >>>>>>>>>>> case,
> >>>>>>>>>>>
> >>>>>>>>>>> but at least the use case users typically start out from
> >>>>>>>>>>> when
> >>>>>>>>>>> starting
> >>>>>>>>>>>
> >>>>>>>>>>> the
> >>>>>>>>>>>
> >>>>>>>>>>> work.
> >>>>>>>>>>>
> >>>>>>>>>>> freemarker-cli
> >>>>>>>>>>> -t access-report.ftl
> >>>>>>>>>>> somewhere/foo-access-log.csv
> >>>>>>>>>>>
> >>>>>>>>>>> Then `Documents.get(0)` is not very fitting. Most
> >>>>>>>>>>> importantly
> >> it's
> >>>>>>>>>>>
> >>>>>>>>>>> error
> >>>>>>>>>>>
> >>>>>>>>>>> prone, because if the user passed in more than 1 documents
> >>>>>>>>>>> (can
> >>>> even
> >>>>>>>>>>>
> >>>>>>>>>>> happen
> >>>>>>>>>>>
> >>>>>>>>>>> totally accidentally, like if the user was lazy and used a
> >> wildcard
> >>>>>>>>>>>
> >>>>>>>>>>> that
> >>>>>>>>>>>
> >>>>>>>>>>> the shell exploded), the template will silently ignore the
> >>>>>>>>>>> rest
> >> of
> >>>>>>>>>>> the
> >>>>>>>>>>> documents, and the single document processed will be practically
> >>>>>>>>>>> picked randomly. The user might not notice that and submit a bad
> >>>>>>>>>>> report or such.
> >>>>>>>>>>>
> >>>>>>>>>>> I think that in this use case the document should be simply
> >>>> referred
> >>>>>>>>>>> as
> >>>>>>>>>>> `Document` in the template. When you have multiple documents
> >> there,
> >>>>>>>>>>> referring to `Document` should be an error, saying that the
> >>>> template
> >>>>>>>>>>>
> >>>>>>>>>>> was
> >>>>>>>>>>>
> >>>>>>>>>>> made to process a single document only.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> USE CASE 2
> >>>>>>>>>>>
> >>>>>>>>>>> You have multiple input documents, but each has different
> >>>>>>>>>>> role
> >>>>>>>>>>>
> >>>>>>>>>>> (different
> >>>>>>>>>>>
> >>>>>>>>>>> schema, maybe different file type). Like, you pass in
> >>>>>>>>>>> users.csv
> >> and
> >>>>>>>>>>> groups.csv. Each has difference schema, and so you want to
> >>>>>>>>>>> access
> >>>>>>>>>>> them
> >>>>>>>>>>> differently, but in the same template.
> >>>>>>>>>>>
> >>>>>>>>>>> freemarker-cli
> >>>>>>>>>>> [...]
> >>>>>>>>>>> --named-document users somewhere/foo-users.csv
> >>>>>>>>>>> --named-document groups somewhere/foo-groups.csv
> >>>>>>>>>>>
> >>>>>>>>>>> Then in the template you could refer to them as:
> >>>>>>>>>>>
> >>>>>>>>>>> `NamedDocuments.users`,
> >>>>>>>>>>>
> >>>>>>>>>>> and `NamedDocuments.groups`.
> >>>>>>>>>>>
> >>>>>>>>>>> Use Case 1, and 2 can be unified into a coherent concept,
> >>>>>>>>>>> where
> >>>>>>>>>>>
> >>>>>>>>>>> `Document`
> >>>>>>>>>>>
> >>>>>>>>>>> is just a shorthand for `NamedDocuments.main`. It's called
> >>>>>>>>>>> "main"
> >>>>>>>>>>>
> >>>>>>>>>>> because
> >>>>>>>>>>>
> >>>>>>>>>>> that's "the" document the template is about, but then you
> >>>>>>>>>>> have to
> >>>>>>>>>>> added
> >>>>>>>>>>> some helper documents, with symbolic names representing
> >>>>>>>>>>> their
> >> role.
> >>>>>>>>>>>
> >>>>>>>>>>> freemarker-cli
> >>>>>>>>>>> -t access-report.ftl
> >>>>>>>>>>> --document-name=main somewhere/foo-access-log.csv
> >>>>>>>>>>> --document-name=users somewhere/foo-users.csv
> >>>>>>>>>>> --document-name=groups somewhere/foo-groups.csv
> >>>>>>>>>>>
> >>>>>>>>>>> Here, `Document` still works in the template, and it refers
> >>>>>>>>>>> to
> >>>>>>>>>>> `somewhere/foo-access-log.csv`. (While omitting
> >>>> --document-name=main
> >>>>>>>>>>>
> >>>>>>>>>>> above
> >>>>>>>>>>>
> >>>>>>>>>>> would be cleaner, I couldn't figure out how to do that with
> >>>> Picocli.
> >>>>>>>>>>> Anyway, for now the point is the concept, which is not
> >>>>>>>>>>> specific
> >> to
> >>>>>>>>>>>
> >>>>>>>>>>> CLI.)
> >>>>>>>>>>>
> >>>>>>>>>>> USE CASE 3
> >>>>>>>>>>>
> >>>>>>>>>>> Here you have several of the same kind of documents. That
> >>>>>>>>>>> has a
> >>>> more
> >>>>>>>>>>> generic sub-use-case, when you have explicitly named
> >>>>>>>>>>> documents
> >>>> (like
> >>>>>>>>>>> "users" above), and for some you expect multiple input
> >>>>>>>>>>> files.
> >>>>>>>>>>>
> >>>>>>>>>>> freemarker-cli
> >>>>>>>>>>> -t access-report.ftl
> >>>>>>>>>>> --document-name=main somewhere/foo-access-log.csv
> >>>>>>>>>>> somewhere/bar-access-log.csv
> >>>>>>>>>>> --document-name=users somewhere/foo-users.csv
> >>>>>>>>>>> somewhere/bar-users.csv
> >>>>>>>>>>> --document-name=groups somewhere/global-groups.csv
> >>>>>>>>>>>
> >>>>>>>>>>> The template must to be written with this use case in mind,
> >>>>>>>>>>> as
> >> now
> >>>> it
> >>>>>>>>>>>
> >>>>>>>>>>> has
> >>>>>>>>>>>
> >>>>>>>>>>> #list some of the documents. (I think in practice you hardly
> >>>>>>>>>>> ever
> >>>>>>>>>>> want
> >>>>>>>>>>>
> >>>>>>>>>>> to
> >>>>>>>>>>>
> >>>>>>>>>>> get a document by hard coded index. Either you don't know
> >>>>>>>>>>> how
> >> many
> >>>>>>>>>>> documents you have, so you can't use hard coded indexes, or
> >>>>>>>>>>> you
> >> do,
> >>>>>>>>>>> and
> >>>>>>>>>>> each index has a specific meaning, but then you should name
> >>>>>>>>>>> the
> >>>>>>>>>>>
> >>>>>>>>>>> documents
> >>>>>>>>>>>
> >>>>>>>>>>> instead, as using indexes is error prone, and hard to read.)
> >>>>>>>>>>> Accessing that list of documents in the template, maybe
> >>>>>>>>>>> could be
> >>>> done
> >>>>>>>>>>>
> >>>>>>>>>>> like
> >>>>>>>>>>>
> >>>>>>>>>>> this:
> >>>>>>>>>>> - For the "main" documents: `DocumentList`
> >>>>>>>>>>> - For explicitly named documents, like "users":
> >>>>>>>>>>>
> >>>>>>>>>>> `NamedDocumentLists.users`
> >>>>>>>>>>>
> >>>>>>>>>>> SUMMING UP
> >>>>>>>>>>>
> >>>>>>>>>>> To unify all 3 use cases into a coherent concept:
> >>>>>>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and
> >>>>>>>>>>> while
> >>>> you
> >>>>>>>>>>>
> >>>>>>>>>>> can
> >>>>>>>>>>>
> >>>>>>>>>>> achieve everything with it, using it requires your template
> >>>>>>>>>>> to
> >>>> handle
> >>>>>>>>>>>
> >>>>>>>>>>> the
> >>>>>>>>>>>
> >>>>>>>>>>> most generic case too. So, I think it would be rarely used.
> >>>>>>>>>>> - `DocumentList` is just a shorthand for
> >> `NamedDocumentLists.main`.
> >>>>>>>>>>>
> >>>>>>>>>>> It's
> >>>>>>>>>>>
> >>>>>>>>>>> used if you only have one kind of documents (single format
> >>>>>>>>>>> and
> >>>>>>>>>>> schema),
> >>>>>>>>>>>
> >>>>>>>>>>> but
> >>>>>>>>>>>
> >>>>>>>>>>> potentially multiple of them.
> >>>>>>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly
> >>>>>>>>>>> 1
> >>>>>>>>>>> document
> >>>>>>>>>>>
> >>>>>>>>>>> of
> >>>>>>>>>>>
> >>>>>>>>>>> the given name.
> >>>>>>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`.
> >>>>>>>>>>> This
> >> is
> >>>>>>>>>>> for
> >>>>>>>>>>>
> >>>>>>>>>>> the
> >>>>>>>>>>>
> >>>>>>>>>>> most natural/frequent use case.
> >>>>>>>>>>>
> >>>>>>>>>>> That's 4 possible ways of accessing your documents, which is
> >>>>>>>>>>> a
> >>>>>>>>>>>
> >>>>>>>>>>> trade-off
> >>>>>>>>>>>
> >>>>>>>>>>> for the sake of these:
> >>>>>>>>>>> - Catching CLI (or Maven, etc.) input where the template
> >>>>>>>>>>> output
> >>>>>>>>>>> likely
> >>>>>>>>>>>
> >>>>>>>>>>> will
> >>>>>>>>>>>
> >>>>>>>>>>> be wrong. That's only possible if the user can communicate
> >>>>>>>>>>> its
> >>>> intent
> >>>>>>>>>>>
> >>>>>>>>>>> in
> >>>>>>>>>>>
> >>>>>>>>>>> the template.
> >>>>>>>>>>> - Users don't need to deal with concepts that are irrelevant
> >>>>>>>>>>> in
> >>>> their
> >>>>>>>>>>> concrete use case. Just start with the trivial, `Document`,
> >>>>>>>>>>> and
> >>>> later
> >>>>>>>>>>>
> >>>>>>>>>>> if
> >>>>>>>>>>>
> >>>>>>>>>>> the need arises, generalize to named documents, document
> >>>>>>>>>>> lists,
> >> or
> >>>>>>>>>>>
> >>>>>>>>>>> both.
> >>>>>>>>>>>
> >>>>>>>>>>> What do guys think?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Best regards,
> >>>>>>>> Daniel Dekany
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Best regards,
> >>>>>>> Daniel Dekany
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>> Daniel Dekany
> >>>>
> >>>>
> >>>
> >>> --
> >>> Best regards,
> >>> Daniel Dekany
> >>
> >>
> >>
> >
> > --
> > Best regards,
> > Daniel Dekany
>


-- 
Best regards,
Daniel Dekany

Re: freemarker-generator: Improving the input documents concept

Posted by Siegfried Goeschl <si...@gmail.com>.

Hi Daniel,

The introduction of named `Datasource`s allows us to simplify / streamline
a few things:

* I have a meaningful user-supplied name
* I can pass additional configuration information, as already implemented
with `charset` and `contenttype`. This would also allow configuring a
`CSV Datasource`, e.g.
`users=./data/users.csv#format=default&header=true&delimiter=TAB`, which
can be readily parsed
* Currently the names of datasources are taken from their relative
file names - might make sense to drop that but I need to contemplate :-)
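
To illustrate the "readily parsed" point, here is a minimal sketch of how
such a `name=location#key=value&key=value` argument could be split up. It
is not the freemarker-generator implementation; the class
`NamedDatasourceSpec` and its field names are made up for this example,
and it assumes the name itself contains no `=`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of freemarker-generator.
final class NamedDatasourceSpec {
    public final String name;
    public final String location;
    public final Map<String, String> options;

    private NamedDatasourceSpec(String name, String location, Map<String, String> options) {
        this.name = name;
        this.location = location;
        this.options = options;
    }

    // Splits "name=location#key=value&key=value" into its three parts.
    static NamedDatasourceSpec parse(String spec) {
        int eq = spec.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("Expected name=location: " + spec);
        }
        String name = spec.substring(0, eq);
        String rest = spec.substring(eq + 1);

        // Everything after the first '#' is treated as the option list.
        int hash = rest.indexOf('#');
        String location = hash < 0 ? rest : rest.substring(0, hash);
        if (location.isEmpty()) {
            throw new IllegalArgumentException("Missing location: " + spec);
        }

        // LinkedHashMap keeps the options in the order the user wrote them.
        Map<String, String> options = new LinkedHashMap<>();
        if (hash >= 0 && hash + 1 < rest.length()) {
            for (String pair : rest.substring(hash + 1).split("&")) {
                int sep = pair.indexOf('=');
                if (sep <= 0) {
                    throw new IllegalArgumentException("Expected key=value: " + pair);
                }
                options.put(pair.substring(0, sep), pair.substring(sep + 1));
            }
        }
        return new NamedDatasourceSpec(name, location, options);
    }

    public static void main(String[] args) {
        NamedDatasourceSpec s = parse("users=./data/users.csv#format=default&header=true");
        System.out.println(s.name + " -> " + s.location + " " + s.options);
    }
}
```

The options stay plain strings here; deciding which tool interprets
`format` or `header` would be a separate step.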

Regarding the "global mode" and "output generators files" - I'm sorry, 
but I'm not getting it

* I refined the
https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449 to
make my points clearer
* Are you thinking of defining an explicit "output generator file"
containing `datasources`, `templates` and `outputs`? Yes, that could be
done, but it no longer feels like an interactive command line tool


Regarding "more idiomatic FTL usage"

* Yes, I need to dive into custom template models or whatever it is 
called :-)


Something we need to iron out is a release policy

* Currently we have little agreement on how the CLI should look or
behave
* I think you are leaning towards a 1.0 release while I favour 0.x.y to
have room to make mistakes / experiments
* I personally see the possibility that we don't get a release out -
"perfect is the enemy of good"

How would you like to handle the problem - can we agree on a minimal
feature set worthy of a release?

Thanks in advance,

Siegfried Goeschl


On 1 Mar 2020, at 11:33, Daniel Dekany wrote:

>>
>> Actually not recommended but we have named data sources for less than 
>> 24
>> hours
>
>
> Sorry, not sure what that means. Anyway, my "vote" is let's not give
> automatic names if that's not recommended to utilize. I mean, in case 
> we
> happen to agree on that, why leave it there. Especially if 
> automatically
> chosen names can clash with explicitly given ones, that would be a
> trouble. (I'm not sure right now if they can... the path we use as the
> name can be relative? Then it realistically can.)
>
> This is a command line tool where we have little idea what the user 
> will do
>> or abuse
>
>
> No matter how much/little we know, we firmly put our bets by releasing
> something. So if some feature is certainly not right, that's enough to 
> not
> have it, I think.
>
> How does a "data loader" knows that it is responsible to load a file
>
> What should as "CSV data loader" should do - parse it into a list of
>> records or stream one by one?
>
>
> I think I was misunderstood here. It's not about some kind of auto-magic.
> It's about where do you specify what to load and how, and in what format do
> you specify that. Of course, you must specify the data source (basically an
> URI for now as I saw), the rough format (CSV), and the format options
> (separator character, etc.), and other freemarker-generator loading options
> (like which CSV columns are numbers, which are dates, with what format,
> what counts as null, etc.).
>
> What was confusing in what I said much earlier is probably that you don't
> need a global "--mode". That just means that you can have multiple "modes"
> in the same run, not that you need some big auto-magic. And that they
> aren't really "modes" then... I think it's just natural that you can have
> different kinds of "output generator" files in the same run. Why force the
> assumption that you don't, especially considering that they might want
> to access common data (which you don't want to load again and again, for
> each run of the different --mode-s you need). Of course, as you might
> select files with wildcards (or by specifying a whole directory, or with
> some Maven matcher), you just can't directly associate the data loader
> options to the individual data sources. Instead you can say elsewhere that
> *.csv inside this explicit "group", or with this file name pattern, is to
> be loaded like this. That's what you might have perceived as auto-magic.
> It's just mass-producing data loaders for "cattle" files.
>
> How to handle the case if you have multiple potential data loaders for 
> a
>> single file?
>
>
> As per above, that's just two data loaders referring to the same data
> source, so, nothing special.
>
> As of the current state of things, this is how I'm supposed to load a 
> CSV,
> in the template itself (if I'm not outdated/mistaken):
>
> <#assign cvsFormat = CSVTool.formats.DEFAULT.withHeader()>
> <#assign foos = CSVTool.parse(Datasources.get("foos"), cvsFormat).records>
> <#assign bars = CSVTool.parse(Datasources.get("bars"), cvsFormat).records>
>
> It will be worth exploring how to make these look more "idiomatic" FTL
> (given this is an "official" FM product now, I think, we should show how
> it's done), and nicer in general. Point for now is, that's basically two
> data-loaders interwoven with the template there. Because they are
> interwoven like that, you can't reuse what they loaded for another
> template execution.
>
> That's comes down to personal preferences, e.g. chown uses 
> "owner[:group] "
>
>
> Yeah, but XML namespaces, Java, C, etc. all use 
> <parent><operator><child>,
> so, I think, that clicks for more of our potential users. So let's bet 
> on
> what clicks for more users.
>
> Besides, I challenged the very idea that we need both groups and names. :)
> Saying that it's simpler and less opinionated (more flexible) to have just
> multiple names (like tags). What's the end of that?
>
> On Sun, Mar 1, 2020 at 9:47 AM Siegfried Goeschl <
> siegfried.goeschl@gmail.com> wrote:
>
>> Hi Daniel,
>>
>> Please see my comments below
>>
>> Thanks in advance,
>>
>> Siegfried Goeschl
>>
>>
>>> On 29.02.2020, at 21:02, Daniel Dekany <da...@gmail.com> 
>>> wrote:
>>>
>>>>
>>>> I try to provide a useful name even when the content is coming from 
>>>> an
>>>> URL
>>>
>>>
>>> When is it recommended to rely on that though? Because utilizing 
>>> that
>> means
>>> that renaming a data source file can break the process, even if you 
>>> call
>>> freemarker-cli with the up to date file name. And if that happens 
>>> depends
>>> on what you (or an other random colleague!) have dug inside the
>> templates.
>>> So I guess we better just don't support this. Less code and less 
>>> things
>> to
>>> document too.
>>>
>>
>> Actually not recommended but we have named data sources for less than 
>> 24
>> hours
>>
>>>
>>>> I think we have a different understanding what a "Document" /
>> "Datasource
>>>> / DataSource" should do
>>>
>>>
>>> Thing is, eventually (most certainly pre-1.0, as it influences
>>> architecture), certain needs will have to be addressed, somehow. Then we
>>> will see what "things" we really need. For now I thought we need "things"
>>> that are much more than paths, and encapsulate the "how to load the data"
>>> aspect. I called them data sources, but maybe we should call them "data
>>> loaders" to free up data sources for the more primitive thing. Some
>>> needs/doubts to address, *later*: Is it really the best approach for users
>>> to load/parse data sources programmatically (that code is written in FTL,
>>> inside the templates)? Also, is the template the right place for doing
>>> that, because, when multiple templates (or just multiple template *runs*
>>> of the same template, each generating a different output file) need
>>> common data, they shouldn't load it again and again. Also, different
>>> topic, can we handle the case "transparently" enough when the data is not
>>> coming from a file?
>>
>> This is a command line tool where we have little idea what the user 
>> will
>> do or abuse
>>
>> * How does a "data loader" knows that it is responsible to load a 
>> file
>> * What should as "CSV data loader" should do - parse it into a list 
>> of
>> records or stream one by one?
>> * How to handle the case if you have multiple potential data loaders 
>> for a
>> single file?
>>
>> I'm leaning towards building blocks where the user controls the work 
>> to be
>> done even it requires one to two extra lines of FTL code
>>
>>
>>>
>>> The joy of programming - I did not intend to use "name:group" 
>>> together
>> with
>>>> wildcards :-)
>>>
>>>
>>> For a CLI tool, I guess we agree that it should work. So maybe, like 
>>> this
>>> (here logs and foos meant to be "groups"):
>>> --data-source logs file1.log file2.log fileN.log   --data-source 
>>> foos
>>> foo1.csv foo2.csv fooN.csv  --data-source bar bar.xlsx
>>>
>>> It so happens that here you don't really have a good control about 
>>> the
>>> number of files associated to the name, so, maybe yet another reason 
>>> to
>> not
>>> differentiate names and groups.
>>>
>>> I Disagree here - I think using a name would be used more often. I 
>>> added
>>>> the "group" as an afterthought since some grouping could be useful
>>>
>>>
>>> We do agree in that. What I said is that the *syntax* should be so 
>>> that
>> the
>>> group comes first. It's still optional. Like this:
>>> --data-source group:name /somewhere
>>> --data-source name /somewhere
>>
>> That's comes down to personal preferences, e.g. chown uses 
>> "owner[:group] "
>>
>>>
>>> On Sat, Feb 29, 2020 at 7:34 PM Siegfried Goeschl <
>>> siegfried.goeschl@gmail.com> wrote:
>>>
>>>> Hi Daniel,
>>>>
>>>> See my comments below
>>>>
>>>> Thanks in advance,
>>>>
>>>> Siegfried Goeschl
>>>>
>>>>
>>>>> On 29.02.2020, at 19:08, Daniel Dekany <da...@gmail.com>
>> wrote:
>>>>>
>>>>> FREEMARKER-135 freemarker-generator-cli: Support user-supplied 
>>>>> names
>> for
>>>>> datasources
>>>>>
>>>>> So, I can do this to have both a name an a group associated to a 
>>>>> data
>>>>> source:
>>>>> --datasource someName:someGroup=somewhere/something
>>>>
>>>> Correct
>>>>
>>>>> Or if I only want a name, but not a group (or an ""  group 
>>>>> actually -
>>>>> bug?), then:
>>>>> --datasource someName=somewhere/something
>>>>
>>>> Correct
>>>>
>>>>>
>>>>> Or if only a group but not a name (or a "" name actually) then:
>>>>> --datasource :someGroup=somewhere/something
>>>>
>>>> Mhmm, that would be unintended functionality from my side - current
>>>> approach is that every "Document" / "Datasource / DataSource" is 
>>>> named
>>>>
>>>>>
>>>>> A name must identify exactly 1 data source, while a group 
>>>>> identifies a
>>>> list
>>>>> of data sources.
>>>>
>>>> No, every "Document" / "Datasource / DataSource" has a name currently but
>>>> uniqueness is not enforced. Only if you want to get a "Document" /
>>>> "Datasource / DataSource" by its exact name do I check for exactly one
>>>> search hit and throw an exception otherwise. I try to provide a useful
>>>> name even when the content is coming from an URL or STDIN (and I will
>>>> probably add environment variables as "Document" / "Datasource /
>>>> DataSource", e.g. configuration in the cloud as JSON content passed as
>>>> environment variable)
>>>>
>>>>>
>>>>> Is this idea, that a data source can be part of a group, and then is
>>>>> also possibly identifiable with a name, coming from a use case? I mean,
>>>>> it's possibly important somewhere, but if so, then it's strange that you
>>>>> can put something into only a single group. If we need this kind of
>>>>> thing, then perhaps you should be just allowed to associate the data
>>>>> source with a list of names (kind of like tagging), and then when the
>>>>> template wants to get something by name, it will tell there if it expects
>>>>> exactly one or a list of data sources. Then you don't need to introduce
>>>>> two terms in the documentation either (names and groups). Again, if we
>>>>> want this at all, instead of just going with a data source that itself
>>>>> gives a list. (And if not, how will we handle a data source that loads
>>>>> from a non-file source?)
>>>>
>>>> I actually thought of implementing tagging but considered a "group"
>>>> sufficient.
>>>>
>>>> * If you don't define anything everything goes into the "default" group
>>>> * For individual documents you can define a name and an optional group
>>>>
>>>> I think we have a different understanding of what a "Document" /
>>>> "Datasource / DataSource" should do
>>>>
>>>> * It is dumb
>>>> * It is lazy since data is only loaded on demand
>>>> * There is no automagic like "oh, this is a JSON file, so let's go to
>>>> the JSON tool and create a map readily accessible in the data model"
>>>>
>>>>>
>>>>> Note that the current command line syntax doesn't work well with shell
>>>>> wildcard expansion. Like this:
>>>>> --datasource :someGroup=logs/*.log
>>>>> will try to expand ":someGroup=logs/*.log", and because it finds
>>>>> nothing (and because the rules of sh and the like are a mess), you
>>>>> will get the parameter value as is, without * expanded.
>>>>
>>>> The joy of programming - I did not intend to use "name:group" 
>>>> together
>>>> with wildcards :-)
>>>>
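The wildcard behaviour discussed above can be reproduced with a small shell sketch. It assumes a POSIX shell with default globbing settings (no `nullglob` or `failglob`); the temporary directory and the `count_args` helper exist only for this illustration.

```shell
#!/bin/sh
# Work in a throwaway directory that contains two log files.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/logs"
touch "$demo_dir/logs/a.log" "$demo_dir/logs/b.log"
cd "$demo_dir" || exit 1

# Print how many arguments the shell actually passed.
count_args() { echo "$#"; }

# The bare pattern matches existing paths: one argument per file.
plain=$(count_args logs/*.log)

# With the prefix, the whole word is the pattern. No path starts with
# ":someGroup=logs/", so the word is passed on unexpanded.
prefixed=$(count_args :someGroup=logs/*.log)
literal=$(printf '%s' :someGroup=logs/*.log)

echo "plain=$plain prefixed=$prefixed literal=$literal"
```

With the two files created here, the bare pattern yields two arguments, while the prefixed word arrives as a single argument that still contains the `*`.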
>>>>>
>>>>> Also, I think the syntax with colon should be flipped, because in
>>>>> other places foo:bar usually means that foo is the bigger unit (the
>>>>> container), and bar is the smaller unit (the child).
>>>>
>>>> I disagree here - I think a name would be used more often. I added
>>>> the "group" as an afterthought since some grouping could be useful
>>>>
>>>>>
>>>>> On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>
>>>>>> Hi Daniel,
>>>>>>
>>>>>> I'm an enterprise developer - bad habits die hard :-)
>>>>>>
>>>>>> So I closed the following tickets and merged the branches
>>>>>>
>>>>>> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
>>>>>> "freemarker-generator"
>>>>>> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to
>>>>>> "Datasource"
>>>>>> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied
>>>>>> names for datasources
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Siegfried Goeschl
>>>>>>
>>>>>>
>>>>>>> On 29.02.2020, at 12:19, Daniel Dekany <da...@gmail.com> wrote:
>>>>>>>
>>>>>>> Yeah, and of course, you can merge that branch. You can even work on
>>>>>>> the master directly after all.
>>>>>>>
>>>>>>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany
>>>>>>> <daniel.dekany@gmail.com> wrote:
>>>>>>>
>>>>>>>> But, I do recognize the cattle use case (several "faceless" files
>>>>>>>> with common format/schema). Only, my idea is to push that complexity
>>>>>>>> on the data source. The "data source" concept shields the rest of
>>>>>>>> the application from the details of how the data is stored or
>>>>>>>> retrieved. So, a data source might load a bunch of log files from a
>>>>>>>> directory, and present them as a single big table, or like a list of
>>>>>>>> tables, etc. So I want to deal with the cattle use case, but the
>>>>>>>> question is what part of the architecture will deal with this
>>>>>>>> complication, in other words, how do you box things. Why my initial
>>>>>>>> bet is to stuff that complication into the "data source"
>>>>>>>> implementation(s) is that data sources are inherently varied. Some
>>>>>>>> return a table-like thing, some have multiple named tables
>>>>>>>> (worksheets in Excel), some return a tree of nodes (XML), etc. So
>>>>>>>> then, some might return a list-of-list-of log records, or just a
>>>>>>>> single list of log-records (put together from daily log files). That
>>>>>>>> way cattle don't add to conceptual complexity. Now, you might be
>>>>>>>> aware of cases where the cattle concept must be more exposed than
>>>>>>>> this, and then we can't box things like this. But this is what I
>>>>>>>> tried to express.
>>>>>>>>
>>>>>>>> Regarding "output generators", and how that applies on the command
>>>>>>>> line. I think it's important that the common core between Maven and
>>>>>>>> command-line is as fat as possible. Ideally, they are just two
>>>>>>>> syntaxes to set up the same thing. Mostly at least. So, if you
>>>>>>>> specify a template file to the CLI application, in a way so that it
>>>>>>>> causes it to process that template to generate a single output, then
>>>>>>>> there you have just defined an "output generator" (even if it wasn't
>>>>>>>> explicitly called like that in the command line). If you specify 3
>>>>>>>> csv files to the CLI application, in a way so that it causes it to
>>>>>>>> generate 3 output files, then you have just defined 3 "output
>>>>>>>> generators" there (there's at least one template specified there
>>>>>>>> too, but that wasn't an "output generator" itself, it was just an
>>>>>>>> attribute of the 3 output generators). If you specify 1 template,
>>>>>>>> and 3 csv files, in a way so that it will yield 4 output files (1
>>>>>>>> for the template, 3 for the csv-s), then you have defined 4 output
>>>>>>>> generators there. If you have a data source that loads a list of 3
>>>>>>>> entities (say, 3 csv files, so it's a list of tables then), and you
>>>>>>>> have 2 templates, and you tell the CLI to execute each template for
>>>>>>>> each item in said data source, then you have just defined 6 "output
>>>>>>>> generators".
>>>>>>>>
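The last case above (2 templates, each executed for each of 3 items of one data source) can be sketched as a nested loop in shell. The file names are made up for the illustration, and a "generator" is modelled here as nothing more than a template+item pair.

```shell
#!/bin/sh
# Hypothetical file names, used only for illustration.
templates="report.ftl summary.ftl"
items="a.csv b.csv c.csv"

# One "output generator" per (template, item) pair.
generators=""
count=0
for t in $templates; do
  for i in $items; do
    generators="$generators$t+$i "
    count=$((count + 1))
  done
done

echo "$count output generators: $generators"
```

The other cases in the paragraph differ only in which pairs are formed, for example a single template paired with each of 3 csv files gives 3 generators.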
>>>>>>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Daniel,
>>>>>>>>>
>>>>>>>>> That all depends on your mental model and the work you do,
>>>>>>>>> expectations, experience :-)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> __Document Handling__
>>>>>>>>>
>>>>>>>>> *"But I think actually we have no good use case for list of
>>>>>>>>> documents that's passed at once to a single template run, so, we
>>>>>>>>> can just ignore that complication"*
>>>>>>>>>
>>>>>>>>> In my case that's not a complication but my daily business - I'm
>>>>>>>>> regularly wading through access logs - yesterday probably a couple
>>>>>>>>> of hundred access logs across two staging sites to help track down
>>>>>>>>> some strange API gateway issues :-)
>>>>>>>>>
>>>>>>>>> My gut feeling is (borrowing from
>>>>>>>>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313)
>>>>>>>>>
>>>>>>>>> 1. You have a few lovely named documents / templates - `pets`
>>>>>>>>> 2. You have tons of anonymous documents / templates to process -
>>>>>>>>> `cattle`
>>>>>>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
>>>>>>>>>
>>>>>>>>> `freemarker-cli` was built with 2) in mind and I want to cover 1)
>>>>>>>>> since it is equally important and common.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> __Template And Document Processing Modes__
>>>>>>>>>
>>>>>>>>> IMHO it is important to answer the following question: "How many
>>>>>>>>> outputs do you get when rendering 2 templates and 3 datasources?
>>>>>>>>> Two, Three or Six?"
>>>>>>>>>
>>>>>>>>> Your answer is influenced by your mental model / experience
>>>>>>>>>
>>>>>>>>> * When wading through tons of CSV files, access logs, etc. the
>>>>>>>>> answer is "2"
>>>>>>>>> * When doing source code generation the obvious answer is "6"
>>>>>>>>> * Can't imagine a use case which results in "3" but I'm pretty sure
>>>>>>>>> we will encounter one
>>>>>>>>>
>>>>>>>>> __Template and document mode probably shouldn't exist__
>>>>>>>>>
>>>>>>>>> That's hard for me to fully understand - I definitely lack your
>>>>>>>>> insights & experience writing such tools :-)
>>>>>>>>>
>>>>>>>>> Defining the `Output Generator` is the underlying model for the
>>>>>>>>> Maven plugin (and probably FMPP).
>>>>>>>>>
>>>>>>>>> I'm not sure if this applies to command lines, at least not in the
>>>>>>>>> way I use them (or would like to use them)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> Siegfried Goeschl
>>>>>>>>>
>>>>>>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>>>>>>>>>
>>>>>>>>>> Yeah, "data source" is surely a too popular name, but for a
>>>>>>>>>> reason. Does anyone have other ideas?
>>>>>>>>>>
>>>>>>>>>> As for naming data sources and such. One thing I was wondering
>>>>>>>>>> about back then is how to deal with a list of documents given to
>>>>>>>>>> a template, versus exactly 1 document given to a template. But I
>>>>>>>>>> think actually we have no good use case for list of documents
>>>>>>>>>> that's passed at once to a single template run, so, we can just
>>>>>>>>>> ignore that complication. A document has a name, and that's
>>>>>>>>>> always just a single document, not a collection, as far as the
>>>>>>>>>> template is concerned. (We can have multiple documents per run,
>>>>>>>>>> but those normally yield separate output generators, so it's
>>>>>>>>>> still only one document per template.) However, we can have data
>>>>>>>>>> source types (document types with old terminology) that collect
>>>>>>>>>> together multiple data files. So then that complexity is
>>>>>>>>>> encapsulated into the data source type, and doesn't complicate
>>>>>>>>>> the overall architecture. That's another case when a data source
>>>>>>>>>> is not just a file. Like maybe there's a data source type that
>>>>>>>>>> loads all the CSV-s from a directory, into a single big table (I
>>>>>>>>>> had such a case), or even into a list of tables. Or, as I
>>>>>>>>>> mentioned already, a data source is maybe an SQL query on a JDBC
>>>>>>>>>> data source (and we got the first term clash... JDBC also calls
>>>>>>>>>> them data sources).
>>>>>>>>>>
>>>>>>>>>> Template and document mode probably shouldn't exist from the user
>>>>>>>>>> perspective either, at least not as a global option that must
>>>>>>>>>> apply to everything in a run. They could just give the files that
>>>>>>>>>> define the "output generators", and some of them will be
>>>>>>>>>> templates, some of them are data files, in which case a template
>>>>>>>>>> needs to be associated with them (and there can be a couple of
>>>>>>>>>> ways of doing that). And then again, there are the cases where
>>>>>>>>>> you want to create one output generator per entity from some data
>>>>>>>>>> source.
>>>>>>>>>>
>>>>>>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
>>>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>
>>>>>>>>>>> See my comments below - and thanks for your patience and input
>>>>>>>>>>> :-)
>>>>>>>>>>>
>>>>>>>>>>> *Renaming Document To DataSource*
>>>>>>>>>>>
>>>>>>>>>>> Yes, makes sense. I tried to avoid it since I'm using
>>>>>>>>>>> javax.activation and its DataSource.
>>>>>>>>>>>
>>>>>>>>>>> *Template And Document Mode*
>>>>>>>>>>>
>>>>>>>>>>> Agreed - I think it is a valuable abstraction for the user but
>>>>>>>>>>> it is not an implementation concept :-)
>>>>>>>>>>>
>>>>>>>>>>> *Document Without Symbolic Names*
>>>>>>>>>>>
>>>>>>>>>>> Also agreed and it is going to change but I have not made up my
>>>>>>>>>>> mind yet what exactly to implement.
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>
>>>>>>>>>>> Siegfried Goeschl
>>>>>>>>>>>
>>>>>>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>>>>>>>>>>>
>>>>>>>>>>> A few quick thoughts on that:
>>>>>>>>>>>
>>>>>>>>>>> - We should replace the "document" term with something more
>>>>>>>>>>> telling. It doesn't tell that it's some kind of input. Also, most
>>>>>>>>>>> of these inputs aren't something that people typically call
>>>>>>>>>>> documents. Like a csv file, or a database table, which is not
>>>>>>>>>>> even a file (OK we don't support such thing at the moment). I
>>>>>>>>>>> think, maybe "data source" is a safe enough term. (It also rhymes
>>>>>>>>>>> with data model.)
>>>>>>>>>>> - You have separate "template" and "document" "mode", that
>>>>>>>>>>> applies to a whole run. I think such specialization won't be
>>>>>>>>>>> helpful. We could just say, on the conceptual level at least,
>>>>>>>>>>> that we need a set of "output generators". An output generator is
>>>>>>>>>>> an object (in the API) that specifies a template, a data-model
>>>>>>>>>>> (where the data-model is possibly populated with "documents"),
>>>>>>>>>>> and an output "sink" (a file path, or stdout), and can generate
>>>>>>>>>>> the output itself. A practical way of defining the output
>>>>>>>>>>> generators in a CLI application is via a bunch of files, each
>>>>>>>>>>> defining an output generator. Some of those files may be
>>>>>>>>>>> templates (that you can even detect from the file extension), or
>>>>>>>>>>> data files that we currently call "documents". They could freely
>>>>>>>>>>> mix inside the same run. I have also met a use case where you
>>>>>>>>>>> have a single table (single "document"), and each record in it
>>>>>>>>>>> yields an output file. That can also be described in some file
>>>>>>>>>>> format, or really in any other way, like directly in a command
>>>>>>>>>>> line argument, via API, etc.
>>>>>>>>>>> - You have multiple documents without associated symbolic names
>>>>>>>>>>> in some examples. Templates can't identify those then in a well
>>>>>>>>>>> maintainable way. The actual file name is often not a good
>>>>>>>>>>> identifier, can change over time, and you might not even have
>>>>>>>>>>> good control over it, like you already receive it as a parameter
>>>>>>>>>>> from somewhere else, or someone moves/renames the files that you
>>>>>>>>>>> need to read. Index is also not very good, but I have written
>>>>>>>>>>> about that earlier.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
>>>>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi folks,
>>>>>>>>>>>
>>>>>>>>>>> still wrapping my mind around it but assembled some thoughts here -
>>>>>>>>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>
>>>>>>>>>>> Siegfried Goeschl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> What you are describing is more like the angle that FMPP took
>>>>>>>>>>> initially, where templates drive things, they generate the output
>>>>>>>>>>> for themselves (even multiple output files if they wish). By
>>>>>>>>>>> default the output file's name (and relative path) is deduced
>>>>>>>>>>> from the template name. There was also a global data-model, built
>>>>>>>>>>> in a configuration file (or equally, built via command line
>>>>>>>>>>> arguments, or both mixed), from which templates get whatever data
>>>>>>>>>>> they are interested in. Take a look at the figures here:
>>>>>>>>>>> http://fmpp.sourceforge.net/qtour.html. Later, this concept was
>>>>>>>>>>> generalized a bit more, because you could add XML files at the
>>>>>>>>>>> same place where you have the templates, and then you could
>>>>>>>>>>> associate transform templates to the XML files (based on path
>>>>>>>>>>> pattern and/or the XML document element). Now that's like what
>>>>>>>>>>> freemarker-generator had initially (data files drive output, and
>>>>>>>>>>> the template is there to transform it).
>>>>>>>>>>>
>>>>>>>>>>> So I think the generic mental model would be like this:
>>>>>>>>>>>
>>>>>>>>>>> 1. You got files that drive the process, let's call them
>>>>>>>>>>> *generator files* for now. Usually, each generator file yields an
>>>>>>>>>>> output file (but maybe even multiple output files, as you might
>>>>>>>>>>> have seen in the last figure). These generator files can be of
>>>>>>>>>>> many types, like XML, JSON, XLSX (as in the original
>>>>>>>>>>> freemarker-generator), and even templates (as is the norm in
>>>>>>>>>>> FMPP). If the file is not a template, then you got a set of
>>>>>>>>>>> transformer templates (-t CLI option) in a separate directory,
>>>>>>>>>>> which can be associated with the generator files based on name
>>>>>>>>>>> patterns, and even based on content (schema usually). If the
>>>>>>>>>>> generator file is a template (so that's a positional @Parameter
>>>>>>>>>>> CLI argument that happens to be an *.ftl, and is not a template
>>>>>>>>>>> file specified after the "-t" option), then you just
>>>>>>>>>>> Template.process(...) it, and it prints what the output will be.
>>>>>>>>>>> 2. You also have a set of variables, the global data-model, that
>>>>>>>>>>> contains commonly useful stuff, like what you now call parameters
>>>>>>>>>>> (CLI -Pname=value), but also maybe data loaded from JSON, XML,
>>>>>>>>>>> etc. Those data files aren't "generator files". Templates just
>>>>>>>>>>> use them if they need them. An important thing here is to reuse
>>>>>>>>>>> the same mechanism to read and parse those data files, which was
>>>>>>>>>>> used in templates when transforming generator files. So we need a
>>>>>>>>>>> common format for specifying how to load data files. That's maybe
>>>>>>>>>>> just FTL that #assigns to the variables, or maybe a more
>>>>>>>>>>> declarative format.
>>>>>>>>>>>
>>>>>>>>>>> What I have described in the original post here was a less
>>>>>>>>>>> generic form of this, as I tried to be true to the original
>>>>>>>>>>> approach. I thought the proposal will be drastic enough as it
>>>>>>>>>>> is... :) There, the "main" document is the "generator file" from
>>>>>>>>>>> point 1, the "-t" template is the transform template for the
>>>>>>>>>>> "main" document, and the other named documents ("users",
>>>>>>>>>>> "groups") are a poor man's shared data-model from point 2
>>>>>>>>>>> (together with -PName=value).
>>>>>>>>>>>
>>>>>>>>>>> There's a further, somewhat confusing thing to get right with the
>>>>>>>>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing
>>>>>>>>>>> though. In the model above, as per point 1, if you list multiple
>>>>>>>>>>> data files, each will generate a separate output file. So, if you
>>>>>>>>>>> need to take in a list of files to transform it to a single
>>>>>>>>>>> output file (or at least with a single transform template
>>>>>>>>>>> execution), then you have to be explicit about that, as that's
>>>>>>>>>>> not the default behavior anymore. But it's still absolutely
>>>>>>>>>>> possible. Imagine it as a "list of XLSX-es" is itself like a file
>>>>>>>>>>> format. You need some CLI (and Maven config, etc.) syntax to
>>>>>>>>>>> express that, but that shouldn't be a big deal.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
>>>>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>
>>>>>>>>>>> Good timing - I was looking at a similar problem from a different
>>>>>>>>>>> angle yesterday (see below)
>>>>>>>>>>>
>>>>>>>>>>> Don't have enough time to answer your email in detail now - will
>>>>>>>>>>> do that tomorrow evening
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>
>>>>>>>>>>> Siegfried Goeschl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> === START
>>>>>>>>>>> # FreeMarker CLI Improvement
>>>>>>>>>>> ## Support Of Multiple Template Files
>>>>>>>>>>> Currently we support the following combinations
>>>>>>>>>>>
>>>>>>>>>>> * Single template and no data files
>>>>>>>>>>> * Single template and one or more data files
>>>>>>>>>>>
>>>>>>>>>>> But we can not support the following use case which is quite
>>>>>>>>>>> typical in the cloud
>>>>>>>>>>>
>>>>>>>>>>> __Convert multiple templates with a single data file, e.g.
>>>>>>>>>>> copying a directory of configuration files using a JSON
>>>>>>>>>>> configuration file__
>>>>>>>>>>>
>>>>>>>>>>> ## Implementation notes
>>>>>>>>>>> * When we copy a directory we can remove the `ftl` extension on
>>>>>>>>>>> the fly
>>>>>>>>>>> * We might need an `exclude` filter for the copy operation
>>>>>>>>>>> * Initially resolve to a list of template files and process one
>>>>>>>>>>> after another
>>>>>>>>>>> * Need to calculate the output file location and extension
>>>>>>>>>>> * We need to rename the existing command line parameters (see
>>>>>>>>>>> below)
>>>>>>>>>>> * Do we need multiple include and exclude filters?
>>>>>>>>>>> * Do we need file versus directory filters?
>>>>>>>>>>>
>>>>>>>>>>> ### Command Line Options
>>>>>>>>>>> ```
>>>>>>>>>>> --input-encoding : Encoding of the documents
>>>>>>>>>>> --output-encoding : Encoding of the rendered template
>>>>>>>>>>> --template-encoding : Encoding of the template
>>>>>>>>>>> --output : Output file or directory
>>>>>>>>>>> --include-document : Include pattern for documents
>>>>>>>>>>> --exclude-document : Exclude pattern for documents
>>>>>>>>>>> --include-template : Include pattern for templates
>>>>>>>>>>> --exclude-template : Exclude pattern for templates
>>>>>>>>>>> ```
>>>>>>>>>>>
>>>>>>>>>>> ### Command Line Examples
>>>>>>>>>>> ```text
>>>>>>>>>>> # Copy all FTL templates found in "ext/config" to the "/config"
>>>>>>>>>>> # directory using the data from "config.json"
>>>>>>>>>>>
>>>>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
>>>>>>>>>>>
>>>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
>>>>>>>>>>>
>>>>>>>>>>> # Basically the same using a named document "configuration"
>>>>>>>>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
>>>>>>>>>>> # It might make sense to allow URIs for loading documents
>>>>>>>>>>>
>>>>>>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
>>>>>>>>>>>
>>>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
>>>>>>>>>>>
>>>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
>>>>>>>>>>>
>>>>>>>>>>> # Basically the same using an environment variable as named document
>>>>>>>>>>>
>>>>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
>>>>>>>>>>>
>>>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
>>>>>>>>>>> ```
>>>>>>>>>>> === END
>>>>>>>>>>>
>>>>>>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Input documents is a fundamental concept in freemarker-generator,
>>>>>>>>>>> so we should think about that more, and probably refine/rework
>>>>>>>>>>> how it's done.
>>>>>>>>>>>
>>>>>>>>>>> Currently it works like this, with CLI at least.
>>>>>>>>>>>
>>>>>>>>>>> freemarker-cli
>>>>>>>>>>> -t access-report.ftl
>>>>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>>>>>
>>>>>>>>>>> Then in access-report.ftl you have to do something like this:
>>>>>>>>>>>
>>>>>>>>>>> <#assign doc = Documents.get(0)>
>>>>>>>>>>> ... process doc here
>>>>>>>>>>>
>>>>>>>>>>> (The more idiomatic Documents[0] won't work. Actually, that led
>>>>>>>>>>> to a funny chain of coincidences: It returned the string "D",
>>>>>>>>>>> then CSVTool.parse(...) happily parsed that to a table with the
>>>>>>>>>>> single column "D", and 0 rows, and as there were 0 rows, the
>>>>>>>>>>> template never hit the error that row.myExpectedColumn, which
>>>>>>>>>>> refers to a missing column, would otherwise cause, so the process
>>>>>>>>>>> finished with success. (: Pretty unlucky for sure. The root was
>>>>>>>>>>> unintentionally breaking a FreeMarker idiom though; eventually we
>>>>>>>>>>> will have to work on those too, but, different topic.)
>>>>>>>>>>>
>>>>>>>>>>> However, actually multiple input documents can be passed in:
>>>>>>>>>>>
>>>>>>>>>>> freemarker-cli
>>>>>>>>>>> -t access-report.ftl
>>>>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>>>>> somewhere/bar-access-log.csv
>>>>>>>>>>>
>>>>>>>>>>> Above template will still work, though then you ignored all but
>>>>>>>>>>> the first document. So if you expect any number of input
>>>>>>>>>>> documents, you probably will have to do this:
>>>>>>>>>>>
>>>>>>>>>>> <#list Documents.list as doc>
>>>>>>>>>>> ... process doc here
>>>>>>>>>>> </#list>
>>>>>>>>>>>
>>>>>>>>>>> (The more idiomatic <#list Documents as doc> won't work; but
>>>>>>>>>>> again, those we will work out in a different thread.)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So, what would be better, in my opinion. I start out from what I
>>>>>>>>>>> think are the common use cases, in decreasing order of frequency.
>>>>>>>>>>> Goal is to make those less error prone for the users, and simpler
>>>>>>>>>>> to express.
>>>>>>>>>>>
>>>>>>>>>>> USE CASE 1
>>>>>>>>>>>
>>>>>>>>>>> You have exactly 1 input document, which is therefore simply
>>>>>>>>>>> "the" document in the mind of the user. This is probably the
>>>>>>>>>>> typical use case, but at least the use case users typically start
>>>>>>>>>>> out from when starting the work.
>>>>>>>>>>>
>>>>>>>>>>> freemarker-cli
>>>>>>>>>>> -t access-report.ftl
>>>>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>>>>>
>>>>>>>>>>> Then `Documents.get(0)` is not very fitting. Most importantly
>>>>>>>>>>> it's error prone, because if the user passed in more than 1
>>>>>>>>>>> document (can even happen totally accidentally, like if the user
>>>>>>>>>>> was lazy and used a wildcard that the shell expanded), the
>>>>>>>>>>> template will silently ignore the rest of the documents, and the
>>>>>>>>>>> single document processed will be practically picked randomly.
>>>>>>>>>>> The user might not notice that and submit a bad report or such.
>>>>>>>>>>>
>>>>>>>>>>> I think that in this use case the document should be simply
>>>>>>>>>>> referred to as `Document` in the template. When you have multiple
>>>>>>>>>>> documents there, referring to `Document` should be an error,
>>>>>>>>>>> saying that the template was made to process a single document
>>>>>>>>>>> only.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> USE CASE 2
>>>>>>>>>>>
>>>>>>>>>>> You have multiple input documents, but each has a different role
>>>>>>>>>>> (different schema, maybe different file type). Like, you pass in
>>>>>>>>>>> users.csv and groups.csv. Each has a different schema, and so you
>>>>>>>>>>> want to access them differently, but in the same template.
>>>>>>>>>>>
>>>>>>>>>>> freemarker-cli
>>>>>>>>>>> [...]
>>>>>>>>>>> --named-document users somewhere/foo-users.csv
>>>>>>>>>>> --named-document groups somewhere/foo-groups.csv
>>>>>>>>>>>
>>>>>>>>>>> Then in the template you could refer to them as:
>>>>>>>>>>> `NamedDocuments.users`, and `NamedDocuments.groups`.
>>>>>>>>>>>
>>>>>>>>>>> Use Case 1, and 2 can be unified into a coherent concept, where
>>>>>>>>>>> `Document` is just a shorthand for `NamedDocuments.main`. It's
>>>>>>>>>>> called "main" because that's "the" document the template is
>>>>>>>>>>> about, but then you have added some helper documents, with
>>>>>>>>>>> symbolic names representing their role.
>>>>>>>>>>>
>>>>>>>>>>> freemarker-cli
>>>>>>>>>>> -t access-report.ftl
>>>>>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>>>>>> --document-name=groups somewhere/foo-groups.csv
>>>>>>>>>>>
>>>>>>>>>>> Here, `Document` still works in the template, and it refers to
>>>>>>>>>>> `somewhere/foo-access-log.csv`. (While omitting
>>>>>>>>>>> --document-name=main above would be cleaner, I couldn't figure
>>>>>>>>>>> out how to do that with Picocli. Anyway, for now the point is the
>>>>>>>>>>> concept, which is not specific to CLI.)
>>>>>>>>>>>
>>>>>>>>>>> USE CASE 3
>>>>>>>>>>>
>>>>>>>>>>> Here you have several of the same kind of documents. That has a
>>>>>>>>>>> more generic sub-use-case, when you have explicitly named documents
>>>>>>>>>>> (like "users" above), and for some you expect multiple input files.
>>>>>>>>>>>
>>>>>>>>>>> freemarker-cli
>>>>>>>>>>> -t access-report.ftl
>>>>>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>>>>>> somewhere/bar-access-log.csv
>>>>>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>>>>>> somewhere/bar-users.csv
>>>>>>>>>>> --document-name=groups somewhere/global-groups.csv
>>>>>>>>>>>
>>>>>>>>>>> The template must be written with this use case in mind, as now it
>>>>>>>>>>> has to #list some of the documents. (I think in practice you hardly
>>>>>>>>>>> ever want to get a document by hard coded index. Either you don't
>>>>>>>>>>> know how many documents you have, so you can't use hard coded
>>>>>>>>>>> indexes, or you do, and each index has a specific meaning, but then
>>>>>>>>>>> you should name the documents instead, as using indexes is error
>>>>>>>>>>> prone, and hard to read.)
>>>>>>>>>>> Accessing that list of documents in the template, maybe could be
>>>>>>>>>>> done like this:
>>>>>>>>>>> - For the "main" documents: `DocumentList`
>>>>>>>>>>> - For explicitly named documents, like "users":
>>>>>>>>>>> `NamedDocumentLists.users`
>>>>>>>>>>>
>>>>>>>>>>> SUMMING UP
>>>>>>>>>>>
>>>>>>>>>>> To unify all 3 use cases into a coherent concept:
>>>>>>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while
>>>>>>>>>>> you can achieve everything with it, using it requires your template
>>>>>>>>>>> to handle the most generic case too. So, I think it would be rarely
>>>>>>>>>>> used.
>>>>>>>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
>>>>>>>>>>> It's used if you only have one kind of documents (single format and
>>>>>>>>>>> schema), but potentially multiple of them.
>>>>>>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1
>>>>>>>>>>> document of the given name.
>>>>>>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is
>>>>>>>>>>> for the most natural/frequent use case.
>>>>>>>>>>>
>>>>>>>>>>> That's 4 possible ways of accessing your documents, which is a
>>>>>>>>>>> trade-off for the sake of these:
>>>>>>>>>>> - Catching CLI (or Maven, etc.) input where the template output
>>>>>>>>>>> likely will be wrong. That's only possible if the user can
>>>>>>>>>>> communicate its intent in the template.
>>>>>>>>>>> - Users don't need to deal with concepts that are irrelevant in
>>>>>>>>>>> their concrete use case. Just start with the trivial, `Document`,
>>>>>>>>>>> and later if the need arises, generalize to named documents,
>>>>>>>>>>> document lists, or both.
>>>>>>>>>>>
>>>>>>>>>>> What do you guys think?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards,
>>>>>>>> Daniel Dekany
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> Daniel Dekany
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Daniel Dekany
>>>>
>>>>
>>>
>>> --
>>> Best regards,
>>> Daniel Dekany
>>
>>
>>
>
> -- 
> Best regards,
> Daniel Dekany

Re: freemarker-generator: Improving the input documents concept

Posted by Daniel Dekany <da...@gmail.com>.

>
> Actually not recommended but we have named data sources for less than 24
> hours


Sorry, not sure what that means. Anyway, my "vote" is let's not give
automatic names if relying on them is not recommended. I mean, in case we
happen to agree on that, why leave it there. Especially if automatically
chosen names can clash with explicitly given ones, that would be
trouble. (I'm not sure right now if they can... can the path we use as the
name be relative? Then it realistically can.)

> This is a command line tool where we have little idea what the user will
> do or abuse


No matter how much/little we know, we firmly put our bets by releasing
something. So if some feature is certainly not right, that's enough to not
have it, I think.

> How does a "data loader" knows that it is responsible to load a file
>
> What should as "CSV data loader" should do - parse it into a list of
> records or stream one by one?


I think I was misunderstood here. It's not about some kind of auto-magic.
It's about where you specify what to load and how, and in what format you
specify that. Of course, you must specify the data source (basically a
URI for now as I saw), the rough format (CSV), and the format options
(separator character, etc.), and other freemarker-generator loading options
(like which CSV columns are numbers, which are dates, with what format,
what counts as null, etc.).

What was confusing in what I said much earlier is probably that you don't
need a global "--mode". That just means that you can have multiple "modes"
in the same run, not that you need some big auto-magic. And that they
aren't really "modes" then... I think it's just natural that you can have
different kinds of "output generator" files in the same run. Why force the
assumption that you don't, especially considering that they might want
to access common data (which you don't want to load again and again, for
each run of the different --mode-s you need). Of course, as you might
select files with wildcards (or by specifying a whole directory, or with
some Maven matcher), you just can't directly associate the data loader
options with the individual data sources. Instead you can say elsewhere that
*.csv inside this explicit "group", or with this file name pattern, is to
be loaded like this. That's what you might have perceived as auto-magic. It's
just mass-producing data loaders for "cattle" files.
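
To make the pattern-based association concrete, here is a minimal Java
sketch. It is only an illustration under assumptions: `LoaderRules`,
`LoaderOptions` and their fields are hypothetical names and not part of
freemarker-generator; the only real API used is the JDK's glob-based
`PathMatcher`.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; these types are NOT part of freemarker-generator.
final class LoaderRules {

    /** How to load one kind of file; the fields are illustrative assumptions. */
    static final class LoaderOptions {
        final String format;   // rough format, e.g. "csv"
        final char separator;  // format option, e.g. ';'

        LoaderOptions(String format, char separator) {
            this.format = format;
            this.separator = separator;
        }
    }

    /** One "files matching this pattern are loaded like this" declaration. */
    private static final class Rule {
        final PathMatcher matcher;
        final LoaderOptions options;

        Rule(PathMatcher matcher, LoaderOptions options) {
            this.matcher = matcher;
            this.options = options;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Associates options with every file whose name matches the glob. */
    LoaderRules add(String fileNameGlob, LoaderOptions options) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + fileNameGlob);
        rules.add(new Rule(matcher, options));
        return this;
    }

    /** Returns the options of the first rule matching the file name, if any. */
    Optional<LoaderOptions> optionsFor(Path file) {
        Path name = file.getFileName();
        if (name == null) {
            return Optional.empty();
        }
        for (Rule rule : rules) {
            if (rule.matcher.matches(name)) {
                return Optional.of(rule.options);
            }
        }
        return Optional.empty();
    }
}
```

The rules are declared once, apart from the individual files, so whatever a
wildcard or directory scan turns up gets its loader options by name pattern
and nothing has to guess the format from the content.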

> How to handle the case if you have multiple potential data loaders for a
> single file?


As per above, that's just two data loaders referring to the same data
source, so, nothing special.

As of the current state of things, this is how I'm supposed to load a CSV,
in the template itself (if I'm not outdated/mistaken):

<#assign csvFormat = CSVTool.formats.DEFAULT.withHeader()>
<#assign foos = CSVTool.parse(Datasources.get("foos"), csvFormat).records>
<#assign bars = CSVTool.parse(Datasources.get("bars"), csvFormat).records>

It will be worth exploring how to make these look more "idiomatic" FTL (given
this is an "official" FM product now, I think, we should show how it's
done), and nicer in general. Point for now is, that's basically two
data-loaders interwoven with the template there. Because they are
interwoven like that, you can't reuse what they loaded for another template
execution.
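
As a rough Java sketch of the alternative, the loading step can live
outside the template, so that several template executions share what was
loaded. Everything here is hypothetical: `SharedData` and its methods are
made-up names and not freemarker-generator API, and the class is not
thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; not part of freemarker-generator. Not thread-safe.
final class SharedData {

    // Named loaders, declared once per run (e.g. from CLI or Maven config).
    private final Map<String, Supplier<?>> loaders = new LinkedHashMap<>();

    // Values that have already been loaded, keyed by the same names.
    private final Map<String, Object> loaded = new LinkedHashMap<>();

    /** Declares how to load the data known under the given name. */
    void register(String name, Supplier<?> loader) {
        if (loaders.putIfAbsent(name, loader) != null) {
            throw new IllegalArgumentException("Duplicate data loader name: " + name);
        }
    }

    /** Loads on first access only; later calls return the same object. */
    Object get(String name) {
        Supplier<?> loader = loaders.get(name);
        if (loader == null) {
            throw new IllegalArgumentException("No data loader named: " + name);
        }
        if (!loaded.containsKey(name)) {
            loaded.put(name, loader.get());
        }
        return loaded.get(name);
    }
}
```

Each template execution would then look up "foos" or "bars" by name in its
data-model, and the parsing cost is paid once per run, not once per
template.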

> That's comes down to personal preferences, e.g. chown uses "owner[:group] "


Yeah, but XML namespaces, Java, C, etc. all use <parent><operator><child>,
so, I think, that clicks for more of our potential users. So let's bet on
what clicks for more users.

Besides, I challenged the very idea that we need both groups and names. :)
Saying that it's simpler and less opinionated (more flexible) to have just
multiple names (like tags). What's the conclusion on that?
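
A minimal Java sketch of the "multiple names, like tags" idea, where the
lookup itself states whether exactly one data source or a list is expected.
`TaggedDataSources`, `one` and `list` are hypothetical names for
illustration only, and data sources are represented by plain location
strings to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of freemarker-generator.
final class TaggedDataSources {

    // Each name (tag) maps to the data source locations registered under it.
    private final Map<String, List<String>> byName = new LinkedHashMap<>();

    /** Registers a data source under any number of names. */
    void add(String location, String... names) {
        for (String name : names) {
            byName.computeIfAbsent(name, key -> new ArrayList<>()).add(location);
        }
    }

    /** For callers that expect any number of data sources under the name. */
    List<String> list(String name) {
        List<String> hits = byName.getOrDefault(name, Collections.emptyList());
        return Collections.unmodifiableList(hits);
    }

    /** For callers that expect exactly one data source under the name. */
    String one(String name) {
        List<String> hits = list(name);
        if (hits.size() != 1) {
            throw new IllegalStateException("Expected exactly 1 data source named \""
                    + name + "\", but found " + hits.size());
        }
        return hits.get(0);
    }
}
```

With this shape there is a single term to document (names), and a mismatch
between what was passed in and what the template expects fails loudly at
lookup time.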

On Sun, Mar 1, 2020 at 9:47 AM Siegfried Goeschl <
siegfried.goeschl@gmail.com> wrote:

> HI Daniel,
>
> Please see my comments below
>
> Thanks in advance,
>
> Siegfried Goeschl
>
>
> > On 29.02.2020, at 21:02, Daniel Dekany <da...@gmail.com> wrote:
> >
> >>
> >> I try to provide a useful name even when the content is coming from an
> >> URL
> >
> >
> > When is it recommended to rely on that though? Because utilizing that
> > means that renaming a data source file can break the process, even if you
> > call freemarker-cli with the up to date file name. And if that happens
> > depends on what you (or an other random colleague!) have dug inside the
> > templates. So I guess we better just don't support this. Less code and
> > less things to document too.
> >
>
> Actually not recommended but we have named data sources for less than 24
> hours
>
> >
> >> I think we have a different understanding what a "Document" /
> "Datasource
> >> / DataSource" should do
> >
> >
> > Thing is, eventually (most certainly pre-1.0, as it influences
> > architecture), certain needs will have to be addressed, somehow. Then we
> > will see what "things" we really need. For now I thought we need "things"
> > that are much more than paths, and encapsulate the "how to load the data"
> > aspect. I called them data sources, but maybe we should call them "data
> > loaders" to free up data sources for the more primitive thing. Some
> > needs/doubts to address, *later*: Is it really the best approach for
> > users to load/parse data sources programmatically (that code is written
> > in FTL, inside the templates)? Also, is the template the right place for
> > doing that, because, when multiple templates (or just multiple template
> > *runs* of the same template, each generating a different output file)
> > need common data, they shouldn't load it again and again. Also, different
> > topic, can we handle the case "transparently" enough when the data is not
> > coming from a file?
>
> This is a command line tool where we have little idea what the user will
> do or abuse
>
> * How does a "data loader" knows that it is responsible to load a file
> * What should as "CSV data loader" should do - parse it into a list of
> records or stream one by one?
> * How to handle the case if you have multiple potential data loaders for a
> single file?
>
> I'm leaning towards building blocks where the user controls the work to be
> done even it requires one to two extra lines of FTL code
>
>
> >
> > The joy of programming - I did not intend to use "name:group" together
> with
> >> wildcards :-)
> >
> >
> > For a CLI tool, I guess we agree that it should work. So maybe, like this
> > (here logs and foos meant to be "groups"):
> > --data-source logs file1.log file2.log fileN.log   --data-source foos
> > foo1.csv foo2.csv fooN.csv  --data-source bar bar.xlsx
> >
> > It so happens that here you don't really have a good control about the
> > number of files associated to the name, so, maybe yet another reason to
> not
> > differentiate names and groups.
> >
> > I Disagree here - I think using a name would be used more often. I added
> >> the "group" as an afterthought since some grouping could be useful
> >
> >
> > We do agree in that. What I said is that the *syntax* should be so that
> the
> > group comes first. It's still optional. Like this:
> > --data-source group:name /somewhere
> > --data-source name /somewhere
>
> That's comes down to personal preferences, e.g. chown uses "owner[:group] "
>
> >
> > On Sat, Feb 29, 2020 at 7:34 PM Siegfried Goeschl <
> > siegfried.goeschl@gmail.com> wrote:
> >
> >> HI Daniel,
> >>
> >> Seem my comments below
> >>
> >> Thanks in advance,
> >>
> >> Siegfried Goeschl
> >>
> >>
> >>> On 29.02.2020, at 19:08, Daniel Dekany <da...@gmail.com>
> wrote:
> >>>
> >>> FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
> for
> >>> datasources
> >>>
> >>> So, I can do this to have both a name an a group associated to a data
> >>> source:
> >>> --datasource someName:someGroup=somewhere/something
> >>
> >> Correct
> >>
> >>> Or if I only want a name, but not a group (or an ""  group actually -
> >>> bug?), then:
> >>> --datasource someName=somewhere/something
> >>
> >> Correct
> >>
> >>>
> >>> Or if only a group but not a name (or a "" name actually) then:
> >>> --datasource :someGroup=somewhere/something
> >>
> >> Mhmm, that would be unintended functionality from my side - current
> >> approach is that every "Document" / "Datasource / DataSource" is named
> >>
> >>>
> >>> A name must identify exactly 1 data source, while a group identifies a
> >> list
> >>> of data sources.
> >>
> >> No, every "Document" / "Datasource / DataSource" has a name currently
> but
> >> uniqueness is not enforced. Only if you want to get a "Document" /
> >> "Datasource / DataSource" with it's exact name I checked for exactly one
> >> search hit and throw an exception. I try to provide a useful name even
> when
> >> the content is coming from an URL or STDIN (and I will probably add
> >> environment variables as "Document" / "Datasource / DataSource", e.g
> >> configuration in the cloud as JSON content passed as environment
> variable)
> >>
> >>>
> >>> Is that this idea, that the a data source can be part of a group, and
> >> then
> >>> is also possibly identifiable with a name comes from an use case? I
> mean,
> >>> it's possibly important somewhere, but if so, then it's strange that
> you
> >>> can put something into only a single group. If we need this kind of
> >> thing,
> >>> then perhaps you should be just allowed to associate the data source
> >> with a
> >>> list of names (kind of like tagging), and then when the template wants
> to
> >>> get something by name, it will tell there if it expects exactly one or
> a
> >>> list of data sources. Then you don't need to introduce two terms in the
> >>> documentation either (names and groups). Again, if we want this at all,
> >>> instead of just going with a data source that itself gives a list. (And
> >> if
> >>> not, how will we handle a data source that loads from a non-file
> source?)
> >>
> >> I actually thought of implementing tagging but considered a "group"
> >> sufficient.
> >>
> >> * If you don't define anything everything goes into the "default" group
> >> * For individual documents you can define a name and an optional group
> >>
> >> I think we have a different understanding what a "Document" /
> "Datasource
> >> / DataSource" should do
> >>
> >> * It is a dumb
> >> * It is lazy since data is only loaded on demand
> >> * There is no automagic like "oh, this is a JSON file, so let's go to
> the
> >> JSON tool and create a map readily accessible in the data model"
> >>
> >>>
> >>> Note that the current command line syntax doesn't work well with shell
> >>> wildcard expansion. Like this:
> >>> --datasource :someGroup=logs/*.log
> >>> will try to expand ":someGroup=logs/*.log", and because it finds
> nothing
> >>> (and because the rules of sh and the like is a mess), you will get the
> >>> parameter value as is, without * expanded.
> >>
> >> The joy of programming - I did not intend to use "name:group" together
> >> with wildcards :-)
> >>
> >>>
> >>> Also,  I think the syntax with colon should be flipped, because on
> other
> >>> places foo:bar usually means that foo is the bigger unit (the
> container),
> >>> and bar is the smaller unit (the child).
> >>
> >> I Disagree here - I think using a name would be used more often. I added
> >> the "group" as an afterthought since some grouping could be useful
> >>
> >>>
> >>> On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
> >>> siegfried.goeschl@gmail.com> wrote:
> >>>
> >>>> Hi Daniel,
> >>>>
> >>>> I'm an enterprise developer - bad habits die hard :-)
> >>>>
> >>>> So I closed the following tickets and merged the branches
> >>>>
> >>>> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
> >>>> "freemarker-generator"
> >>>> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to
> >> "Datasource"
> >>>> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied
> names
> >>>> for datasources
> >>>>
> >>>> Thanks in advance,
> >>>>
> >>>> Siegfried Goeschl
> >>>>
> >>>>
> >>>>> On 29.02.2020, at 12:19, Daniel Dekany <da...@gmail.com>
> >> wrote:
> >>>>>
> >>>>> Yeah, and of course, you can merge that branch. You can even work on
> >> the
> >>>>> master directly after all.
> >>>>>
> >>>>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <
> >> daniel.dekany@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> But, I do recognize the cattle use case (several "faceless" files
> with
> >>>>>> common format/schema). Only, my idea is to push that complexity on
> the
> >>>> data
> >>>>>> source. The "data source" concept shields the rest of the
> application
> >>>> from
> >>>>>> the details of how the data is stored or retrieved. So, a data
> source
> >>>> might
> >>>>>> loads a bunch of log files from a directory, and present them as a
> >>>> single
> >>>>>> big table, or like a list of tables, etc. So I want to deal with the
> >>>> cattle
> >>>>>> use case, but the question is what part of the of architecture will
> >> deal
> >>>>>> with this complication, with other words, how do you box things. Why
> >> my
> >>>>>> initial bet is to stuff that complication into the "data source"
> >>>>>> implementation(s) is that data sources are inherently varied. Some
> >>>> returns
> >>>>>> a table-like thing, some have multiple named tables (worksheets in
> >>>> Excel),
> >>>>>> some returns tree of nodes (XML), etc. So then, some might returns a
> >>>>>> list-of-list-of log records, or just a single list of log-records
> (put
> >>>>>> together from daily log files). That way cattles don't add to
> >> conceptual
> >>>>>> complexity. Now, you might be aware of cases where the cattle
> concept
> >>>> must
> >>>>>> be more exposed than this, and the we can't box things like this.
> But
> >>>> this
> >>>>>> is what I tried to express.
> >>>>>>
> >>>>>> Regarding "output generators", and how that applies on the command
> >>>> line. I
> >>>>>> think it's important that the common core between Maven and
> >>>> command-line is
> >>>>>> as fat as possible. Ideally, they are just two syntax to set up the
> >> same
> >>>>>> thing. Mostly at least. So, if you specify a template file to the
> CLI
> >>>>>> application, in a way so that it causes it to process that template
> to
> >>>>>> generate a single output, then there you have just defined an
> "output
> >>>>>> generator" (even if it wasn't explicitly called like that in the
> >> command
> >>>>>> line). If you specify 3 csv files to the CLI application, in a way
> so
> >>>> that
> >>>>>> it causes it to generate 3 output files, then you have just defined
> 3
> >>>>>> "output generators" there (there's at least one template specified
> >> there
> >>>>>> too, but that wasn't an "output generator" itself, it was just an
> >>>> attribute
> >>>>>> of the 3 output generators). If you specify 1 template, and 3 csv
> >>>> files, in
> >>>>>> a way so that it will yield 4 output files (1 for the template, 3
> for
> >>>> the
> >>>>>> csv-s), then you have defined 4 output generators there. If you
> have a
> >>>> data
> >>>>>> source that loads a list of 3 entities (say, 3 csv files, so it's a
> >>>> list of
> >>>>>> tables then), and you have 2 templates, and you tell the CLI to
> >> execute
> >>>>>> each template for each item in said data source, then you have just
> >>>> defined
> >>>>>> 6 "output generators".
> >>>>>>
> >>>>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
> >>>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi Daniel,
> >>>>>>>
> >>>>>>> That all depends on your mental model and work you do,
> expectations,
> >>>>>>> experience :-)
> >>>>>>>
> >>>>>>>
> >>>>>>> __Document Handling__
> >>>>>>>
> >>>>>>> *"But I think actually we have no good use case for list of
> documents
> >>>>>>> that's passed at once to a single template run, so, we can just
> >> ignore
> >>>>>>> that complication"*
> >>>>>>>
> >>>>>>> In my case that's not a complication but my daily business - I'm
> >>>>>>> regularly wading through access logs - yesterday probably a couple
> of
> >>>>>>> hundreds access logs across two staging sites to help tracking some
> >>>>>>> strange API gateway issues :-)
> >>>>>>>
> >>>>>>> My gut feeling is (borrowing from
> >>>>>>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
> >>>>>>> )
> >>>>>>>
> >>>>>>> 1. You have a few lovely named documents / templates - `pets`
> >>>>>>> 2. You have tons of anonymous documents / templates to process -
> >>>>>>> `cattle`
> >>>>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
> >>>>>>>
> >>>>>>> `freemarker-cli` was built with 2) in mind and I want to cover 1)
> >> since
> >>>>>>> it is equally important and common.
> >>>>>>>
> >>>>>>>
> >>>>>>> __Template And Document Processing Modes__
> >>>>>>>
> >>>>>>> IMHO it is important to answer the following question : "How many
> >>>>>>> outputs do you get when rendering 2 template and 3 datasources?
> Two,
> >>>>>>> Three or Six?"
> >>>>>>>
> >>>>>>> Your answer is influenced by your mental model / experience
> >>>>>>>
> >>>>>>> * When wading through tons of CSV files, access logs, etc. the
> answer
> >>>> is
> >>>>>>> "2"
> >>>>>>> * When doing source code generation the obvious answer is "6"
> >>>>>>> * Can't image a use case which results in "3" but I'm pretty sure
> we
> >>>>>>> will encounter one
> >>>>>>>
> >>>>>>> __Template and document mode probably shouldn't exist__
> >>>>>>>
> >>>>>>> That's hard for me to fully understand - I definitely lack your
> >>>> insights
> >>>>>>> & experience writing such tools :-)
> >>>>>>>
> >>>>>>> Defining the `Output Generator` is the underlying model for the
> Maven
> >>>>>>> plugin (and probably FMPP).
> >>>>>>>
> >>>>>>> I'm not sure if this applies for command lines at least not in the
> >> way
> >>>> I
> >>>>>>> use them (or would like to use them)
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks in advance,
> >>>>>>>
> >>>>>>> Siegfried Goeschl
> >>>>>>>
> >>>>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
> >>>>>>>
> >>>>>>>
> >>>>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
> >>>>>>>
> >>>>>>>> Yeah, "data source" is surely a too popular name, but for reason.
> >>>>>>>> Anyone
> >>>>>>>> has other ideas?
> >>>>>>>>
> >>>>>>>> As of naming data sources and such. One thing I was wondering
> about
> >>>>>>>> back
> >>>>>>>> then is how to deal with list of documents given to a template,
> >> versus
> >>>>>>>> exactly 1 document given to a template. But I think actually we
> have
> >>>>>>>> no
> >>>>>>>> good use case for list of documents that's passed at once to a
> >> single
> >>>>>>>> template run, so, we can just ignore that complication. A document
> >> has
> >>>>>>>> a
> >>>>>>>> name, and that's always just a single document, not a collection,
> as
> >>>>>>>> far as
> >>>>>>>> the template is concerned. (We can have multiple documents per
> run,
> >>>>>>>> but
> >>>>>>>> those normally yield separate output generators, so it's still
> only
> >>>>>>>> one
> >>>>>>>> document per template.) However, we can have data source types
> >>>>>>>> (document
> >>>>>>>> types with old terminology) that collect together multiple data
> >> files.
> >>>>>>>> So
> >>>>>>>> then that complexity is encapsulated into the data source type,
> and
> >>>>>>>> doesn't
> >>>>>>>> complicate the overall architecture. That's another case when a
> data
> >>>>>>>> source
> >>>>>>>> is not just a file. Like maybe there's a data source type that
> loads
> >>>>>>>> all
> >>>>>>>> the CSV-s from a directory, into a single big table (I had such
> >> case),
> >>>>>>>> or
> >>>>>>>> even into a list of tables. Or, as I mentioned already, a data
> >> source
> >>>>>>>> is
> >>>>>>>> maybe an SQL query on a JDBC data source (and we got the first
> term
> >>>>>>>> clash... JDBC also call them data sources).
> >>>>>>>>
> >>>>>>>> Template and document mode probably shouldn't exist from user
> >>>>>>>> perspective
> >>>>>>>> either, at least not as a global option that must apply to
> >> everything
> >>>>>>>> in a
> >>>>>>>> run. They could just give the files that define the "output
> >>>>>>>> generators",
> >>>>>>>> and some of them will be templates, some of them are data files,
> in
> >>>>>>>> which
> >>>>>>>> case a template need to be associated with them (and there can be
> a
> >>>>>>>> couple
> >>>>>>>> of ways of doing that). And then again, there are the cases where
> >> you
> >>>>>>>> want
> >>>>>>>> to create one output generator per entity from some data source.
> >>>>>>>>
> >>>>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
> >>>>>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Daniel,
> >>>>>>>>>
> >>>>>>>>> See my comments below - and thanks for your patience and input
> :-)
> >>>>>>>>>
> >>>>>>>>> *Renaming Document To DataSource*
> >>>>>>>>>
> >>>>>>>>> Yes, makes sense. I tried to avoid since I'm using
> javax.activation
> >>>>>>>>> and
> >>>>>>>>> its DataSource.
> >>>>>>>>>
> >>>>>>>>> *Template And Document Mode*
> >>>>>>>>>
> >>>>>>>>> Agreed - I think it is a valuable abstraction for the user but it
> >> is
> >>>>>>>>> not
> >>>>>>>>> an implementation concept :-)
> >>>>>>>>>
> >>>>>>>>> *Document Without Symbolic Names*
> >>>>>>>>>
> >>>>>>>>> Also agreed and it is going to change but I have not settled my
> >> mind
> >>>>>>>>> yet
> >>>>>>>>> what exactly to implement.
> >>>>>>>>>
> >>>>>>>>> Thanks in advance,
> >>>>>>>>>
> >>>>>>>>> Siegfried Goeschl
> >>>>>>>>>
> >>>>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
> >>>>>>>>>
> >>>>>>>>> A few quick thoughts on that:
> >>>>>>>>>
> >>>>>>>>> - We should replace the "document" term with something more
> >> speaking.
> >>>>>>>>> It
> >>>>>>>>> doesn't tell that it's some kind of input. Also, most of these
> >> inputs
> >>>>>>>>> aren't something that people typically call documents. Like a csv
> >>>>>>>>> file, or
> >>>>>>>>> a database table, which is not even a file (OK we don't support
> >> such
> >>>>>>>>> thing
> >>>>>>>>> at the moment). I think, maybe "data source" is a safe enough
> term.
> >>>>>>>>> (It
> >>>>>>>>> also rhymes with data model.)
> >>>>>>>>> - You have separate "template" and "document" "mode", that
> applies
> >> to
> >>>>>>>>> a
> >>>>>>>>> whole run. I think such specialization won't be helpful. We could
> >>>>>>>>> just say,
> >>>>>>>>> on the conceptual level at lest, that we need a set of "outputs
> >>>>>>>>> generators". An output generator is an object (in the API) that
> >>>>>>>>> specifies a
> >>>>>>>>> template, a data-model (where the data-model is possibly
> populated
> >>>>>>>>> with
> >>>>>>>>> "documents"), and an output "sink" (a file path, or stdout), and
> >> can
> >>>>>>>>> generate the output itself. A practical way of defining the
> output
> >>>>>>>>> generators in a CLI application is via a bunch of files, each
> >>>>>>>>> defining an
> >>>>>>>>> output generator. Some of those files is maybe a template (that
> you
> >>>>>>>>> can
> >>>>>>>>> even detect from the file extension), or a data file that we
> >>>>>>>>> currently call
> >>>>>>>>> a "document". They could freely mix inside the same run. I have
> >> also
> >>>>>>>>> met
> >>>>>>>>> use case when you have a single table (single "document"), and
> each
> >>>>>>>>> record
> >>>>>>>>> in it yields an output file. That can also be described in some
> >> file
> >>>>>>>>> format, or really in any other way, like directly in command line
> >>>>>>>>> argument,
> >>>>>>>>> via API, etc.
> >>>>>>>>> - You have multiple documents without associated symbolical name
> in
> >>>>>>>>> some
> >>>>>>>>> examples. Templates can't identify those then in a well
> >> maintainable
> >>>>>>>>> way.
> >>>>>>>>> The actual file name is often not a good identifier, can change
> >> over
> >>>>>>>>> time,
> >>>>>>>>> and you might don't even have good control over it, like you
> >> already
> >>>>>>>>> receive it as a parameter from somewhere else, or someone
> >>>>>>>>> moves/renames
> >>>>>>>>> that files that you need to read. Index is also not very good,
> but
> >> I
> >>>>>>>>> have
> >>>>>>>>> written about that earlier.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
> >>>>>>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi folks,
> >>>>>>>>>
> >>>>>>>>> still wrapping my side around but assembled some thoughts here -
> >>>>>>>>>
> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
> >>>>>>>>>
> >>>>>>>>> Thanks in advance,
> >>>>>>>>>
> >>>>>>>>> Siegfried Goeschl
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org>
> >> wrote:
> >>>>>>>>>
> >>>>>>>>> What you are describing is more like the angle that FMPP took
> >>>>>>>>> initially,
> >>>>>>>>> where templates drive things, they generate the output for
> >> themselves
> >>>>>>>>>
> >>>>>>>>> (even
> >>>>>>>>>
> >>>>>>>>> multiple output files if they wish). By default output files name
> >>>>>>>>> (and
> >>>>>>>>> relative path) is deduced from template name. There was also a
> >> global
> >>>>>>>>> data-model, built in a configuration file (or equally, built via
> >>>>>>>>> command
> >>>>>>>>> line arguments, or both mixed), from which templates get whatever
> >>>>>>>>> data
> >>>>>>>>>
> >>>>>>>>> they
> >>>>>>>>>
> >>>>>>>>> are interested in. Take a look at the figures here:
> >>>>>>>>> http://fmpp.sourceforge.net/qtour.html. Later, this concept was
> >>>>>>>>>
> >>>>>>>>> generalized
> >>>>>>>>>
> >>>>>>>>> a bit more, because you could add XML files at the same place
> where
> >>>>>>>>> you
> >>>>>>>>> have the templates, and then you could associate transform
> >> templates
> >>>>>>>>> to
> >>>>>>>>>
> >>>>>>>>> the
> >>>>>>>>>
> >>>>>>>>> XML files (based on path pattern and/or the XML document
> element).
> >>>>>>>>> Now
> >>>>>>>>> that's like what freemarker-generator had initially (data files
> >> drive
> >>>>>>>>> output, and the template is there to transform it).
> >>>>>>>>>
> >>>>>>>>> So I think the generic mental model would like this:
> >>>>>>>>>
> >>>>>>>>> 1. You got files that drive the process, let's call them
> *generator
> >>>>>>>>> files* for now. Usually, each generator file yields an output
> file
> >>>>>>>>> (but
> >>>>>>>>> maybe even multiple output files, as you might saw in the last
> >>>>>>>>> figure).
> >>>>>>>>> These generator files can be of many types, like XML, JSON, XLSX
> >> (as
> >>>>>>>>>
> >>>>>>>>> in the
> >>>>>>>>>
> >>>>>>>>> original freemarker-generator), and even templates (as is the
> norm
> >> in
> >>>>>>>>> FMPP). If the file is not a template, then you got a set of
> >>>>>>>>> transformer
> >>>>>>>>> templates (-t CLI option) in a separate directory, which can be
> >>>>>>>>>
> >>>>>>>>> associated
> >>>>>>>>>
> >>>>>>>>> with the generator files base on name patterns, and even based on
> >>>>>>>>>
> >>>>>>>>> content
> >>>>>>>>>
> >>>>>>>>> (schema usually). If the generator file is a template (so that's
> a
> >>>>>>>>> positional @Parameter CLI argument that happens to be an *.ftl,
> and
> >>>>>>>>> is
> >>>>>>>>>
> >>>>>>>>> not
> >>>>>>>>>
> >>>>>>>>> a template file specified after the "-t" option), then you just
> >>>>>>>>> Template.process(...) it, and it prints what the output will be.
> >>>>>>>>> 2. You also have a set of variables, the global data-model, that
> >>>>>>>>> contains commonly useful stuff, like what you now call parameters
> >>>>>>>>> (CLI -Pname=value), but also maybe data loaded from JSON, XML,
> >>>>>>>>> etc. Those data files aren't "generator files". Templates just
> >>>>>>>>> use them if they need them.
> >>>>>>>>>
> >>>>>>>>> An important thing here is to reuse the same mechanism to read
> >>>>>>>>> and parse those data files that was used in templates when
> >>>>>>>>> transforming generator files. So we need a common format for
> >>>>>>>>> specifying how to load data files. That's maybe just FTL that
> >>>>>>>>> #assigns to the variables, or maybe a more declarative format.
> >>>>>>>>>
> >>>>>>>>> What I have described in the original post here was a less
> >>>>>>>>> generic form of this, as I tried to be true to the original
> >>>>>>>>> approach. I thought the proposal would be drastic enough as it
> >>>>>>>>> is... :) There, the "main" document is the "generator file" from
> >>>>>>>>> point 1, the "-t" template is the transform template for the
> >>>>>>>>> "main" document, and the other named documents ("users",
> >>>>>>>>> "groups") are a poor man's shared data-model from point 2
> >>>>>>>>> (together with -PName=value).
> >>>>>>>>>
> >>>>>>>>> There's a further, somewhat confusing thing to get right with the
> >>>>>>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing
> >>>>>>>>> though. In the model above, as per point 1, if you list multiple
> >>>>>>>>> data files, each will generate a separate output file. So, if you
> >>>>>>>>> need to take in a list of files to transform it to a single
> >>>>>>>>> output file (or at least with a single transform template
> >>>>>>>>> execution), then you have to be explicit about that, as that's
> >>>>>>>>> not the default behavior anymore. But it's still absolutely
> >>>>>>>>> possible. Imagine that a "list of XLSX-es" is itself like a file
> >>>>>>>>> format. You need some CLI (and Maven config, etc.) syntax to
> >>>>>>>>> express that, but that shouldn't be a big deal.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
> >>>>>>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Daniel,
> >>>>>>>>>
> >>>>>>>>> Good timing - I was looking at a similar problem from different
> >> angle
> >>>>>>>>> yesterday (see below)
> >>>>>>>>>
> >>>>>>>>> Don't have enough time to answer your email in detail now - will
> do
> >>>>>>>>> that
> >>>>>>>>> tomorrow evening
> >>>>>>>>>
> >>>>>>>>> Thanks in advance,
> >>>>>>>>>
> >>>>>>>>> Siegfried Goeschl
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> ===. START
> >>>>>>>>> # FreeMarker CLI Improvement
> >>>>>>>>> ## Support Of Multiple Template Files
> >>>>>>>>> Currently we support the following combinations
> >>>>>>>>>
> >>>>>>>>> * Single template and no data files
> >>>>>>>>> * Single template and one or more data files
> >>>>>>>>>
> >>>>>>>>> But we cannot support the following use case, which is quite
> >>>>>>>>> typical in the cloud:
> >>>>>>>>>
> >>>>>>>>> __Convert multiple templates with a single data file, e.g.
> >>>>>>>>> copying a directory of configuration files using a JSON
> >>>>>>>>> configuration file__
> >>>>>>>>>
> >>>>>>>>> ## Implementation notes
> >>>>>>>>> * When we copy a directory we can remove the `ftl` extension on
> >>>>>>>>> the fly
> >>>>>>>>> * We might need an `exclude` filter for the copy operation
> >>>>>>>>> * Initially resolve to a list of template files and process one
> >>>>>>>>> after another
> >>>>>>>>> * Need to calculate the output file location and extension
> >>>>>>>>> * We need to rename the existing command line parameters (see
> >>>>>>>>> below)
> >>>>>>>>> * Do we need multiple include and exclude filters?
> >>>>>>>>> * Do we need file versus directory filters?
> >>>>>>>>>
> >>>>>>>>> ### Command Line Options
> >>>>>>>>> ```
> >>>>>>>>> --input-encoding : Encoding of the documents
> >>>>>>>>> --output-encoding : Encoding of the rendered template
> >>>>>>>>> --template-encoding : Encoding of the template
> >>>>>>>>> --output : Output file or directory
> >>>>>>>>> --include-document : Include pattern for documents
> >>>>>>>>> --exclude-document : Exclude pattern for documents
> >>>>>>>>> --include-template : Include pattern for templates
> >>>>>>>>> --exclude-template : Exclude pattern for templates
> >>>>>>>>> ```
> >>>>>>>>>
> >>>>>>>>> ### Command Line Examples
> >>>>>>>>> ```text
> >>>>>>>>> # Copy all FTL templates found in "ext/config" to the "/config"
> >>>>>>>>> # directory using the data from "config.json"
> >>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl --o /config config.json
> >>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
> >>>>>>>>>
> >>>>>>>>> # Basically the same using a named document "configuration"
> >>>>>>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
> >>>>>>>>> # It might make sense to allow URIs for loading documents
> >>>>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
> >>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
> >>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
> >>>>>>>>>
> >>>>>>>>> # Basically the same using an environment variable as named document
> >>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
> >>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
> >>>>>>>>> ```
> >>>>>>>>> === END
> >>>>>>>>>
> >>>>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org>
> wrote:
> >>>>>>>>>
> >>>>>>>>> Input documents is a fundamental concept in freemarker-generator,
> >> so
> >>>>>>>>> we
> >>>>>>>>> should think about that more, and probably refine/rework how it's
> >>>>>>>>> done.
> >>>>>>>>>
> >>>>>>>>> Currently it works like this, with CLI at least.
> >>>>>>>>>
> >>>>>>>>> freemarker-cli
> >>>>>>>>> -t access-report.ftl
> >>>>>>>>> somewhere/foo-access-log.csv
> >>>>>>>>>
> >>>>>>>>> Then in access-report.ftl you have to do something like this:
> >>>>>>>>>
> >>>>>>>>> <#assign doc = Documents.get(0)>
> >>>>>>>>> ... process doc here
> >>>>>>>>>
> >>>>>>>>> (The more idiomatic Documents[0] won't work. Actually, that lead
> >> to a
> >>>>>>>>>
> >>>>>>>>> funny
> >>>>>>>>>
> >>>>>>>>> chain of coincidences: It returned the string "D", then
> >>>>>>>>>
> >>>>>>>>> CSVTool.parse(...)
> >>>>>>>>>
> >>>>>>>>> happily parsed that to a table with the single column "D", and 0
> >>>>>>>>> rows,
> >>>>>>>>>
> >>>>>>>>> and
> >>>>>>>>>
> >>>>>>>>> as there were 0 rows, the template didn't run into an error
> because
> >>>>>>>>> row.myExpectedColumn refers to a missing column either, so the
> >>>>>>>>> process
> >>>>>>>>> finished with success. (: Pretty unlucky for sure. The root was
> >>>>>>>>> unintentionally breaking a FreeMarker idiom though; eventually we
> >>>>>>>>> will
> >>>>>>>>>
> >>>>>>>>> have
> >>>>>>>>>
> >>>>>>>>> to work on those too, but, different topic.)
> >>>>>>>>>
> >>>>>>>>> However, actually multiple input documents can be passed in:
> >>>>>>>>>
> >>>>>>>>> freemarker-cli
> >>>>>>>>> -t access-report.ftl
> >>>>>>>>> somewhere/foo-access-log.csv
> >>>>>>>>> somewhere/bar-access-log.csv
> >>>>>>>>>
> >>>>>>>>> Above template will still work, though then you ignored all but
> the
> >>>>>>>>>
> >>>>>>>>> first
> >>>>>>>>>
> >>>>>>>>> document. So if you expect any number of input documents, you
> >>>>>>>>> probably
> >>>>>>>>>
> >>>>>>>>> will
> >>>>>>>>>
> >>>>>>>>> have to do this:
> >>>>>>>>>
> >>>>>>>>> <#list Documents.list as doc>
> >>>>>>>>> ... process doc here
> >>>>>>>>> </#list>
> >>>>>>>>>
> >>>>>>>>> (The more idiomatic <#list Documents as doc> won't work; but
> again,
> >>>>>>>>>
> >>>>>>>>> those
> >>>>>>>>>
> >>>>>>>>> we will work out in a different thread.)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> So, what would be better, in my opinion. I start out from what I
> >>>>>>>>> think
> >>>>>>>>>
> >>>>>>>>> are
> >>>>>>>>>
> >>>>>>>>> the common uses cases, in decreasing order of frequency. Goal is
> to
> >>>>>>>>>
> >>>>>>>>> make
> >>>>>>>>>
> >>>>>>>>> those less error prone for the users, and simpler to express.
> >>>>>>>>>
> >>>>>>>>> USE CASE 1
> >>>>>>>>>
> >>>>>>>>> You have exactly 1 input documents, which is therefore simply
> "the"
> >>>>>>>>> document in the mind of the user. This is probably the typical
> use
> >>>>>>>>>
> >>>>>>>>> case,
> >>>>>>>>>
> >>>>>>>>> but at least the use case users typically start out from when
> >>>>>>>>> starting
> >>>>>>>>>
> >>>>>>>>> the
> >>>>>>>>>
> >>>>>>>>> work.
> >>>>>>>>>
> >>>>>>>>> freemarker-cli
> >>>>>>>>> -t access-report.ftl
> >>>>>>>>> somewhere/foo-access-log.csv
> >>>>>>>>>
> >>>>>>>>> Then `Documents.get(0)` is not very fitting. Most importantly
> it's
> >>>>>>>>>
> >>>>>>>>> error
> >>>>>>>>>
> >>>>>>>>> prone, because if the user passed in more than 1 documents (can
> >> even
> >>>>>>>>>
> >>>>>>>>> happen
> >>>>>>>>>
> >>>>>>>>> totally accidentally, like if the user was lazy and used a
> wildcard
> >>>>>>>>>
> >>>>>>>>> that
> >>>>>>>>>
> >>>>>>>>> the shell exploded), the template will silently ignore the rest
> of
> >>>>>>>>> the
> >>>>>>>>> documents, and the singe document processed will be practically
> >>>>>>>>> picked
> >>>>>>>>> randomly. The user might won't notice that and submits a bad
> report
> >>>>>>>>> or
> >>>>>>>>>
> >>>>>>>>> such.
> >>>>>>>>>
> >>>>>>>>> I think that in this use case the document should be simply
> >> referred
> >>>>>>>>> as
> >>>>>>>>> `Document` in the template. When you have multiple documents
> there,
> >>>>>>>>> referring to `Document` should be an error, saying that the
> >> template
> >>>>>>>>>
> >>>>>>>>> was
> >>>>>>>>>
> >>>>>>>>> made to process a single document only.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> USE CASE 2
> >>>>>>>>>
> >>>>>>>>> You have multiple input documents, but each has different role
> >>>>>>>>>
> >>>>>>>>> (different
> >>>>>>>>>
> >>>>>>>>> schema, maybe different file type). Like, you pass in users.csv
> and
> >>>>>>>>> groups.csv. Each has difference schema, and so you want to access
> >>>>>>>>> them
> >>>>>>>>> differently, but in the same template.
> >>>>>>>>>
> >>>>>>>>> freemarker-cli
> >>>>>>>>> [...]
> >>>>>>>>> --named-document users somewhere/foo-users.csv
> >>>>>>>>> --named-document groups somewhere/foo-groups.csv
> >>>>>>>>>
> >>>>>>>>> Then in the template you could refer to them as:
> >>>>>>>>>
> >>>>>>>>> `NamedDocuments.users`,
> >>>>>>>>>
> >>>>>>>>> and `NamedDocuments.groups`.
> >>>>>>>>>
> >>>>>>>>> Use Case 1, and 2 can be unified into a coherent concept, where
> >>>>>>>>>
> >>>>>>>>> `Document`
> >>>>>>>>>
> >>>>>>>>> is just a shorthand for `NamedDocuments.main`. It's called "main"
> >>>>>>>>>
> >>>>>>>>> because
> >>>>>>>>>
> >>>>>>>>> that's "the" document the template is about, but then you have to
> >>>>>>>>> added
> >>>>>>>>> some helper documents, with symbolic names representing their
> role.
> >>>>>>>>>
> >>>>>>>>> freemarker-cli
> >>>>>>>>> -t access-report.ftl
> >>>>>>>>> --document-name=main somewhere/foo-access-log.csv
> >>>>>>>>> --document-name=users somewhere/foo-users.csv
> >>>>>>>>> --document-name=groups somewhere/foo-groups.csv
> >>>>>>>>>
> >>>>>>>>> Here, `Document` still works in the template, and it refers to
> >>>>>>>>> `somewhere/foo-access-log.csv`. (While omitting
> >> --document-name=main
> >>>>>>>>>
> >>>>>>>>> above
> >>>>>>>>>
> >>>>>>>>> would be cleaner, I couldn't figure out how to do that with
> >> Picocli.
> >>>>>>>>> Anyway, for now the point is the concept, which is not specific
> to
> >>>>>>>>>
> >>>>>>>>> CLI.)
> >>>>>>>>>
> >>>>>>>>> USE CASE 3
> >>>>>>>>>
> >>>>>>>>> Here you have several of the same kind of documents. That has a
> >> more
> >>>>>>>>> generic sub-use-case, when you have explicitly named documents
> >> (like
> >>>>>>>>> "users" above), and for some you expect multiple input files.
> >>>>>>>>>
> >>>>>>>>> freemarker-cli
> >>>>>>>>> -t access-report.ftl
> >>>>>>>>> --document-name=main somewhere/foo-access-log.csv
> >>>>>>>>> somewhere/bar-access-log.csv
> >>>>>>>>> --document-name=users somewhere/foo-users.csv
> >>>>>>>>> somewhere/bar-users.csv
> >>>>>>>>> --document-name=groups somewhere/global-groups.csv
> >>>>>>>>>
> >>>>>>>>> The template must to be written with this use case in mind, as
> now
> >> it
> >>>>>>>>>
> >>>>>>>>> has
> >>>>>>>>>
> >>>>>>>>> #list some of the documents. (I think in practice you hardly ever
> >>>>>>>>> want
> >>>>>>>>>
> >>>>>>>>> to
> >>>>>>>>>
> >>>>>>>>> get a document by hard coded index. Either you don't know how
> many
> >>>>>>>>> documents you have, so you can't use hard coded indexes, or you
> do,
> >>>>>>>>> and
> >>>>>>>>> each index has a specific meaning, but then you should name the
> >>>>>>>>>
> >>>>>>>>> documents
> >>>>>>>>>
> >>>>>>>>> instead, as using indexes is error prone, and hard to read.)
> >>>>>>>>> Accessing that list of documents in the template, maybe could be
> >> done
> >>>>>>>>>
> >>>>>>>>> like
> >>>>>>>>>
> >>>>>>>>> this:
> >>>>>>>>> - For the "main" documents: `DocumentList`
> >>>>>>>>> - For explicitly named documents, like "users":
> >>>>>>>>>
> >>>>>>>>> `NamedDocumentLists.users`
> >>>>>>>>>
> >>>>>>>>> SUMMING UP
> >>>>>>>>>
> >>>>>>>>> To unify all 3 use cases into a coherent concept:
> >>>>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while
> >> you
> >>>>>>>>>
> >>>>>>>>> can
> >>>>>>>>>
> >>>>>>>>> achieve everything with it, using it requires your template to
> >> handle
> >>>>>>>>>
> >>>>>>>>> the
> >>>>>>>>>
> >>>>>>>>> most generic case too. So, I think it would be rarely used.
> >>>>>>>>> - `DocumentList` is just a shorthand for
> `NamedDocumentLists.main`.
> >>>>>>>>>
> >>>>>>>>> It's
> >>>>>>>>>
> >>>>>>>>> used if you only have one kind of documents (single format and
> >>>>>>>>> schema),
> >>>>>>>>>
> >>>>>>>>> but
> >>>>>>>>>
> >>>>>>>>> potentially multiple of them.
> >>>>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1
> >>>>>>>>> document
> >>>>>>>>>
> >>>>>>>>> of
> >>>>>>>>>
> >>>>>>>>> the given name.
> >>>>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This
> is
> >>>>>>>>> for
> >>>>>>>>>
> >>>>>>>>> the
> >>>>>>>>>
> >>>>>>>>> most natural/frequent use case.
> >>>>>>>>>
> >>>>>>>>> That's 4 possible ways of accessing your documents, which is a
> >>>>>>>>>
> >>>>>>>>> trade-off
> >>>>>>>>>
> >>>>>>>>> for the sake of these:
> >>>>>>>>> - Catching CLI (or Maven, etc.) input where the template output
> >>>>>>>>> likely
> >>>>>>>>>
> >>>>>>>>> will
> >>>>>>>>>
> >>>>>>>>> be wrong. That's only possible if the user can communicate its
> >> intent
> >>>>>>>>>
> >>>>>>>>> in
> >>>>>>>>>
> >>>>>>>>> the template.
> >>>>>>>>> - Users don't need to deal with concepts that are irrelevant in
> >> their
> >>>>>>>>> concrete use case. Just start with the trivial, `Document`, and
> >> later
> >>>>>>>>>
> >>>>>>>>> if
> >>>>>>>>>
> >>>>>>>>> the need arises, generalize to named documents, document lists,
> or
> >>>>>>>>>
> >>>>>>>>> both.
> >>>>>>>>>
> >>>>>>>>> What do guys think?
> >>>>>>>>>
> >>>>>>>>>

-- 
Best regards,
Daniel Dekany

Re: freemarker-generator: Improving the input documents concept

Posted by Siegfried Goeschl <si...@gmail.com>.

Hi Daniel,

Please see my comments below

Thanks in advance, 

Siegfried Goeschl


> On 29.02.2020, at 21:02, Daniel Dekany <da...@gmail.com> wrote:
> 
>> 
>> I try to provide a useful name even when the content is coming from an
>> URL
> 
> 
> When is it recommended to rely on that though? Because utilizing that means
> that renaming a data source file can break the process, even if you call
> freemarker-cli with the up-to-date file name. And whether that happens depends
> on what you (or another random colleague!) have buried inside the templates.
> So I guess we had better just not support this. Less code and fewer things to
> document too.
> 

Actually, it is not recommended, but we have only had named data sources for less than 24 hours

> 
>> I think we have a different understanding what a "Document" / "Datasource
>> / DataSource" should do
> 
> 
> Thing is, eventually (most certainly pre-1.0, as it influences
> architecture), certain needs will have to be addressed, somehow. Then we will
> see what "things" we really need. For now I thought we need "things" that
> are much more than paths, and encapsulate the "how to load the data"
> aspect. I called them data sources, but maybe we should call them "data
> loaders" to free up data sources for the more primitive thing. Some
> needs/doubts to address, *later*: Is it really the best approach for users
> to load/parse data sources programmatically (that code is written in FTL,
> inside the templates)? Also, is the template the right place for doing
> that, because, when multiple templates (or just multiple template *runs* of
> the same template, each generating a different output file) need common
> data, they shouldn't load it again and again. Also, different topic, can we
> handle the case "transparently" enough when the data is not coming from a
> file?

This is a command line tool where we have little idea what the user will do with it or how they will abuse it

* How does a "data loader" know that it is responsible for loading a file?
* What should a "CSV data loader" do - parse the file into a list of records or stream them one by one?
* How do we handle the case where there are multiple potential data loaders for a single file?

I'm leaning towards building blocks where the user controls the work to be done, even if it requires one or two extra lines of FTL code
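
To make the "building block" idea a bit more concrete, here is a minimal Java sketch of a data source that is deliberately dumb and lazy: it knows only its name and how to fetch its raw text, and parsing remains an explicit step for the caller. `LazyDataSource` and its methods are hypothetical names used for illustration only; they are not the freemarker-generator API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical sketch: a "dumb" data source that only knows its name and
// how to load its raw content. Nothing is read until getText() is called,
// and no format-specific parsing happens here.
final class LazyDataSource {
    private final String name;
    private final Supplier<String> loader;

    LazyDataSource(String name, Supplier<String> loader) {
        this.name = name;
        this.loader = loader;
    }

    // Assumption for this sketch: the file name doubles as the data source name.
    static LazyDataSource ofFile(Path path) {
        return new LazyDataSource(String.valueOf(path.getFileName()), () -> {
            try {
                return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    String getName() {
        return name;
    }

    // Loads on demand; this sketch does not cache, so every call reloads.
    String getText() {
        return loader.get();
    }
}
```

With something of this shape, deciding whether the text is CSV, JSON or anything else stays with the template author, at the cost of the one or two extra lines mentioned above.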


> 
>> The joy of programming - I did not intend to use "name:group" together with
>> wildcards :-)
> 
> 
> For a CLI tool, I guess we agree that it should work. So maybe, like this
> (here logs and foos are meant to be "groups"):
> --data-source logs file1.log file2.log fileN.log   --data-source foos
> foo1.csv foo2.csv fooN.csv  --data-source bar bar.xlsx
> 
> It so happens that here you don't really have good control over the
> number of files associated with the name, so, maybe yet another reason to not
> differentiate names and groups.
> 
>> I Disagree here - I think using a name would be used more often. I added
>> the "group" as an afterthought since some grouping could be useful
> 
> 
> We do agree on that. What I said is that the *syntax* should be such that the
> group comes first. It's still optional. Like this:
> --data-source group:name /somewhere
> --data-source name /somewhere

That comes down to personal preference, e.g. chown uses "owner[:group]"
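
Whichever ordering is chosen, the parsing has the same shape. As a rough sketch, assuming the `name[:group]=location` form discussed for FREEMARKER-135 in this thread (the class `DataSourceSpec` and its fields are hypothetical and not the actual freemarker-generator code):

```java
// Hypothetical sketch of parsing a "--datasource" value of the form
// name[:group]=location. A value without '=' is treated as a bare location.
final class DataSourceSpec {
    final String name;     // empty when no name was given
    final String group;    // empty when no group was given
    final String location; // file path or URI, passed through untouched

    DataSourceSpec(String name, String group, String location) {
        this.name = name;
        this.group = group;
        this.location = location;
    }

    static DataSourceSpec parse(String spec) {
        // Split on the first '=' only, so URIs after it keep their ':' intact.
        // Caveat of this sketch: a bare location that itself contains '='
        // would be misread as "key=location".
        int eq = spec.indexOf('=');
        if (eq < 0) {
            return new DataSourceSpec("", "", spec);
        }
        String key = spec.substring(0, eq);
        int colon = key.indexOf(':');
        String name = colon < 0 ? key : key.substring(0, colon);
        String group = colon < 0 ? "" : key.substring(colon + 1);
        return new DataSourceSpec(name, group, spec.substring(eq + 1));
    }

    public static void main(String[] args) {
        String[] examples = {
            "someName:someGroup=somewhere/something",
            "someName=somewhere/something",
            ":someGroup=somewhere/something"
        };
        for (String example : examples) {
            DataSourceSpec s = parse(example);
            System.out.printf("name='%s' group='%s' location='%s'%n",
                    s.name, s.group, s.location);
        }
    }
}
```

Flipping to `group:name` would only swap the two assignments around the colon; the shell wildcard problem raised earlier in the thread is unaffected by either choice.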

> 
> On Sat, Feb 29, 2020 at 7:34 PM Siegfried Goeschl <
> siegfried.goeschl@gmail.com> wrote:
> 
>> Hi Daniel,
>> 
>> See my comments below
>> 
>> Thanks in advance,
>> 
>> Siegfried Goeschl
>> 
>> 
>>> On 29.02.2020, at 19:08, Daniel Dekany <da...@gmail.com> wrote:
>>> 
>>> FREEMARKER-135 freemarker-generator-cli: Support user-supplied names for
>>> datasources
>>> 
>>> So, I can do this to have both a name and a group associated with a data
>>> source:
>>> --datasource someName:someGroup=somewhere/something
>> 
>> Correct
>> 
>>> Or if I only want a name, but not a group (or an ""  group actually -
>>> bug?), then:
>>> --datasource someName=somewhere/something
>> 
>> Correct
>> 
>>> 
>>> Or if only a group but not a name (or a "" name actually) then:
>>> --datasource :someGroup=somewhere/something
>> 
>> Mhmm, that would be unintended functionality from my side - current
>> approach is that every "Document" / "Datasource / DataSource" is named
>> 
>>> 
>>> A name must identify exactly 1 data source, while a group identifies a
>>> list of data sources.
>> 
>> No, every "Document" / "Datasource / DataSource" has a name currently, but
>> uniqueness is not enforced. Only when you ask for a "Document" /
>> "Datasource / DataSource" by its exact name do I check for exactly one
>> search hit, and throw an exception otherwise. I try to provide a useful name
>> even when the content is coming from a URL or STDIN (and I will probably add
>> environment variables as "Document" / "Datasource / DataSource", e.g.
>> configuration in the cloud as JSON content passed as environment variable)
>> 
>>> 
>>> Does this idea, that a data source can be part of a group and is then
>>> also possibly identifiable by a name, come from a use case? I mean,
>>> it's possibly important somewhere, but if so, then it's strange that you
>>> can put something into only a single group. If we need this kind of
>>> thing, then perhaps you should just be allowed to associate the data
>>> source with a list of names (kind of like tagging), and then when the
>>> template wants to get something by name, it will tell there if it expects
>>> exactly one or a list of data sources. Then you don't need to introduce
>>> two terms in the documentation either (names and groups). Again, if we
>>> want this at all, instead of just going with a data source that itself
>>> gives a list. (And if not, how will we handle a data source that loads
>>> from a non-file source?)
>> 
>> I actually thought of implementing tagging but considered a "group"
>> sufficient.
>> 
>> * If you don't define anything everything goes into the "default" group
>> * For individual documents you can define a name and an optional group
>> 
>> I think we have a different understanding what a "Document" / "Datasource
>> / DataSource" should do
>> 
>> * It is dumb
>> * It is lazy since data is only loaded on demand
>> * There is no automagic like "oh, this is a JSON file, so let's go to the
>> JSON tool and create a map readily accessible in the data model"
>> 
>>> 
>>> Note that the current command line syntax doesn't work well with shell
>>> wildcard expansion. Like this:
>>> --datasource :someGroup=logs/*.log
>>> will try to expand ":someGroup=logs/*.log", and because it finds nothing
>>> (and because the rules of sh and the like are a mess), you will get the
>>> parameter value as is, without * expanded.
>> 
>> The joy of programming - I did not intend to use "name:group" together
>> with wildcards :-)
>> 
>>> 
>>> Also, I think the syntax with colon should be flipped, because in other
>>> places foo:bar usually means that foo is the bigger unit (the container),
>>> and bar is the smaller unit (the child).
>> 
>> I Disagree here - I think using a name would be used more often. I added
>> the "group" as an afterthought since some grouping could be useful
>> 
>>> 
>>> On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
>>> siegfried.goeschl@gmail.com> wrote:
>>> 
>>>> Hi Daniel,
>>>> 
>>>> I'm an enterprise developer - bad habits die hard :-)
>>>> 
>>>> So I closed the following tickets and merged the branches
>>>> 
>>>> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
>>>> "freemarker-generator"
>>>> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to
>>>> "Datasource"
>>>> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
>>>> for datasources
>>>> 
>>>> Thanks in advance,
>>>> 
>>>> Siegfried Goeschl
>>>> 
>>>> 
>>>>> On 29.02.2020, at 12:19, Daniel Dekany <da...@gmail.com> wrote:
>>>>> 
>>>>> Yeah, and of course, you can merge that branch. You can even work on
>>>>> the master directly after all.
>>>>> 
>>>>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <daniel.dekany@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> But, I do recognize the cattle use case (several "faceless" files with
>>>>>> common format/schema). Only, my idea is to push that complexity onto
>>>>>> the data source. The "data source" concept shields the rest of the
>>>>>> application from the details of how the data is stored or retrieved.
>>>>>> So, a data source might load a bunch of log files from a directory,
>>>>>> and present them as a single big table, or like a list of tables, etc.
>>>>>> So I want to deal with the cattle use case, but the question is what
>>>>>> part of the architecture will deal with this complication, in other
>>>>>> words, how do you box things. The reason my initial bet is to stuff
>>>>>> that complication into the "data source" implementation(s) is that
>>>>>> data sources are inherently varied. Some return a table-like thing,
>>>>>> some have multiple named tables (worksheets in Excel), some return a
>>>>>> tree of nodes (XML), etc. So then, some might return a
>>>>>> list-of-list-of log records, or just a single list of log-records (put
>>>>>> together from daily log files). That way cattle don't add to
>>>>>> conceptual complexity. Now, you might be aware of cases where the
>>>>>> cattle concept must be more exposed than this, and then we can't box
>>>>>> things like this. But this is what I tried to express.
>>>>>> 
>>>>>> Regarding "output generators", and how that applies on the command
>>>>>> line. I think it's important that the common core between Maven and
>>>>>> command-line is as fat as possible. Ideally, they are just two
>>>>>> syntaxes to set up the same thing. Mostly at least. So, if you specify
>>>>>> a template file to the CLI application, in a way so that it causes it
>>>>>> to process that template to generate a single output, then there you
>>>>>> have just defined an "output generator" (even if it wasn't explicitly
>>>>>> called that on the command line). If you specify 3 csv files to the
>>>>>> CLI application, in a way so that it causes it to generate 3 output
>>>>>> files, then you have just defined 3 "output generators" there (there's
>>>>>> at least one template specified there too, but that wasn't an "output
>>>>>> generator" itself, it was just an attribute of the 3 output
>>>>>> generators). If you specify 1 template, and 3 csv files, in a way so
>>>>>> that it will yield 4 output files (1 for the template, 3 for the
>>>>>> csv-s), then you have defined 4 output generators there. If you have a
>>>>>> data source that loads a list of 3 entities (say, 3 csv files, so it's
>>>>>> a list of tables then), and you have 2 templates, and you tell the CLI
>>>>>> to execute each template for each item in said data source, then you
>>>>>> have just defined 6 "output generators".
>>>>>> 
>>>>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Daniel,
>>>>>>> 
>>>>>>> That all depends on your mental model and work you do, expectations,
>>>>>>> experience :-)
>>>>>>> 
>>>>>>> 
>>>>>>> __Document Handling__
>>>>>>> 
>>>>>>> *"But I think actually we have no good use case for list of documents
>>>>>>> that's passed at once to a single template run, so, we can just
>>>>>>> ignore that complication"*
>>>>>>> 
>>>>>>> In my case that's not a complication but my daily business - I'm
>>>>>>> regularly wading through access logs - yesterday probably a couple of
>>>>>>> hundred access logs across two staging sites to help track some
>>>>>>> strange API gateway issues :-)
>>>>>>> 
>>>>>>> My gut feeling is (borrowing from
>>>>>>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
>>>>>>> )
>>>>>>> 
>>>>>>> 1. You have a few lovely named documents / templates - `pets`
>>>>>>> 2. You have tons of anonymous documents / templates to process -
>>>>>>> `cattle`
>>>>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
>>>>>>> 
>>>>>>> `freemarker-cli` was built with 2) in mind and I want to cover 1)
>>>>>>> since it is equally important and common.
>>>>>>> 
>>>>>>> 
>>>>>>> __Template And Document Processing Modes__
>>>>>>> 
>>>>>>> IMHO it is important to answer the following question: "How many
>>>>>>> outputs do you get when rendering 2 templates and 3 datasources? Two,
>>>>>>> Three or Six?"
>>>>>>> 
>>>>>>> Your answer is influenced by your mental model / experience
>>>>>>> 
>>>>>>> * When wading through tons of CSV files, access logs, etc. the answer
>>>>>>> is "2"
>>>>>>> * When doing source code generation the obvious answer is "6"
>>>>>>> * Can't imagine a use case which results in "3" but I'm pretty sure we
>>>>>>> will encounter one
>>>>>>> 
>>>>>>> __Template and document mode probably shouldn't exist__
>>>>>>> 
>>>>>>> That's hard for me to fully understand - I definitely lack your
>>>>>>> insights & experience writing such tools :-)
>>>>>>> 
>>>>>>> Defining the `Output Generator` is the underlying model for the Maven
>>>>>>> plugin (and probably FMPP).
>>>>>>> 
>>>>>>> I'm not sure if this applies to command lines, at least not in the
>>>>>>> way I use them (or would like to use them)
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> 
>>>>>>> Siegfried Goeschl
>>>>>>> 
>>>>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>>>>>>> 
>>>>>>> 
>>>>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>>>>>>> 
>>>>>>>> Yeah, "data source" is surely a too popular name, but for a reason.
>>>>>>>> Does anyone have other ideas?
>>>>>>>> 
>>>>>>>> As for naming data sources and such. One thing I was wondering about
>>>>>>>> back then is how to deal with a list of documents given to a
>>>>>>>> template, versus exactly 1 document given to a template. But I think
>>>>>>>> actually we have no good use case for list of documents that's
>>>>>>>> passed at once to a single template run, so, we can just ignore that
>>>>>>>> complication. A document has a name, and that's always just a single
>>>>>>>> document, not a collection, as far as the template is concerned. (We
>>>>>>>> can have multiple documents per run, but those normally yield
>>>>>>>> separate output generators, so it's still only one document per
>>>>>>>> template.) However, we can have data source types (document types
>>>>>>>> with old terminology) that collect together multiple data files. So
>>>>>>>> then that complexity is encapsulated into the data source type, and
>>>>>>>> doesn't complicate the overall architecture. That's another case
>>>>>>>> when a data source is not just a file. Like maybe there's a data
>>>>>>>> source type that loads all the CSV-s from a directory, into a single
>>>>>>>> big table (I had such a case), or even into a list of tables. Or, as
>>>>>>>> I mentioned already, a data source is maybe an SQL query on a JDBC
>>>>>>>> data source (and we got the first term clash... JDBC also calls them
>>>>>>>> data sources).
>>>>>>>> 
>>>>>>>> Template and document mode probably shouldn't exist from the user
>>>>>>>> perspective either, at least not as a global option that must apply
>>>>>>>> to everything in a run. They could just give the files that define
>>>>>>>> the "output generators", and some of them will be templates, some of
>>>>>>>> them are data files, in which case a template needs to be associated
>>>>>>>> with them (and there can be a couple of ways of doing that). And
>>>>>>>> then again, there are the cases where you want to create one output
>>>>>>>> generator per entity from some data source.
>>>>>>>> 
>>>>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi Daniel,
>>>>>>>>> 
>>>>>>>>> See my comments below - and thanks for your patience and input :-)
>>>>>>>>> 
>>>>>>>>> *Renaming Document To DataSource*
>>>>>>>>> 
>>>>>>>>> Yes, makes sense. I tried to avoid since I'm using javax.activation
>>>>>>>>> and
>>>>>>>>> its DataSource.
>>>>>>>>> 
>>>>>>>>> *Template And Document Mode*
>>>>>>>>> 
>>>>>>>>> Agreed - I think it is a valuable abstraction for the user but it
>> is
>>>>>>>>> not
>>>>>>>>> an implementation concept :-)
>>>>>>>>> 
>>>>>>>>> *Document Without Symbolic Names*
>>>>>>>>> 
>>>>>>>>> Also agreed and it is going to change but I have not settled my
>> mind
>>>>>>>>> yet
>>>>>>>>> what exactly to implement.
>>>>>>>>> 
>>>>>>>>> Thanks in advance,
>>>>>>>>> 
>>>>>>>>> Siegfried Goeschl
>>>>>>>>> 
>>>>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>>>>>>>>> 
>>>>>>>>> A few quick thoughts on that:
>>>>>>>>> 
>>>>>>>>> - We should replace the "document" term with something more
>> speaking.
>>>>>>>>> It
>>>>>>>>> doesn't tell that it's some kind of input. Also, most of these
>> inputs
>>>>>>>>> aren't something that people typically call documents. Like a csv
>>>>>>>>> file, or
>>>>>>>>> a database table, which is not even a file (OK we don't support
>> such
>>>>>>>>> thing
>>>>>>>>> at the moment). I think, maybe "data source" is a safe enough term.
>>>>>>>>> (It
>>>>>>>>> also rhymes with data model.)
>>>>>>>>> - You have separate "template" and "document" "mode", that applies
>> to
>>>>>>>>> a
>>>>>>>>> whole run. I think such specialization won't be helpful. We could
>>>>>>>>> just say,
>>>>>>>>> on the conceptual level at lest, that we need a set of "outputs
>>>>>>>>> generators". An output generator is an object (in the API) that
>>>>>>>>> specifies a
>>>>>>>>> template, a data-model (where the data-model is possibly populated
>>>>>>>>> with
>>>>>>>>> "documents"), and an output "sink" (a file path, or stdout), and
>> can
>>>>>>>>> generate the output itself. A practical way of defining the output
>>>>>>>>> generators in a CLI application is via a bunch of files, each
>>>>>>>>> defining an
>>>>>>>>> output generator. Some of those files is maybe a template (that you
>>>>>>>>> can
>>>>>>>>> even detect from the file extension), or a data file that we
>>>>>>>>> currently call
>>>>>>>>> a "document". They could freely mix inside the same run. I have
>> also
>>>>>>>>> met
>>>>>>>>> use case when you have a single table (single "document"), and each
>>>>>>>>> record
>>>>>>>>> in it yields an output file. That can also be described in some
>> file
>>>>>>>>> format, or really in any other way, like directly in command line
>>>>>>>>> argument,
>>>>>>>>> via API, etc.
>>>>>>>>> - You have multiple documents without associated symbolical name in
>>>>>>>>> some
>>>>>>>>> examples. Templates can't identify those then in a well
>> maintainable
>>>>>>>>> way.
>>>>>>>>> The actual file name is often not a good identifier, can change
>> over
>>>>>>>>> time,
>>>>>>>>> and you might not even have good control over it, like you
>> already
>>>>>>>>> receive it as a parameter from somewhere else, or someone
>>>>>>>>> moves/renames
>>>>>>>>> the files that you need to read. Index is also not very good, but
>> I
>>>>>>>>> have
>>>>>>>>> written about that earlier.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
>>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Hi folks,
>>>>>>>>> 
>>>>>>>>> still wrapping my head around it, but assembled some thoughts here -
>>>>>>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>>>>>>>>> 
>>>>>>>>> Thanks in advance,
>>>>>>>>> 
>>>>>>>>> Siegfried Goeschl
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org>
>> wrote:
>>>>>>>>> 
>>>>>>>>> What you are describing is more like the angle that FMPP took
>>>>>>>>> initially,
>>>>>>>>> where templates drive things, they generate the output for
>> themselves
>>>>>>>>> 
>>>>>>>>> (even
>>>>>>>>> 
>>>>>>>>> multiple output files if they wish). By default output files name
>>>>>>>>> (and
>>>>>>>>> relative path) is deduced from template name. There was also a
>> global
>>>>>>>>> data-model, built in a configuration file (or equally, built via
>>>>>>>>> command
>>>>>>>>> line arguments, or both mixed), from which templates get whatever
>>>>>>>>> data
>>>>>>>>> 
>>>>>>>>> they
>>>>>>>>> 
>>>>>>>>> are interested in. Take a look at the figures here:
>>>>>>>>> http://fmpp.sourceforge.net/qtour.html. Later, this concept was
>>>>>>>>> 
>>>>>>>>> generalized
>>>>>>>>> 
>>>>>>>>> a bit more, because you could add XML files at the same place where
>>>>>>>>> you
>>>>>>>>> have the templates, and then you could associate transform
>> templates
>>>>>>>>> to
>>>>>>>>> 
>>>>>>>>> the
>>>>>>>>> 
>>>>>>>>> XML files (based on path pattern and/or the XML document element).
>>>>>>>>> Now
>>>>>>>>> that's like what freemarker-generator had initially (data files
>> drive
>>>>>>>>> output, and the template is there to transform it).
>>>>>>>>> 
>>>>>>>>> So I think the generic mental model would look like this:
>>>>>>>>> 
>>>>>>>>> 1. You got files that drive the process, let's call them *generator
>>>>>>>>> files* for now. Usually, each generator file yields an output file
>>>>>>>>> (but
>>>>>>>>> maybe even multiple output files, as you might have seen in the last
>>>>>>>>> figure).
>>>>>>>>> These generator files can be of many types, like XML, JSON, XLSX
>> (as
>>>>>>>>> 
>>>>>>>>> in the
>>>>>>>>> 
>>>>>>>>> original freemarker-generator), and even templates (as is the norm
>> in
>>>>>>>>> FMPP). If the file is not a template, then you got a set of
>>>>>>>>> transformer
>>>>>>>>> templates (-t CLI option) in a separate directory, which can be
>>>>>>>>> 
>>>>>>>>> associated
>>>>>>>>> 
>>>>>>>>> with the generator files based on name patterns, and even based on
>>>>>>>>> 
>>>>>>>>> content
>>>>>>>>> 
>>>>>>>>> (schema usually). If the generator file is a template (so that's a
>>>>>>>>> positional @Parameter CLI argument that happens to be an *.ftl, and
>>>>>>>>> is
>>>>>>>>> 
>>>>>>>>> not
>>>>>>>>> 
>>>>>>>>> a template file specified after the "-t" option), then you just
>>>>>>>>> Template.process(...) it, and it prints what the output will be.
>>>>>>>>> 2. You also have a set of variables, the global data-model, that
>>>>>>>>> contains commonly useful stuff, like what you now call parameters
>>>>>>>>> (CLI
>>>>>>>>> -Pname=value), but also maybe data loaded from JSON, XML, etc..
>> Those
>>>>>>>>> 
>>>>>>>>> data
>>>>>>>>> 
>>>>>>>>> files aren't "generator files". Templates just use them if they
>> need
>>>>>>>>> 
>>>>>>>>> them.
>>>>>>>>> 
>>>>>>>>> An important thing here is to reuse the same mechanism to read and
>>>>>>>>> 
>>>>>>>>> parse
>>>>>>>>> 
>>>>>>>>> those data files, which was used in templates when transforming
>>>>>>>>> 
>>>>>>>>> generator
>>>>>>>>> 
>>>>>>>>> files. So we need a common format for specifying how to load data
>>>>>>>>> 
>>>>>>>>> files.
>>>>>>>>> 
>>>>>>>>> That's maybe just FTL that #assigns to the variables, or maybe more
>>>>>>>>> declarative format.
>>>>>>>>> 
>>>>>>>>> What I have described in the original post here was a less generic
>>>>>>>>> form
>>>>>>>>> 
>>>>>>>>> of
>>>>>>>>> 
>>>>>>>>> this, as I tried to be true to the original approach. I thought
>> the
>>>>>>>>> proposal will be drastic enough as it is... :) There, the "main"
>>>>>>>>> document
>>>>>>>>> is the "generator file" from point 1, the "-t" template is the
>>>>>>>>> transform
>>>>>>>>> template for the "main" document, and the other named documents
>>>>>>>>> ("users",
>>>>>>>>> "groups") is a poor man's shared data-model from point 2 (together
>>>>>>>>> with -PName=value).
>>>>>>>>> 
>>>>>>>>> There's a further, somewhat confusing thing to get right with the
>>>>>>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing
>> though.
>>>>>>>>> In
>>>>>>>>> the model above, as per point 1, if you list multiple data files,
>>>>>>>>> each
>>>>>>>>> 
>>>>>>>>> will
>>>>>>>>> 
>>>>>>>>> generate a separate output file. So, if you need to take in a list of
>>>>>>>>> files
>>>>>>>>> 
>>>>>>>>> to
>>>>>>>>> 
>>>>>>>>> transform it to a single output file (or at least with a single
>>>>>>>>> transform
>>>>>>>>> template execution), then you have to be explicit about that, as
>>>>>>>>> that's
>>>>>>>>> 
>>>>>>>>> not
>>>>>>>>> 
>>>>>>>>> the default behavior anymore. But it's still absolutely possible.
>>>>>>>>> Imagine
>>>>>>>>> it as a "list of XLSX-es" is itself like a file format. You need
>> some
>>>>>>>>> CLI
>>>>>>>>> (and Maven config, etc.) syntax to express that, but that shouldn't
>>>>>>>>> be a
>>>>>>>>> big deal.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
>>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Daniel,
>>>>>>>>> 
>>>>>>>>> Good timing - I was looking at a similar problem from different
>> angle
>>>>>>>>> yesterday (see below)
>>>>>>>>> 
>>>>>>>>> Don't have enough time to answer your email in detail now - will do
>>>>>>>>> that
>>>>>>>>> tomorrow evening
>>>>>>>>> 
>>>>>>>>> Thanks in advance,
>>>>>>>>> 
>>>>>>>>> Siegfried Goeschl
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> === START
>>>>>>>>> # FreeMarker CLI Improvement
>>>>>>>>> ## Support Of Multiple Template Files
>>>>>>>>> Currently we support the following combinations
>>>>>>>>> 
>>>>>>>>> * Single template and no data files
>>>>>>>>> * Single template and one or more data files
>>>>>>>>> 
>>>>>>>>> But we can not support the following use case which is quite
>> typical
>>>>>>>>> in
>>>>>>>>> the cloud
>>>>>>>>> 
>>>>>>>>> __Convert multiple templates with a single data file, e.g copying a
>>>>>>>>> directory of configuration files using a JSON configuration file__
>>>>>>>>> 
>>>>>>>>> ## Implementation notes
>>>>>>>>> * When we copy a directory we can remove the `ftl` extension on the
>>>>>>>>> fly
>>>>>>>>> * We might need an `exclude` filter for the copy operation
>>>>>>>>> * Initially resolve to a list of template files and process one
>> after
>>>>>>>>> another
>>>>>>>>> * Need to calculate the output file location and extension
>>>>>>>>> * We need to rename the existing command line parameters (see
>> below)
>>>>>>>>> * Do we need multiple include and exclude filter?
>>>>>>>>> * Do we need file versus directory filters?
>>>>>>>>> 
>>>>>>>>> ### Command Line Options
>>>>>>>>> ```
>>>>>>>>> --input-encoding : Encoding of the documents
>>>>>>>>> --output-encoding : Encoding of the rendered template
>>>>>>>>> --template-encoding : Encoding of the template
>>>>>>>>> --output : Output file or directory
>>>>>>>>> --include-document : Include pattern for documents
>>>>>>>>> --exclude-document : Exclude pattern for documents
>>>>>>>>> --include-template : Include pattern for templates
>>>>>>>>> --exclude-template : Exclude pattern for templates
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>> ### Command Line Examples
>>>>>>>>> ```text
>>>>>>>>> # Copy all FTL templates found in "ext/config" to the "/config"
>>>>>>>>> 
>>>>>>>>> directory
>>>>>>>>> 
>>>>>>>>> using the data from "config.json"
>>>>>>>>> 
>>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config
>>>>>>>>> 
>>>>>>>>> config.json
>>>>>>>>> 
>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl
>>>>>>>>> 
>>>>>>>>> --output
>>>>>>>>> 
>>>>>>>>> /config config.json
>>>>>>>>> 
>>>>>>>>> # Basically the same using a named document "configuration"
>>>>>>>>> # It might make sense to expose "conf" directly in the FreeMarker
>>>>>>>>> data
>>>>>>>>> model
>>>>>>>>> # It might make sense to allow URIs for loading documents
>>>>>>>>> 
>>>>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d
>>>>>>>>> 
>>>>>>>>> configuration=config.json
>>>>>>>>> 
>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl
>>>>>>>>> 
>>>>>>>>> --output
>>>>>>>>> 
>>>>>>>>> /config --document configuration=config.json
>>>>>>>>> 
>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl
>>>>>>>>> 
>>>>>>>>> --output
>>>>>>>>> 
>>>>>>>>> /config --document configuration=file:///config.json
>>>>>>>>> 
>>>>>>>>> # Basically the same using an environment variable as named
>> document
>>>>>>>>> 
>>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config
>> -d
>>>>>>>>> 
>>>>>>>>> configuration=env:///CONFIGURATION
>>>>>>>>> 
>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl
>>>>>>>>> 
>>>>>>>>> --output
>>>>>>>>> 
>>>>>>>>> /config --document configuration=env:///CONFIGURATION
>>>>>>>>> ```
>>>>>>>>> === END
>>>>>>>>> 
>>>>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>> Input documents is a fundamental concept in freemarker-generator,
>> so
>>>>>>>>> we
>>>>>>>>> should think about that more, and probably refine/rework how it's
>>>>>>>>> done.
>>>>>>>>> 
>>>>>>>>> Currently it works like this, with CLI at least.
>>>>>>>>> 
>>>>>>>>> freemarker-cli
>>>>>>>>> -t access-report.ftl
>>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>>> 
>>>>>>>>> Then in access-report.ftl you have to do something like this:
>>>>>>>>> 
>>>>>>>>> <#assign doc = Documents.get(0)>
>>>>>>>>> ... process doc here
>>>>>>>>> 
>>>>>>>>> (The more idiomatic Documents[0] won't work. Actually, that led
>> to a
>>>>>>>>> 
>>>>>>>>> funny
>>>>>>>>> 
>>>>>>>>> chain of coincidences: It returned the string "D", then
>>>>>>>>> 
>>>>>>>>> CSVTool.parse(...)
>>>>>>>>> 
>>>>>>>>> happily parsed that to a table with the single column "D", and 0
>>>>>>>>> rows,
>>>>>>>>> 
>>>>>>>>> and
>>>>>>>>> 
>>>>>>>>> as there were 0 rows, the template didn't run into an error because
>>>>>>>>> row.myExpectedColumn refers to a missing column either, so the
>>>>>>>>> process
>>>>>>>>> finished with success. (: Pretty unlucky for sure. The root was
>>>>>>>>> unintentionally breaking a FreeMarker idiom though; eventually we
>>>>>>>>> will
>>>>>>>>> 
>>>>>>>>> have
>>>>>>>>> 
>>>>>>>>> to work on those too, but, different topic.)
>>>>>>>>> 
>>>>>>>>> However, actually multiple input documents can be passed in:
>>>>>>>>> 
>>>>>>>>> freemarker-cli
>>>>>>>>> -t access-report.ftl
>>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>>> somewhere/bar-access-log.csv
>>>>>>>>> 
>>>>>>>>> Above template will still work, though then you ignored all but the
>>>>>>>>> 
>>>>>>>>> first
>>>>>>>>> 
>>>>>>>>> document. So if you expect any number of input documents, you
>>>>>>>>> probably
>>>>>>>>> 
>>>>>>>>> will
>>>>>>>>> 
>>>>>>>>> have to do this:
>>>>>>>>> 
>>>>>>>>> <#list Documents.list as doc>
>>>>>>>>> ... process doc here
>>>>>>>>> </#list>
>>>>>>>>> 
>>>>>>>>> (The more idiomatic <#list Documents as doc> won't work; but again,
>>>>>>>>> 
>>>>>>>>> those
>>>>>>>>> 
>>>>>>>>> we will work out in a different thread.)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> So, what would be better, in my opinion. I start out from what I
>>>>>>>>> think
>>>>>>>>> 
>>>>>>>>> are
>>>>>>>>> 
>>>>>>>>> the common uses cases, in decreasing order of frequency. Goal is to
>>>>>>>>> 
>>>>>>>>> make
>>>>>>>>> 
>>>>>>>>> those less error prone for the users, and simpler to express.
>>>>>>>>> 
>>>>>>>>> USE CASE 1
>>>>>>>>> 
>>>>>>>>> You have exactly 1 input document, which is therefore simply "the"
>>>>>>>>> document in the mind of the user. This is probably the typical use
>>>>>>>>> 
>>>>>>>>> case,
>>>>>>>>> 
>>>>>>>>> but at least the use case users typically start out from when
>>>>>>>>> starting
>>>>>>>>> 
>>>>>>>>> the
>>>>>>>>> 
>>>>>>>>> work.
>>>>>>>>> 
>>>>>>>>> freemarker-cli
>>>>>>>>> -t access-report.ftl
>>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>>> 
>>>>>>>>> Then `Documents.get(0)` is not very fitting. Most importantly it's
>>>>>>>>> 
>>>>>>>>> error
>>>>>>>>> 
>>>>>>>>> prone, because if the user passed in more than 1 document (can
>> even
>>>>>>>>> 
>>>>>>>>> happen
>>>>>>>>> 
>>>>>>>>> totally accidentally, like if the user was lazy and used a wildcard
>>>>>>>>> 
>>>>>>>>> that
>>>>>>>>> 
>>>>>>>>> the shell exploded), the template will silently ignore the rest of
>>>>>>>>> the
>>>>>>>>> documents, and the single document processed will be practically
>>>>>>>>> picked
>>>>>>>>> randomly. The user might not notice that and submit a bad report
>>>>>>>>> or
>>>>>>>>> 
>>>>>>>>> such.
>>>>>>>>> 
>>>>>>>>> I think that in this use case the document should be simply
>> referred
>>>>>>>>> as
>>>>>>>>> `Document` in the template. When you have multiple documents there,
>>>>>>>>> referring to `Document` should be an error, saying that the
>> template
>>>>>>>>> 
>>>>>>>>> was
>>>>>>>>> 
>>>>>>>>> made to process a single document only.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> USE CASE 2
>>>>>>>>> 
>>>>>>>>> You have multiple input documents, but each has a different role
>>>>>>>>> 
>>>>>>>>> (different
>>>>>>>>> 
>>>>>>>>> schema, maybe different file type). Like, you pass in users.csv and
>>>>>>>>> groups.csv. Each has a different schema, and so you want to access
>>>>>>>>> them
>>>>>>>>> differently, but in the same template.
>>>>>>>>> 
>>>>>>>>> freemarker-cli
>>>>>>>>> [...]
>>>>>>>>> --named-document users somewhere/foo-users.csv
>>>>>>>>> --named-document groups somewhere/foo-groups.csv
>>>>>>>>> 
>>>>>>>>> Then in the template you could refer to them as:
>>>>>>>>> 
>>>>>>>>> `NamedDocuments.users`,
>>>>>>>>> 
>>>>>>>>> and `NamedDocuments.groups`.
>>>>>>>>> 
>>>>>>>>> Use Case 1, and 2 can be unified into a coherent concept, where
>>>>>>>>> 
>>>>>>>>> `Document`
>>>>>>>>> 
>>>>>>>>> is just a shorthand for `NamedDocuments.main`. It's called "main"
>>>>>>>>> 
>>>>>>>>> because
>>>>>>>>> 
>>>>>>>>> that's "the" document the template is about, but then you have to
>>>>>>>>> add
>>>>>>>>> some helper documents, with symbolic names representing their role.
>>>>>>>>> 
>>>>>>>>> freemarker-cli
>>>>>>>>> -t access-report.ftl
>>>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>>>> --document-name=groups somewhere/foo-groups.csv
>>>>>>>>> 
>>>>>>>>> Here, `Document` still works in the template, and it refers to
>>>>>>>>> `somewhere/foo-access-log.csv`. (While omitting
>> --document-name=main
>>>>>>>>> 
>>>>>>>>> above
>>>>>>>>> 
>>>>>>>>> would be cleaner, I couldn't figure out how to do that with
>> Picocli.
>>>>>>>>> Anyway, for now the point is the concept, which is not specific to
>>>>>>>>> 
>>>>>>>>> CLI.)
>>>>>>>>> 
>>>>>>>>> USE CASE 3
>>>>>>>>> 
>>>>>>>>> Here you have several of the same kind of documents. That has a
>> more
>>>>>>>>> generic sub-use-case, when you have explicitly named documents
>> (like
>>>>>>>>> "users" above), and for some you expect multiple input files.
>>>>>>>>> 
>>>>>>>>> freemarker-cli
>>>>>>>>> -t access-report.ftl
>>>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>>>> somewhere/bar-access-log.csv
>>>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>>>> somewhere/bar-users.csv
>>>>>>>>> --document-name=groups somewhere/global-groups.csv
>>>>>>>>> 
>>>>>>>>> The template must be written with this use case in mind, as now
>> it
>>>>>>>>> 
>>>>>>>>> has to
>>>>>>>>> 
>>>>>>>>> #list some of the documents. (I think in practice you hardly ever
>>>>>>>>> want
>>>>>>>>> 
>>>>>>>>> to
>>>>>>>>> 
>>>>>>>>> get a document by hard coded index. Either you don't know how many
>>>>>>>>> documents you have, so you can't use hard coded indexes, or you do,
>>>>>>>>> and
>>>>>>>>> each index has a specific meaning, but then you should name the
>>>>>>>>> 
>>>>>>>>> documents
>>>>>>>>> 
>>>>>>>>> instead, as using indexes is error prone, and hard to read.)
>>>>>>>>> Accessing that list of documents in the template, maybe could be
>> done
>>>>>>>>> 
>>>>>>>>> like
>>>>>>>>> 
>>>>>>>>> this:
>>>>>>>>> - For the "main" documents: `DocumentList`
>>>>>>>>> - For explicitly named documents, like "users":
>>>>>>>>> 
>>>>>>>>> `NamedDocumentLists.users`
>>>>>>>>> 
>>>>>>>>> SUMMING UP
>>>>>>>>> 
>>>>>>>>> To unify all 3 use cases into a coherent concept:
>>>>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while
>> you
>>>>>>>>> 
>>>>>>>>> can
>>>>>>>>> 
>>>>>>>>> achieve everything with it, using it requires your template to
>> handle
>>>>>>>>> 
>>>>>>>>> the
>>>>>>>>> 
>>>>>>>>> most generic case too. So, I think it would be rarely used.
>>>>>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
>>>>>>>>> 
>>>>>>>>> It's
>>>>>>>>> 
>>>>>>>>> used if you only have one kind of documents (single format and
>>>>>>>>> schema),
>>>>>>>>> 
>>>>>>>>> but
>>>>>>>>> 
>>>>>>>>> potentially multiple of them.
>>>>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1
>>>>>>>>> document
>>>>>>>>> 
>>>>>>>>> of
>>>>>>>>> 
>>>>>>>>> the given name.
>>>>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is
>>>>>>>>> for
>>>>>>>>> 
>>>>>>>>> the
>>>>>>>>> 
>>>>>>>>> most natural/frequent use case.
>>>>>>>>> 
>>>>>>>>> That's 4 possible ways of accessing your documents, which is a
>>>>>>>>> 
>>>>>>>>> trade-off
>>>>>>>>> 
>>>>>>>>> for the sake of these:
>>>>>>>>> - Catching CLI (or Maven, etc.) input where the template output
>>>>>>>>> likely
>>>>>>>>> 
>>>>>>>>> will
>>>>>>>>> 
>>>>>>>>> be wrong. That's only possible if the user can communicate its
>> intent
>>>>>>>>> 
>>>>>>>>> in
>>>>>>>>> 
>>>>>>>>> the template.
>>>>>>>>> - Users don't need to deal with concepts that are irrelevant in
>> their
>>>>>>>>> concrete use case. Just start with the trivial, `Document`, and
>> later
>>>>>>>>> 
>>>>>>>>> if
>>>>>>>>> 
>>>>>>>>> the need arises, generalize to named documents, document lists, or
>>>>>>>>> 
>>>>>>>>> both.
>>>>>>>>> 
>>>>>>>>> What do you guys think?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Best regards,
>>>>>> Daniel Dekany
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best regards,
>>>>> Daniel Dekany
>>>> 
>>>> 
>>> 
>>> --
>>> Best regards,
>>> Daniel Dekany
>> 
>> 
> 
> -- 
> Best regards,
> Daniel Dekany

Re: freemarker-generator: Improving the input documents concept

Posted by Daniel Dekany <da...@gmail.com>.

>
>  I try to provide a useful name even when the content is coming from an
> URL


When is it recommended to rely on that, though? Utilizing that means that
renaming a data source file can break the process, even if you call
freemarker-cli with the up-to-date file name. And whether that happens
depends on what you (or some other random colleague!) have buried inside the
templates. So I guess we had better just not support this. Less code, and
fewer things to document too.


> I think we have a different understanding what a "Document" / "Datasource
> / DataSource" should do


Thing is, eventually (most certainly pre-1.0, as it influences
architecture), certain needs will have to be addressed somehow. Then we will
see what "things" we really need. For now I thought we need "things" that
are much more than paths, and that encapsulate the "how to load the data"
aspect. I called them data sources, but maybe we should call them "data
loaders", to free up "data source" for the more primitive thing. Some
needs/doubts to address, *later*: Is it really the best approach for users
to load/parse data sources programmatically (where that code is written in
FTL, inside the templates)? Also, is the template the right place for doing
that? When multiple templates (or just multiple template *runs* of the same
template, each generating a different output file) need common data, they
shouldn't load it again and again. Also, different topic: can we handle the
case "transparently" enough when the data is not coming from a file?
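
To make the "don't load it again and again" concern concrete, here is a
minimal sketch of a lazily loading, caching "data loader" shared by several
template runs. It is written in Python purely for illustration; `DataLoader`,
`run_templates` and `load_count` are made-up names, not freemarker-generator
API, and the "templates" are plain functions standing in for real ones.

```python
from typing import Any, Callable, Dict, List


class DataLoader:
    """Hypothetical 'data loader': knows how to load one named piece of data."""

    def __init__(self, name: str, load: Callable[[], Any]) -> None:
        self.name = name
        self._load = load
        self._loaded = False
        self._value: Any = None
        self.load_count = 0  # only here to make the caching observable

    def get(self) -> Any:
        # Load lazily on first access, then reuse the result in later runs.
        if not self._loaded:
            self._value = self._load()
            self._loaded = True
            self.load_count += 1
        return self._value


def run_templates(
    templates: List[Callable[[Dict[str, DataLoader]], str]],
    loaders: Dict[str, DataLoader],
) -> List[str]:
    """Run each stand-in 'template' against the same shared loaders."""
    return [template(loaders) for template in templates]


# Assumption: the lambda stands in for reading and parsing a real file.
users = DataLoader("users", lambda: ["alice", "bob"])
outputs = run_templates(
    [
        lambda d: "count=%d" % len(d["users"].get()),
        lambda d: "first=%s" % d["users"].get()[0],
    ],
    {"users": users},
)
```

Both runs see the same data, but the loading function is invoked only once,
and not at all if no template asks for it.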

> The joy of programming - I did not intend to use "name:group" together with
> wildcards :-)


For a CLI tool, I guess we agree that it should work. So maybe, like this
(here logs and foos are meant to be "groups"):
--data-source logs file1.log file2.log fileN.log   --data-source foos
foo1.csv foo2.csv fooN.csv  --data-source bar bar.xlsx

It so happens that here you don't really have good control over the number
of files associated with the name, so that's maybe yet another reason not to
differentiate names and groups.
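
A rough sketch of how such a command line could be interpreted, assuming
that every path following a `--data-source <name>` belongs to that name
until the next `--data-source`. This is Python for illustration only, not
the Picocli setup of freemarker-cli; `parse_data_sources` and `single` are
hypothetical helpers.

```python
from typing import Dict, List, Optional


def parse_data_sources(argv: List[str]) -> Dict[str, List[str]]:
    """Map each --data-source name to the list of paths that follow it."""
    sources: Dict[str, List[str]] = {}
    current: Optional[str] = None  # name that currently collects paths
    expect_name = False            # True right after a --data-source flag
    for arg in argv:
        if arg == "--data-source":
            if expect_name:
                raise ValueError("--data-source must be followed by a name")
            expect_name = True
        elif expect_name:
            current = arg
            sources.setdefault(current, [])
            expect_name = False
        elif current is not None:
            sources[current].append(arg)
        else:
            raise ValueError("path given before any --data-source: " + arg)
    if expect_name:
        raise ValueError("--data-source must be followed by a name")
    return sources


def single(sources: Dict[str, List[str]], name: str) -> str:
    """What a template could call when it expects exactly one file."""
    paths = sources.get(name, [])
    if len(paths) != 1:
        raise ValueError(
            "expected exactly 1 file for %r, got %d" % (name, len(paths))
        )
    return paths[0]


argv = (
    "--data-source logs file1.log file2.log fileN.log "
    "--data-source foos foo1.csv foo2.csv fooN.csv "
    "--data-source bar bar.xlsx"
).split()
sources = parse_data_sources(argv)
```

Every name maps to a list, so a shell-expanded wildcard needs no special
treatment; it is the consumer that states, via something like `single`,
whether it expects exactly one file.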

> I disagree here - I think using a name would be used more often. I added
> the "group" as an afterthought since some grouping could be useful


We do agree on that. What I said is that the *syntax* should be such that
the group comes first. It's still optional. Like this:
--data-source group:name /somewhere
--data-source name /somewhere
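
As a small illustration of that "container first, still optional" syntax,
the specifier could be split like this. It is a Python sketch under my own
assumptions: `parse_spec` is a hypothetical helper, and rejecting an empty
group or name is just one possible policy.

```python
from typing import Optional, Tuple


def parse_spec(spec: str) -> Tuple[Optional[str], str]:
    """Split 'group:name' or plain 'name' into (group, name)."""
    head, sep, tail = spec.partition(":")
    if sep:
        group, name = head, tail  # 'group:name': the container comes first
    else:
        group, name = None, head  # plain 'name': no group given
    if not name or group == "":
        raise ValueError("invalid data source specifier: %r" % spec)
    return group, name
```

With this, `parse_spec("group:name")` gives `("group", "name")`, and
`parse_spec("name")` gives `(None, "name")`.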

On Sat, Feb 29, 2020 at 7:34 PM Siegfried Goeschl <
siegfried.goeschl@gmail.com> wrote:

> Hi Daniel,
>
> See my comments below
>
> Thanks in advance,
>
> Siegfried Goeschl
>
>
> > On 29.02.2020, at 19:08, Daniel Dekany <da...@gmail.com> wrote:
> >
> > FREEMARKER-135 freemarker-generator-cli: Support user-supplied names for
> > datasources
> >
> > So, I can do this to have both a name and a group associated with a data
> > source:
> >  --datasource someName:someGroup=somewhere/something
>
> Correct
>
> > Or if I only want a name, but not a group (or an ""  group actually -
> > bug?), then:
> >  --datasource someName=somewhere/something
>
> Correct
>
> >
> > Or if only a group but not a name (or a "" name actually) then:
> >  --datasource :someGroup=somewhere/something
>
> Mhmm, that would be unintended functionality from my side - current
> approach is that every "Document" / "Datasource / DataSource" is named
>
> >
> > A name must identify exactly 1 data source, while a group identifies a
> list
> > of data sources.
>
> No, every "Document" / "Datasource / DataSource" has a name currently but
> uniqueness is not enforced. Only if you want to get a "Document" /
> "Datasource / DataSource" by its exact name do I check for exactly one
> search hit, and throw an exception otherwise. I try to provide a useful name
> even when the content is coming from a URL or STDIN (and I will probably add
> environment variables as "Document" / "Datasource / DataSource", e.g.
> configuration in the cloud as JSON content passed as an environment variable)
>
> >
> > Does this idea, that a data source can be part of a group, and then
> > is also possibly identifiable with a name, come from a use case? I mean,
> > it's possibly important somewhere, but if so, then it's strange that you
> > can put something into only a single group. If we need this kind of
> thing,
> > then perhaps you should be just allowed to associate the data source
> with a
> > list of names (kind of like tagging), and then when the template wants to
> > get something by name, it will tell there if it expects exactly one or a
> > list of data sources. Then you don't need to introduce two terms in the
> > documentation either (names and groups). Again, if we want this at all,
> > instead of just going with a data source that itself gives a list. (And
> if
> > not, how will we handle a data source that loads from a non-file source?)
>
> I actually thought of implementing tagging but considered a "group"
> sufficient.
>
> * If you don't define anything everything goes into the "default" group
> * For individual documents you can define a name and an optional group
>
> I think we have a different understanding what a "Document" / "Datasource
> / DataSource" should do
>
> * It is a dumb
> * It is lazy since data is only loaded on demand
> * There is no automagic like "oh, this is a JSON file, so let's go to the
> JSON tool and create a map readily accessible in the data model"
>
> >
> > Note that the current command line syntax doesn't work well with shell
> > wildcard expansion. Like this:
> > --datasource :someGroup=logs/*.log
> > will try to expand ":someGroup=logs/*.log", and because it finds nothing
> > (and because the rules of sh and the like are a mess), you will get the
> > parameter value as is, without * expanded.
>
> The joy of programming - I did not intend to use "name:group" together
> with wildcards :-)
>
> >
> > Also, I think the syntax with the colon should be flipped, because in other
> > places foo:bar usually means that foo is the bigger unit (the container),
> > and bar is the smaller unit (the child).
>
> I disagree here - I think using a name would be used more often. I added
> the "group" as an afterthought since some grouping could be useful
>
> >
> > On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
> > siegfried.goeschl@gmail.com> wrote:
> >
> >> Hi Daniel,
> >>
> >> I'm an enterprise developer - bad habits die hard :-)
> >>
> >> So I closed the following tickets and merged the branches
> >>
> >> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
> >> "freemarker-generator"
> >> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to
> "Datasource"
> >> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
> >> for datasources
> >>
> >> Thanks in advance,
> >>
> >> Siegfried Goeschl
> >>
> >>
> >>> On 29.02.2020, at 12:19, Daniel Dekany <da...@gmail.com>
> wrote:
> >>>
> >>> Yeah, and of course, you can merge that branch. You can even work on
> the
> >>> master directly after all.
> >>>
> >>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <
> daniel.dekany@gmail.com>
> >>> wrote:
> >>>
> >>>> But, I do recognize the cattle use case (several "faceless" files with
> >>>> common format/schema). Only, my idea is to push that complexity on the
> >> data
> >>>> source. The "data source" concept shields the rest of the application
> >> from
> >>>> the details of how the data is stored or retrieved. So, a data source
> >> might
> >>>> load a bunch of log files from a directory, and present them as a
> >> single
> >>>> big table, or like a list of tables, etc. So I want to deal with the
> >> cattle
> >>>> use case, but the question is what part of the architecture will
> deal
> >>>> with this complication, in other words, how do you box things. Why
> my
> >>>> initial bet is to stuff that complication into the "data source"
> >>>> implementation(s) is that data sources are inherently varied. Some
> >> return
> >>>> a table-like thing, some have multiple named tables (worksheets in
> >> Excel),
> >>>> some return a tree of nodes (XML), etc. So then, some might return a
> >>>> list-of-list-of log records, or just a single list of log-records (put
> >>>> together from daily log files). That way cattle don't add to
> conceptual
> >>>> complexity. Now, you might be aware of cases where the cattle concept
> >> must
> >>>> be more exposed than this, and then we can't box things like this. But
> >> this
> >>>> is what I tried to express.
> >>>>
> >>>> Regarding "output generators", and how that applies on the command
> >> line. I
> >>>> think it's important that the common core between Maven and
> >> command-line is
> >>>> as fat as possible. Ideally, they are just two syntaxes to set up the
> same
> >>>> thing. Mostly at least. So, if you specify a template file to the CLI
> >>>> application, in a way so that it causes it to process that template to
> >>>> generate a single output, then there you have just defined an "output
> >>>> generator" (even if it wasn't explicitly called like that in the
> command
> >>>> line). If you specify 3 csv files to the CLI application, in a way so
> >> that
> >>>> it causes it to generate 3 output files, then you have just defined 3
> >>>> "output generators" there (there's at least one template specified
> there
> >>>> too, but that wasn't an "output generator" itself, it was just an
> >> attribute
> >>>> of the 3 output generators). If you specify 1 template, and 3 csv
> >> files, in
> >>>> a way so that it will yield 4 output files (1 for the template, 3 for
> >> the
> >>>> csv-s), then you have defined 4 output generators there. If you have a
> >> data
> >>>> source that loads a list of 3 entities (say, 3 csv files, so it's a
> >> list of
> >>>> tables then), and you have 2 templates, and you tell the CLI to
> execute
> >>>> each template for each item in said data source, then you have just
> >> defined
> >>>> 6 "output generators".
> >>>>
> >>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
> >>>> siegfried.goeschl@gmail.com> wrote:
> >>>>
> >>>>> Hi Daniel,
> >>>>>
> >>>>> That all depends on your mental model and work you do, expectations,
> >>>>> experience :-)
> >>>>>
> >>>>>
> >>>>> __Document Handling__
> >>>>>
> >>>>> *"But I think actually we have no good use case for list of documents
> >>>>> that's passed at once to a single template run, so, we can just
> ignore
> >>>>> that complication"*
> >>>>>
> >>>>> In my case that's not a complication but my daily business - I'm
> >>>>> regularly wading through access logs - yesterday probably a couple of
> >>>>> hundreds access logs across two staging sites to help tracking some
> >>>>> strange API gateway issues :-)
> >>>>>
> >>>>> My gut feeling is (borrowing from
> >>>>>
> >>>>>
> >>
> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
> >>>>> )
> >>>>>
> >>>>> 1. You have a few lovely named documents / templates - `pets`
> >>>>> 2. You have tons of anonymous documents / templates to process -
> >>>>> `cattle`
> >>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
> >>>>>
> >>>>> `freemarker-cli` was built with 2) in mind and I want to cover 1)
> since
> >>>>> it is equally important and common.
> >>>>>
> >>>>>
> >>>>> __Template And Document Processing Modes__
> >>>>>
> >>>>> IMHO it is important to answer the following question : "How many
> >>>>> outputs do you get when rendering 2 template and 3 datasources? Two,
> >>>>> Three or Six?"
> >>>>>
> >>>>> Your answer is influenced by your mental model / experience
> >>>>>
> >>>>> * When wading through tons of CSV files, access logs, etc. the answer
> >> is
> >>>>> "2"
> >>>>> * When doing source code generation the obvious answer is "6"
> >>>>> * Can't image a use case which results in "3" but I'm pretty sure we
> >>>>> will encounter one
> >>>>>
> >>>>> __Template and document mode probably shouldn't exist__
> >>>>>
> >>>>> That's hard for me to fully understand - I definitely lack your
> >> insights
> >>>>> & experience writing such tools :-)
> >>>>>
> >>>>> Defining the `Output Generator` is the underlying model for the Maven
> >>>>> plugin (and probably FMPP).
> >>>>>
> >>>>> I'm not sure if this applies for command lines at least not in the
> way
> >> I
> >>>>> use them (or would like to use them)
> >>>>>
> >>>>>
> >>>>> Thanks in advance,
> >>>>>
> >>>>> Siegfried Goeschl
> >>>>>
> >>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
> >>>>>
> >>>>>
> >>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
> >>>>>
> >>>>>> Yeah, "data source" is surely a too popular name, but for reason.
> >>>>>> Anyone
> >>>>>> has other ideas?
> >>>>>>
> >>>>>> As of naming data sources and such. One thing I was wondering about
> >>>>>> back
> >>>>>> then is how to deal with list of documents given to a template,
> versus
> >>>>>> exactly 1 document given to a template. But I think actually we have
> >>>>>> no
> >>>>>> good use case for list of documents that's passed at once to a
> single
> >>>>>> template run, so, we can just ignore that complication. A document
> has
> >>>>>> a
> >>>>>> name, and that's always just a single document, not a collection, as
> >>>>>> far as
> >>>>>> the template is concerned. (We can have multiple documents per run,
> >>>>>> but
> >>>>>> those normally yield separate output generators, so it's still only
> >>>>>> one
> >>>>>> document per template.) However, we can have data source types
> >>>>>> (document
> >>>>>> types with old terminology) that collect together multiple data
> files.
> >>>>>> So
> >>>>>> then that complexity is encapsulated into the data source type, and
> >>>>>> doesn't
> >>>>>> complicate the overall architecture. That's another case when a data
> >>>>>> source
> >>>>>> is not just a file. Like maybe there's a data source type that loads
> >>>>>> all
> >>>>>> the CSV-s from a directory, into a single big table (I had such
> case),
> >>>>>> or
> >>>>>> even into a list of tables. Or, as I mentioned already, a data
> source
> >>>>>> is
> >>>>>> maybe an SQL query on a JDBC data source (and we got the first term
> >>>>>> clash... JDBC also call them data sources).
> >>>>>>
> >>>>>> Template and document mode probably shouldn't exist from user
> >>>>>> perspective
> >>>>>> either, at least not as a global option that must apply to
> everything
> >>>>>> in a
> >>>>>> run. They could just give the files that define the "output
> >>>>>> generators",
> >>>>>> and some of them will be templates, some of them are data files, in
> >>>>>> which
> >>>>>> case a template need to be associated with them (and there can be a
> >>>>>> couple
> >>>>>> of ways of doing that). And then again, there are the cases where
> you
> >>>>>> want
> >>>>>> to create one output generator per entity from some data source.
> >>>>>>
> >>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
> >>>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi Daniel,
> >>>>>>>
> >>>>>>> See my comments below - and thanks for your patience and input :-)
> >>>>>>>
> >>>>>>> *Renaming Document To DataSource*
> >>>>>>>
> >>>>>>> Yes, makes sense. I tried to avoid since I'm using javax.activation
> >>>>>>> and
> >>>>>>> its DataSource.
> >>>>>>>
> >>>>>>> *Template And Document Mode*
> >>>>>>>
> >>>>>>> Agreed - I think it is a valuable abstraction for the user but it
> is
> >>>>>>> not
> >>>>>>> an implementation concept :-)
> >>>>>>>
> >>>>>>> *Document Without Symbolic Names*
> >>>>>>>
> >>>>>>> Also agreed and it is going to change but I have not settled my
> mind
> >>>>>>> yet
> >>>>>>> what exactly to implement.
> >>>>>>>
> >>>>>>> Thanks in advance,
> >>>>>>>
> >>>>>>> Siegfried Goeschl
> >>>>>>>
> >>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
> >>>>>>>
> >>>>>>> A few quick thoughts on that:
> >>>>>>>
> >>>>>>> - We should replace the "document" term with something more
> speaking.
> >>>>>>> It
> >>>>>>> doesn't tell that it's some kind of input. Also, most of these
> inputs
> >>>>>>> aren't something that people typically call documents. Like a csv
> >>>>>>> file, or
> >>>>>>> a database table, which is not even a file (OK we don't support
> such
> >>>>>>> thing
> >>>>>>> at the moment). I think, maybe "data source" is a safe enough term.
> >>>>>>> (It
> >>>>>>> also rhymes with data model.)
> >>>>>>> - You have separate "template" and "document" "mode", that applies
> to
> >>>>>>> a
> >>>>>>> whole run. I think such specialization won't be helpful. We could
> >>>>>>> just say,
> >>>>>>> on the conceptual level at lest, that we need a set of "outputs
> >>>>>>> generators". An output generator is an object (in the API) that
> >>>>>>> specifies a
> >>>>>>> template, a data-model (where the data-model is possibly populated
> >>>>>>> with
> >>>>>>> "documents"), and an output "sink" (a file path, or stdout), and
> can
> >>>>>>> generate the output itself. A practical way of defining the output
> >>>>>>> generators in a CLI application is via a bunch of files, each
> >>>>>>> defining an
> >>>>>>> output generator. Some of those files is maybe a template (that you
> >>>>>>> can
> >>>>>>> even detect from the file extension), or a data file that we
> >>>>>>> currently call
> >>>>>>> a "document". They could freely mix inside the same run. I have
> also
> >>>>>>> met
> >>>>>>> use case when you have a single table (single "document"), and each
> >>>>>>> record
> >>>>>>> in it yields an output file. That can also be described in some
> file
> >>>>>>> format, or really in any other way, like directly in command line
> >>>>>>> argument,
> >>>>>>> via API, etc.
> >>>>>>> - You have multiple documents without associated symbolical name in
> >>>>>>> some
> >>>>>>> examples. Templates can't identify those then in a well
> maintainable
> >>>>>>> way.
> >>>>>>> The actual file name is often not a good identifier, can change
> over
> >>>>>>> time,
> >>>>>>> and you might don't even have good control over it, like you
> already
> >>>>>>> receive it as a parameter from somewhere else, or someone
> >>>>>>> moves/renames
> >>>>>>> that files that you need to read. Index is also not very good, but
> I
> >>>>>>> have
> >>>>>>> written about that earlier.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
> >>>>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi folks,
> >>>>>>>
> >>>>>>> still wrapping my side around but assembled some thoughts here -
> >>>>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
> >>>>>>>
> >>>>>>> Thanks in advance,
> >>>>>>>
> >>>>>>> Siegfried Goeschl
> >>>>>>>
> >>>>>>>
> >>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org>
> wrote:
> >>>>>>>
> >>>>>>> What you are describing is more like the angle that FMPP took
> >>>>>>> initially,
> >>>>>>> where templates drive things, they generate the output for
> themselves
> >>>>>>>
> >>>>>>> (even
> >>>>>>>
> >>>>>>> multiple output files if they wish). By default output files name
> >>>>>>> (and
> >>>>>>> relative path) is deduced from template name. There was also a
> global
> >>>>>>> data-model, built in a configuration file (or equally, built via
> >>>>>>> command
> >>>>>>> line arguments, or both mixed), from which templates get whatever
> >>>>>>> data
> >>>>>>>
> >>>>>>> they
> >>>>>>>
> >>>>>>> are interested in. Take a look at the figures here:
> >>>>>>> http://fmpp.sourceforge.net/qtour.html. Later, this concept was
> >>>>>>>
> >>>>>>> generalized
> >>>>>>>
> >>>>>>> a bit more, because you could add XML files at the same place where
> >>>>>>> you
> >>>>>>> have the templates, and then you could associate transform
> templates
> >>>>>>> to
> >>>>>>>
> >>>>>>> the
> >>>>>>>
> >>>>>>> XML files (based on path pattern and/or the XML document element).
> >>>>>>> Now
> >>>>>>> that's like what freemarker-generator had initially (data files
> drive
> >>>>>>> output, and the template is there to transform it).
> >>>>>>>
> >>>>>>> So I think the generic mental model would like this:
> >>>>>>>
> >>>>>>> 1. You got files that drive the process, let's call them *generator
> >>>>>>> files* for now. Usually, each generator file yields an output file
> >>>>>>> (but
> >>>>>>> maybe even multiple output files, as you might saw in the last
> >>>>>>> figure).
> >>>>>>> These generator files can be of many types, like XML, JSON, XLSX
> (as
> >>>>>>>
> >>>>>>> in the
> >>>>>>>
> >>>>>>> original freemarker-generator), and even templates (as is the norm
> in
> >>>>>>> FMPP). If the file is not a template, then you got a set of
> >>>>>>> transformer
> >>>>>>> templates (-t CLI option) in a separate directory, which can be
> >>>>>>>
> >>>>>>> associated
> >>>>>>>
> >>>>>>> with the generator files base on name patterns, and even based on
> >>>>>>>
> >>>>>>> content
> >>>>>>>
> >>>>>>> (schema usually). If the generator file is a template (so that's a
> >>>>>>> positional @Parameter CLI argument that happens to be an *.ftl, and
> >>>>>>> is
> >>>>>>>
> >>>>>>> not
> >>>>>>>
> >>>>>>> a template file specified after the "-t" option), then you just
> >>>>>>> Template.process(...) it, and it prints what the output will be.
> >>>>>>> 2. You also have a set of variables, the global data-model, that
> >>>>>>> contains commonly useful stuff, like what you now call parameters
> >>>>>>> (CLI
> >>>>>>> -Pname=value), but also maybe data loaded from JSON, XML, etc..
> Those
> >>>>>>>
> >>>>>>> data
> >>>>>>>
> >>>>>>> files aren't "generator files". Templates just use them if they
> need
> >>>>>>>
> >>>>>>> them.
> >>>>>>>
> >>>>>>> An important thing here is to reuse the same mechanism to read and
> >>>>>>>
> >>>>>>> parse
> >>>>>>>
> >>>>>>> those data files, which was used in templates when transforming
> >>>>>>>
> >>>>>>> generator
> >>>>>>>
> >>>>>>> files. So we need a common format for specifying how to load data
> >>>>>>>
> >>>>>>> files.
> >>>>>>>
> >>>>>>> That's maybe just FTL that #assigns to the variables, or maybe more
> >>>>>>> declarative format.
> >>>>>>>
> >>>>>>> What I have described in the original post here was a less generic
> >>>>>>> form
> >>>>>>>
> >>>>>>> of
> >>>>>>>
> >>>>>>> this, as I tried to be true with the original approach. I though
> the
> >>>>>>> proposal will be drastic enough as it is... :) There, the "main"
> >>>>>>> document
> >>>>>>> is the "generator file" from point 1, the "-t" template is the
> >>>>>>> transform
> >>>>>>> template for the "main" document, and the other named documents
> >>>>>>> ("users",
> >>>>>>> "groups") is a poor man's shared data-model from point 2 (together
> >>>>>>> with
> >>>>>>> with -PName=value).
> >>>>>>>
> >>>>>>> There's further somewhat confusing thing to get right with the
> >>>>>>> list-of-documents (`DocuentList`, `NamedDocumentLists`) thing
> though.
> >>>>>>> In
> >>>>>>> the model above, as per point 1, if you list multiple data files,
> >>>>>>> each
> >>>>>>>
> >>>>>>> will
> >>>>>>>
> >>>>>>> generate a separate output file. So, if you need take in a list of
> >>>>>>> files
> >>>>>>>
> >>>>>>> to
> >>>>>>>
> >>>>>>> transform it to a single output file (or at least with a single
> >>>>>>> transform
> >>>>>>> template execution), then you have to be explicit about that, as
> >>>>>>> that's
> >>>>>>>
> >>>>>>> not
> >>>>>>>
> >>>>>>> the default behavior anymore. But it's still absolutely possible.
> >>>>>>> Imagine
> >>>>>>> it as a "list of XLSX-es" is itself like a file format. You need
> some
> >>>>>>> CLI
> >>>>>>> (and Maven config, etc.) syntax to express that, but that shouldn't
> >>>>>>> be a
> >>>>>>> big deal.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
> >>>>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Daniel,
> >>>>>>>
> >>>>>>> Good timing - I was looking at a similar problem from different
> angle
> >>>>>>> yesterday (see below)
> >>>>>>>
> >>>>>>> Don't have enough time to answer your email in detail now - will do
> >>>>>>> that
> >>>>>>> tomorrow evening
> >>>>>>>
> >>>>>>> Thanks in advance,
> >>>>>>>
> >>>>>>> Siegfried Goeschl
> >>>>>>>
> >>>>>>>
> >>>>>>> ===. START
> >>>>>>> # FreeMarker CLI Improvement
> >>>>>>> ## Support Of Multiple Template Files
> >>>>>>> Currently we support the following combinations
> >>>>>>>
> >>>>>>> * Single template and no data files
> >>>>>>> * Single template and one or more data files
> >>>>>>>
> >>>>>>> But we can not support the following use case which is quite
> typical
> >>>>>>> in
> >>>>>>> the cloud
> >>>>>>>
> >>>>>>> __Convert multiple templates with a single data file, e.g copying a
> >>>>>>> directory of configuration files using a JSON configuration file__
> >>>>>>>
> >>>>>>> ## Implementation notes
> >>>>>>> * When we copy a directory we can remove the `ftl`extension on the
> >>>>>>> fly
> >>>>>>> * We might need an `exclude` filter for the copy operation
> >>>>>>> * Initially resolve to a list of template files and process one
> after
> >>>>>>> another
> >>>>>>> * Need to calculate the output file location and extension
> >>>>>>> * We need to rename the existing command line parameters (see
> below)
> >>>>>>> * Do we need multiple include and exclude filter?
> >>>>>>> * Do we need file versus directory filters?
> >>>>>>>
> >>>>>>> ### Command Line Options
> >>>>>>> ```
> >>>>>>> --input-encoding : Encoding of the documents
> >>>>>>> --output-encoding : Encoding of the rendered template
> >>>>>>> --template-encoding : Encoding of the template
> >>>>>>> --output : Output file or directory
> >>>>>>> --include-document : Include pattern for documents
> >>>>>>> --exclude-document : Exclude pattern for documents
> >>>>>>> --include-template: Include pattern for templates
> >>>>>>> --exclude-template : Exclude pattern for templates
> >>>>>>> ```
> >>>>>>>
> >>>>>>> ### Command Line Examples
> >>>>>>> ```text
> >>>>>>> # Copy all FTL templates found in "ext/config" to the "/config"
> >>>>>>>
> >>>>>>> directory
> >>>>>>>
> >>>>>>> using the data from "config.json"
> >>>>>>>
> >>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl --o /config
> >>>>>>>
> >>>>>>> config.json
> >>>>>>>
> >>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl
> >>>>>>>
> >>>>>>> --output
> >>>>>>>
> >>>>>>> /config config.json
> >>>>>>>
> >>>>>>> # Bascically the same using a named document "configuration"
> >>>>>>> # It might make sense to expose "conf" directly in the FreeMarker
> >>>>>>> data
> >>>>>>> model
> >>>>>>> # It might make sens to allow URIs for loading documents
> >>>>>>>
> >>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d
> >>>>>>>
> >>>>>>> configuration=config.json
> >>>>>>>
> >>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl
> >>>>>>>
> >>>>>>> --output
> >>>>>>>
> >>>>>>> /config --document configuration=config.json
> >>>>>>>
> >>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl
> >>>>>>>
> >>>>>>> --output
> >>>>>>>
> >>>>>>> /config --document configuration=file:///config.json
> >>>>>>>
> >>>>>>> # Bascically the same using an environment variable as named
> document
> >>>>>>>
> >>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config
> -d
> >>>>>>>
> >>>>>>> configuration=env:///CONFIGURATION
> >>>>>>>
> >>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl
> >>>>>>>
> >>>>>>> --output
> >>>>>>>
> >>>>>>> /config --document configuration=env:///CONFIGURATION
> >>>>>>> ```
> >>>>>>> === END
> >>>>>>>
> >>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org> wrote:
> >>>>>>>
> >>>>>>> Input documents is a fundamental concept in freemarker-generator,
> so
> >>>>>>> we
> >>>>>>> should think about that more, and probably refine/rework how it's
> >>>>>>> done.
> >>>>>>>
> >>>>>>> Currently it works like this, with CLI at least.
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> -t access-report.ftl
> >>>>>>> somewhere/foo-access-log.csv
> >>>>>>>
> >>>>>>> Then in access-report.ftl you have to do something like this:
> >>>>>>>
> >>>>>>> <#assign doc = Documents.get(0)>
> >>>>>>> ... process doc here
> >>>>>>>
> >>>>>>> (The more idiomatic Documents[0] won't work. Actually, that lead
> to a
> >>>>>>>
> >>>>>>> funny
> >>>>>>>
> >>>>>>> chain of coincidences: It returned the string "D", then
> >>>>>>>
> >>>>>>> CSVTool.parse(...)
> >>>>>>>
> >>>>>>> happily parsed that to a table with the single column "D", and 0
> >>>>>>> rows,
> >>>>>>>
> >>>>>>> and
> >>>>>>>
> >>>>>>> as there were 0 rows, the template didn't run into an error because
> >>>>>>> row.myExpectedColumn refers to a missing column either, so the
> >>>>>>> process
> >>>>>>> finished with success. (: Pretty unlucky for sure. The root was
> >>>>>>> unintentionally breaking a FreeMarker idiom though; eventually we
> >>>>>>> will
> >>>>>>>
> >>>>>>> have
> >>>>>>>
> >>>>>>> to work on those too, but, different topic.)
> >>>>>>>
> >>>>>>> However, actually multiple input documents can be passed in:
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> -t access-report.ftl
> >>>>>>> somewhere/foo-access-log.csv
> >>>>>>> somewhere/bar-access-log.csv
> >>>>>>>
> >>>>>>> Above template will still work, though then you ignored all but the
> >>>>>>>
> >>>>>>> first
> >>>>>>>
> >>>>>>> document. So if you expect any number of input documents, you
> >>>>>>> probably
> >>>>>>>
> >>>>>>> will
> >>>>>>>
> >>>>>>> have to do this:
> >>>>>>>
> >>>>>>> <#list Documents.list as doc>
> >>>>>>> ... process doc here
> >>>>>>> </#list>
> >>>>>>>
> >>>>>>> (The more idiomatic <#list Documents as doc> won't work; but again,
> >>>>>>>
> >>>>>>> those
> >>>>>>>
> >>>>>>> we will work out in a different thread.)
> >>>>>>>
> >>>>>>>
> >>>>>>> So, what would be better, in my opinion. I start out from what I
> >>>>>>> think
> >>>>>>>
> >>>>>>> are
> >>>>>>>
> >>>>>>> the common uses cases, in decreasing order of frequency. Goal is to
> >>>>>>>
> >>>>>>> make
> >>>>>>>
> >>>>>>> those less error prone for the users, and simpler to express.
> >>>>>>>
> >>>>>>> USE CASE 1
> >>>>>>>
> >>>>>>> You have exactly 1 input documents, which is therefore simply "the"
> >>>>>>> document in the mind of the user. This is probably the typical use
> >>>>>>>
> >>>>>>> case,
> >>>>>>>
> >>>>>>> but at least the use case users typically start out from when
> >>>>>>> starting
> >>>>>>>
> >>>>>>> the
> >>>>>>>
> >>>>>>> work.
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> -t access-report.ftl
> >>>>>>> somewhere/foo-access-log.csv
> >>>>>>>
> >>>>>>> Then `Documents.get(0)` is not very fitting. Most importantly it's
> >>>>>>>
> >>>>>>> error
> >>>>>>>
> >>>>>>> prone, because if the user passed in more than 1 documents (can
> even
> >>>>>>>
> >>>>>>> happen
> >>>>>>>
> >>>>>>> totally accidentally, like if the user was lazy and used a wildcard
> >>>>>>>
> >>>>>>> that
> >>>>>>>
> >>>>>>> the shell exploded), the template will silently ignore the rest of
> >>>>>>> the
> >>>>>>> documents, and the singe document processed will be practically
> >>>>>>> picked
> >>>>>>> randomly. The user might won't notice that and submits a bad report
> >>>>>>> or
> >>>>>>>
> >>>>>>> such.
> >>>>>>>
> >>>>>>> I think that in this use case the document should be simply
> referred
> >>>>>>> as
> >>>>>>> `Document` in the template. When you have multiple documents there,
> >>>>>>> referring to `Document` should be an error, saying that the
> template
> >>>>>>>
> >>>>>>> was
> >>>>>>>
> >>>>>>> made to process a single document only.
> >>>>>>>
> >>>>>>>
> >>>>>>> USE CASE 2
> >>>>>>>
> >>>>>>> You have multiple input documents, but each has different role
> >>>>>>>
> >>>>>>> (different
> >>>>>>>
> >>>>>>> schema, maybe different file type). Like, you pass in users.csv and
> >>>>>>> groups.csv. Each has difference schema, and so you want to access
> >>>>>>> them
> >>>>>>> differently, but in the same template.
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> [...]
> >>>>>>> --named-document users somewhere/foo-users.csv
> >>>>>>> --named-document groups somewhere/foo-groups.csv
> >>>>>>>
> >>>>>>> Then in the template you could refer to them as:
> >>>>>>>
> >>>>>>> `NamedDocuments.users`,
> >>>>>>>
> >>>>>>> and `NamedDocuments.groups`.
> >>>>>>>
> >>>>>>> Use Case 1, and 2 can be unified into a coherent concept, where
> >>>>>>>
> >>>>>>> `Document`
> >>>>>>>
> >>>>>>> is just a shorthand for `NamedDocuments.main`. It's called "main"
> >>>>>>>
> >>>>>>> because
> >>>>>>>
> >>>>>>> that's "the" document the template is about, but then you have to
> >>>>>>> added
> >>>>>>> some helper documents, with symbolic names representing their role.
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> -t access-report.ftl
> >>>>>>> --document-name=main somewhere/foo-access-log.csv
> >>>>>>> --document-name=users somewhere/foo-users.csv
> >>>>>>> --document-name=groups somewhere/foo-groups.csv
> >>>>>>>
> >>>>>>> Here, `Document` still works in the template, and it refers to
> >>>>>>> `somewhere/foo-access-log.csv`. (While omitting
> --document-name=main
> >>>>>>>
> >>>>>>> above
> >>>>>>>
> >>>>>>> would be cleaner, I couldn't figure out how to do that with
> Picocli.
> >>>>>>> Anyway, for now the point is the concept, which is not specific to
> >>>>>>>
> >>>>>>> CLI.)
> >>>>>>>
> >>>>>>> USE CASE 3
> >>>>>>>
> >>>>>>> Here you have several of the same kind of documents. That has a
> more
> >>>>>>> generic sub-use-case, when you have explicitly named documents
> (like
> >>>>>>> "users" above), and for some you expect multiple input files.
> >>>>>>>
> >>>>>>> freemarker-cli
> >>>>>>> -t access-report.ftl
> >>>>>>> --document-name=main somewhere/foo-access-log.csv
> >>>>>>> somewhere/bar-access-log.csv
> >>>>>>> --document-name=users somewhere/foo-users.csv
> >>>>>>> somewhere/bar-users.csv
> >>>>>>> --document-name=groups somewhere/global-groups.csv
> >>>>>>>
> >>>>>>> The template must to be written with this use case in mind, as now
> it
> >>>>>>>
> >>>>>>> has
> >>>>>>>
> >>>>>>> #list some of the documents. (I think in practice you hardly ever
> >>>>>>> want
> >>>>>>>
> >>>>>>> to
> >>>>>>>
> >>>>>>> get a document by hard coded index. Either you don't know how many
> >>>>>>> documents you have, so you can't use hard coded indexes, or you do,
> >>>>>>> and
> >>>>>>> each index has a specific meaning, but then you should name the
> >>>>>>>
> >>>>>>> documents
> >>>>>>>
> >>>>>>> instead, as using indexes is error prone, and hard to read.)
> >>>>>>> Accessing that list of documents in the template, maybe could be
> done
> >>>>>>>
> >>>>>>> like
> >>>>>>>
> >>>>>>> this:
> >>>>>>> - For the "main" documents: `DocumentList`
> >>>>>>> - For explicitly named documents, like "users":
> >>>>>>>
> >>>>>>> `NamedDocumentLists.users`
> >>>>>>>
> >>>>>>> SUMMING UP
> >>>>>>>
> >>>>>>> To unify all 3 use cases into a coherent concept:
> >>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while
> you
> >>>>>>>
> >>>>>>> can
> >>>>>>>
> >>>>>>> achieve everything with it, using it requires your template to
> handle
> >>>>>>>
> >>>>>>> the
> >>>>>>>
> >>>>>>> most generic case too. So, I think it would be rarely used.
> >>>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
> >>>>>>>
> >>>>>>> It's
> >>>>>>>
> >>>>>>> used if you only have one kind of documents (single format and
> >>>>>>> schema),
> >>>>>>>
> >>>>>>> but
> >>>>>>>
> >>>>>>> potentially multiple of them.
> >>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1
> >>>>>>> document
> >>>>>>>
> >>>>>>> of
> >>>>>>>
> >>>>>>> the given name.
> >>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is
> >>>>>>> for
> >>>>>>>
> >>>>>>> the
> >>>>>>>
> >>>>>>> most natural/frequent use case.
> >>>>>>>
> >>>>>>> That's 4 possible ways of accessing your documents, which is a
> >>>>>>>
> >>>>>>> trade-off
> >>>>>>>
> >>>>>>> for the sake of these:
> >>>>>>> - Catching CLI (or Maven, etc.) input where the template output
> >>>>>>> likely
> >>>>>>>
> >>>>>>> will
> >>>>>>>
> >>>>>>> be wrong. That's only possible if the user can communicate its
> intent
> >>>>>>>
> >>>>>>> in
> >>>>>>>
> >>>>>>> the template.
> >>>>>>> - Users don't need to deal with concepts that are irrelevant in
> their
> >>>>>>> concrete use case. Just start with the trivial, `Document`, and
> later
> >>>>>>>
> >>>>>>> if
> >>>>>>>
> >>>>>>> the need arises, generalize to named documents, document lists, or
> >>>>>>>
> >>>>>>> both.
> >>>>>>>
> >>>>>>> What do guys think?
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best regards,
> >>>> Daniel Dekany
> >>>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Daniel Dekany
> >>
> >>
> >
> > --
> > Best regards,
> > Daniel Dekany
>
>

-- 
Best regards,
Daniel Dekany

Re: freemarker-generator: Improving the input documents concept

Posted by Siegfried Goeschl <si...@gmail.com>.

Hi Daniel,

See my comments below

Thanks in advance, 

Siegfried Goeschl


> On 29.02.2020, at 19:08, Daniel Dekany <da...@gmail.com> wrote:
> 
> FREEMARKER-135 freemarker-generator-cli: Support user-supplied names for
> datasources
> 
> So, I can do this to have both a name and a group associated with a data
> source:
>  --datasource someName:someGroup=somewhere/something

Correct

> Or if I only want a name, but not a group (or an ""  group actually -
> bug?), then:
>  --datasource someName=somewhere/something

Correct

> 
> Or if only a group but not a name (or a "" name actually) then:
>  --datasource :someGroup=somewhere/something

Mhmm, that would be unintended functionality from my side - the current approach is that every "Document" / "Datasource / DataSource" is named

> 
> A name must identify exactly 1 data source, while a group identifies a list
> of data sources.

No, every "Document" / "Datasource / DataSource" currently has a name, but uniqueness is not enforced. Only when you ask for a "Document" / "Datasource / DataSource" by its exact name do I check for exactly one search hit, and throw an exception otherwise. I try to provide a useful name even when the content is coming from a URL or STDIN (and I will probably add environment variables as "Document" / "Datasource / DataSource", e.g. configuration in the cloud as JSON content passed as an environment variable)
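
The lookup rule described above (names are not unique, but a get-by-name must hit exactly one entry) can be sketched roughly as follows. This is only an illustration in Python, not the freemarker-generator code; `DataSource`, `DataSources`, `find` and `get` are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DataSource:
    # Hypothetical stand-in for a "Document" / "Datasource"; only the fields
    # needed for the lookup are modelled here.
    name: str
    location: str


class DataSources:
    def __init__(self, sources: Iterable[DataSource]) -> None:
        # Duplicate names are accepted on purpose: uniqueness is not enforced.
        self._sources: List[DataSource] = list(sources)

    def find(self, name: str) -> List[DataSource]:
        """Return every data source with the given name (possibly none)."""
        return [s for s in self._sources if s.name == name]

    def get(self, name: str) -> DataSource:
        """Return the single data source with this name, or fail loudly."""
        hits = self.find(name)
        if len(hits) != 1:
            raise LookupError(
                f"expected exactly one data source named {name!r}, found {len(hits)}"
            )
        return hits[0]
```

With this shape, a template that asks for one named data source fails as soon as the name is missing or ambiguous, instead of silently picking one.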

> 
> Does this idea, that a data source can be part of a group, and then
> is also possibly identifiable with a name, come from a use case? I mean,
> it's possibly important somewhere, but if so, then it's strange that you
> can put something into only a single group. If we need this kind of thing,
> then perhaps you should be just allowed to associate the data source with a
> list of names (kind of like tagging), and then when the template wants to
> get something by name, it will tell there if it expects exactly one or a
> list of data sources. Then you don't need to introduce two terms in the
> documentation either (names and groups). Again, if we want this at all,
> instead of just going with a data source that itself gives a list. (And if
> not, how will we handle a data source that loads from a non-file source?)

I actually thought of implementing tagging but considered a "group" sufficient.

* If you don't define anything everything goes into the "default" group
* For individual documents you can define a name and an optional group
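
To make the `name:group=location` convention concrete, here is a minimal parsing sketch in Python. It is not the actual CLI code: `DataSourceSpec` and `parse_datasource_spec` are hypothetical names, the fallback to a group called "default" is taken from the bullet above, and rejecting an empty name is an assumption based on every data source being named.

```python
from typing import NamedTuple

# Assumption: unnamed-group entries fall into a group literally called "default".
DEFAULT_GROUP = "default"


class DataSourceSpec(NamedTuple):
    name: str
    group: str
    location: str


def parse_datasource_spec(spec: str) -> DataSourceSpec:
    """Parse 'name:group=location', 'name=location' or a bare 'location'."""
    key, sep, location = spec.partition("=")
    if not sep:
        # No '=' at all: treat the whole argument as the location and reuse it
        # as the name (a simplification; a real tool may derive a nicer name).
        return DataSourceSpec(name=spec, group=DEFAULT_GROUP, location=spec)
    name, _, group = key.partition(":")
    if not name:
        # ':someGroup=...' has a group but no name, which is treated as an error here.
        raise ValueError(f"data source needs a name: {spec!r}")
    return DataSourceSpec(name=name, group=group or DEFAULT_GROUP, location=location)
```

Splitting on the first `=` only keeps locations such as `env:///CONFIGURATION` or `file:///config.json` intact, since their colons appear after the separator.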

I think we have a different understanding of what a "Document" / "Datasource / DataSource" should do

* It is a dumb 
* It is lazy since data is only loaded on demand
* There is no automagic like "oh, this is a JSON file, so let's go to the JSON tool and create a map readily accessible in the data model"
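
A rough Python sketch of the "lazy, no automagic" idea: the data source only remembers how to fetch its content, fetches it on first use, and hands back raw text without guessing a format. `LazyDataSource`, `loader`, `is_loaded` and `text` are hypothetical names used for illustration, not the project's API.

```python
from typing import Callable, Optional


class LazyDataSource:
    def __init__(self, name: str, loader: Callable[[], str]) -> None:
        self.name = name
        self._loader = loader              # called at most once, on demand
        self._text: Optional[str] = None   # nothing is read at construction time

    @property
    def is_loaded(self) -> bool:
        return self._text is not None

    def text(self) -> str:
        """Return the raw content, loading it on first access only."""
        if self._text is None:
            self._text = self._loader()
        return self._text
```

Parsing stays an explicit step in the template (for example handing the text to a JSON or CSV tool), so a data source with a `.json` name is still just text until someone asks for more.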

> 
> Note that the current command line syntax doesn't work well with shell
> wildcard expansion. Like this:
> --datasource :someGroup=logs/*.log
> will try to expand ":someGroup=logs/*.log", and because it finds nothing
> (and because the rules of sh and the like are a mess), you will get the
> parameter value as is, without * expanded.

The joy of programming - I did not intend to use "name:group" together with wildcards :-)
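
The wildcard problem can be reproduced outside the shell. The Python sketch below is only an illustration of one possible workaround, where the tool strips the `name:group=` prefix itself and expands the remaining pattern; `expand_datasource_arg` is a hypothetical helper, and nothing in this thread says the CLI does this.

```python
import glob
from typing import List, Tuple


def expand_datasource_arg(arg: str) -> Tuple[str, List[str]]:
    """Split 'key=pattern' and expand the pattern part only.

    Globbing the whole argument would look for paths that literally start
    with the 'key=' prefix, which is why the shell finds nothing to expand.
    """
    key, sep, pattern = arg.partition("=")
    if not sep:
        # Plain positional argument: no key, the argument is the pattern.
        key, pattern = "", arg
    return key, sorted(glob.glob(pattern))
```

For this to work from a shell, the argument would have to be quoted (for example `--datasource ':someGroup=logs/*.log'`) so that the pattern reaches the tool unexpanded.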

> 
> Also, I think the syntax with colon should be flipped, because in other
> places foo:bar usually means that foo is the bigger unit (the container),
> and bar is the smaller unit (the child).

I disagree here - I think a name would be used more often. I added the "group" as an afterthought since some grouping could be useful

> 
> On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
> siegfried.goeschl@gmail.com> wrote:
> 
>> Hi Daniel,
>> 
>> I'm an enterprise developer - bad habits die hard :-)
>> 
>> So I closed the following tickets and merged the branches
>> 
>> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
>> "freemarker-generator"
>> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
>> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
>> for datasources
>> 
>> Thanks in advance,
>> 
>> Siegfried Goeschl
>> 
>> 
>>> On 29.02.2020, at 12:19, Daniel Dekany <da...@gmail.com> wrote:
>>> 
>>> Yeah, and of course, you can merge that branch. You can even work on the
>>> master directly after all.
>>> 
>>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <da...@gmail.com>
>>> wrote:
>>> 
>>>> But, I do recognize the cattle use case (several "faceless" files with
>>>> common format/schema). Only, my idea is to push that complexity on the
>> data
>>>> source. The "data source" concept shields the rest of the application
>> from
>>>> the details of how the data is stored or retrieved. So, a data source
>> might
>>>> loads a bunch of log files from a directory, and present them as a
>> single
>>>> big table, or like a list of tables, etc. So I want to deal with the
>> cattle
>>>> use case, but the question is what part of the of architecture will deal
>>>> with this complication, with other words, how do you box things. Why my
>>>> initial bet is to stuff that complication into the "data source"
>>>> implementation(s) is that data sources are inherently varied. Some
>> returns
>>>> a table-like thing, some have multiple named tables (worksheets in
>> Excel),
>>>> some returns tree of nodes (XML), etc. So then, some might returns a
>>>> list-of-list-of log records, or just a single list of log-records (put
>>>> together from daily log files). That way cattles don't add to conceptual
>>>> complexity. Now, you might be aware of cases where the cattle concept
>> must
>>>> be more exposed than this, and the we can't box things like this. But
>> this
>>>> is what I tried to express.
>>>> 
>>>> Regarding "output generators", and how that applies on the command
>> line. I
>>>> think it's important that the common core between Maven and
>> command-line is
>>>> as fat as possible. Ideally, they are just two syntax to set up the same
>>>> thing. Mostly at least. So, if you specify a template file to the CLI
>>>> application, in a way so that it causes it to process that template to
>>>> generate a single output, then there you have just defined an "output
>>>> generator" (even if it wasn't explicitly called like that in the command
>>>> line). If you specify 3 csv files to the CLI application, in a way so
>> that
>>>> it causes it to generate 3 output files, then you have just defined 3
>>>> "output generators" there (there's at least one template specified there
>>>> too, but that wasn't an "output generator" itself, it was just an
>> attribute
>>>> of the 3 output generators). If you specify 1 template, and 3 csv
>> files, in
>>>> a way so that it will yield 4 output files (1 for the template, 3 for
>> the
>>>> csv-s), then you have defined 4 output generators there. If you have a
>> data
>>>> source that loads a list of 3 entities (say, 3 csv files, so it's a
>> list of
>>>> tables then), and you have 2 templates, and you tell the CLI to execute
>>>> each template for each item in said data source, then you have just
>> defined
>>>> 6 "output generators".
>>>> 
>>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
>>>> siegfried.goeschl@gmail.com> wrote:
>>>> 
>>>>> Hi Daniel,
>>>>> 
>>>>> That all depends on your mental model and work you do, expectations,
>>>>> experience :-)
>>>>> 
>>>>> 
>>>>> __Document Handling__
>>>>> 
>>>>> *"But I think actually we have no good use case for list of documents
>>>>> that's passed at once to a single template run, so, we can just ignore
>>>>> that complication"*
>>>>> 
>>>>> In my case that's not a complication but my daily business - I'm
>>>>> regularly wading through access logs - yesterday probably a couple of
>>>>> hundreds access logs across two staging sites to help tracking some
>>>>> strange API gateway issues :-)
>>>>> 
>>>>> My gut feeling is (borrowing from
>>>>> 
>>>>> 
>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
>>>>> )
>>>>> 
>>>>> 1. You have a few lovely named documents / templates - `pets`
>>>>> 2. You have tons of anonymous documents / templates to process -
>>>>> `cattle`
>>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
>>>>> 
>>>>> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
>>>>> it is equally important and common.
>>>>> 
>>>>> 
>>>>> __Template And Document Processing Modes__
>>>>> 
>>>>> IMHO it is important to answer the following question : "How many
>>>>> outputs do you get when rendering 2 template and 3 datasources? Two,
>>>>> Three or Six?"
>>>>> 
>>>>> Your answer is influenced by your mental model / experience
>>>>> 
>>>>> * When wading through tons of CSV files, access logs, etc. the answer
>> is
>>>>> "2"
>>>>> * When doing source code generation the obvious answer is "6"
>>>>> * Can't image a use case which results in "3" but I'm pretty sure we
>>>>> will encounter one
>>>>> 
>>>>> __Template and document mode probably shouldn't exist__
>>>>> 
>>>>> That's hard for me to fully understand - I definitely lack your
>> insights
>>>>> & experience writing such tools :-)
>>>>> 
>>>>> Defining the `Output Generator` is the underlying model for the Maven
>>>>> plugin (and probably FMPP).
>>>>> 
>>>>> I'm not sure if this applies for command lines at least not in the way
>> I
>>>>> use them (or would like to use them)
>>>>> 
>>>>> 
>>>>> Thanks in advance,
>>>>> 
>>>>> Siegfried Goeschl
>>>>> 
>>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>>>>> 
>>>>> 
>>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>>>>> 
>>>>>> Yeah, "data source" is surely a too popular name, but for reason.
>>>>>> Anyone
>>>>>> has other ideas?
>>>>>> 
>>>>>> As of naming data sources and such. One thing I was wondering about
>>>>>> back
>>>>>> then is how to deal with list of documents given to a template, versus
>>>>>> exactly 1 document given to a template. But I think actually we have
>>>>>> no
>>>>>> good use case for list of documents that's passed at once to a single
>>>>>> template run, so, we can just ignore that complication. A document has
>>>>>> a
>>>>>> name, and that's always just a single document, not a collection, as
>>>>>> far as
>>>>>> the template is concerned. (We can have multiple documents per run,
>>>>>> but
>>>>>> those normally yield separate output generators, so it's still only
>>>>>> one
>>>>>> document per template.) However, we can have data source types
>>>>>> (document
>>>>>> types with old terminology) that collect together multiple data files.
>>>>>> So
>>>>>> then that complexity is encapsulated into the data source type, and
>>>>>> doesn't
>>>>>> complicate the overall architecture. That's another case when a data
>>>>>> source
>>>>>> is not just a file. Like maybe there's a data source type that loads
>>>>>> all
>>>>>> the CSV-s from a directory, into a single big table (I had such case),
>>>>>> or
>>>>>> even into a list of tables. Or, as I mentioned already, a data source
>>>>>> is
>>>>>> maybe an SQL query on a JDBC data source (and we got the first term
>>>>>> clash... JDBC also call them data sources).
>>>>>> 
>>>>>> Template and document mode probably shouldn't exist from user
>>>>>> perspective
>>>>>> either, at least not as a global option that must apply to everything
>>>>>> in a
>>>>>> run. They could just give the files that define the "output
>>>>>> generators",
>>>>>> and some of them will be templates, some of them are data files, in
>>>>>> which
>>>>>> case a template need to be associated with them (and there can be a
>>>>>> couple
>>>>>> of ways of doing that). And then again, there are the cases where you
>>>>>> want
>>>>>> to create one output generator per entity from some data source.
>>>>>> 
>>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Daniel,
>>>>>>> 
>>>>>>> See my comments below - and thanks for your patience and input :-)
>>>>>>> 
>>>>>>> *Renaming Document To DataSource*
>>>>>>> 
>>>>>>> Yes, makes sense. I tried to avoid since I'm using javax.activation
>>>>>>> and
>>>>>>> its DataSource.
>>>>>>> 
>>>>>>> *Template And Document Mode*
>>>>>>> 
>>>>>>> Agreed - I think it is a valuable abstraction for the user but it is
>>>>>>> not
>>>>>>> an implementation concept :-)
>>>>>>> 
>>>>>>> *Document Without Symbolic Names*
>>>>>>> 
>>>>>>> Also agreed and it is going to change but I have not settled my mind
>>>>>>> yet
>>>>>>> what exactly to implement.
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> 
>>>>>>> Siegfried Goeschl
>>>>>>> 
>>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>>>>>>> 
>>>>>>> A few quick thoughts on that:
>>>>>>> 
>>>>>>> - We should replace the "document" term with something more speaking.
>>>>>>> It
>>>>>>> doesn't tell that it's some kind of input. Also, most of these inputs
>>>>>>> aren't something that people typically call documents. Like a csv
>>>>>>> file, or
>>>>>>> a database table, which is not even a file (OK we don't support such
>>>>>>> thing
>>>>>>> at the moment). I think, maybe "data source" is a safe enough term.
>>>>>>> (It
>>>>>>> also rhymes with data model.)
>>>>>>> - You have separate "template" and "document" "mode", that applies to
>>>>>>> a
>>>>>>> whole run. I think such specialization won't be helpful. We could
>>>>>>> just say,
>>>>>>> on the conceptual level at lest, that we need a set of "outputs
>>>>>>> generators". An output generator is an object (in the API) that
>>>>>>> specifies a
>>>>>>> template, a data-model (where the data-model is possibly populated
>>>>>>> with
>>>>>>> "documents"), and an output "sink" (a file path, or stdout), and can
>>>>>>> generate the output itself. A practical way of defining the output
>>>>>>> generators in a CLI application is via a bunch of files, each
>>>>>>> defining an
>>>>>>> output generator. Some of those files is maybe a template (that you
>>>>>>> can
>>>>>>> even detect from the file extension), or a data file that we
>>>>>>> currently call
>>>>>>> a "document". They could freely mix inside the same run. I have also
>>>>>>> met
>>>>>>> use case when you have a single table (single "document"), and each
>>>>>>> record
>>>>>>> in it yields an output file. That can also be described in some file
>>>>>>> format, or really in any other way, like directly in command line
>>>>>>> argument,
>>>>>>> via API, etc.
>>>>>>> - You have multiple documents without associated symbolical name in
>>>>>>> some
>>>>>>> examples. Templates can't identify those then in a well maintainable
>>>>>>> way.
>>>>>>> The actual file name is often not a good identifier, can change over
>>>>>>> time,
>>>>>>> and you might don't even have good control over it, like you already
>>>>>>> receive it as a parameter from somewhere else, or someone
>>>>>>> moves/renames
>>>>>>> that files that you need to read. Index is also not very good, but I
>>>>>>> have
>>>>>>> written about that earlier.
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi folks,
>>>>>>> 
>>>>>>> still wrapping my side around but assembled some thoughts here -
>>>>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> 
>>>>>>> Siegfried Goeschl
>>>>>>> 
>>>>>>> 
>>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org> wrote:
>>>>>>> 
>>>>>>> What you are describing is more like the angle that FMPP took
>>>>>>> initially,
>>>>>>> where templates drive things, they generate the output for themselves
>>>>>>> 
>>>>>>> (even
>>>>>>> 
>>>>>>> multiple output files if they wish). By default output files name
>>>>>>> (and
>>>>>>> relative path) is deduced from template name. There was also a global
>>>>>>> data-model, built in a configuration file (or equally, built via
>>>>>>> command
>>>>>>> line arguments, or both mixed), from which templates get whatever
>>>>>>> data
>>>>>>> 
>>>>>>> they
>>>>>>> 
>>>>>>> are interested in. Take a look at the figures here:
>>>>>>> http://fmpp.sourceforge.net/qtour.html. Later, this concept was
>>>>>>> 
>>>>>>> generalized
>>>>>>> 
>>>>>>> a bit more, because you could add XML files at the same place where
>>>>>>> you
>>>>>>> have the templates, and then you could associate transform templates
>>>>>>> to
>>>>>>> 
>>>>>>> the
>>>>>>> 
>>>>>>> XML files (based on path pattern and/or the XML document element).
>>>>>>> Now
>>>>>>> that's like what freemarker-generator had initially (data files drive
>>>>>>> output, and the template is there to transform it).
>>>>>>> 
>>>>>>> So I think the generic mental model would like this:
>>>>>>> 
>>>>>>> 1. You got files that drive the process, let's call them *generator
>>>>>>> files* for now. Usually, each generator file yields an output file
>>>>>>> (but
>>>>>>> maybe even multiple output files, as you might saw in the last
>>>>>>> figure).
>>>>>>> These generator files can be of many types, like XML, JSON, XLSX (as
>>>>>>> 
>>>>>>> in the
>>>>>>> 
>>>>>>> original freemarker-generator), and even templates (as is the norm in
>>>>>>> FMPP). If the file is not a template, then you got a set of
>>>>>>> transformer
>>>>>>> templates (-t CLI option) in a separate directory, which can be
>>>>>>> 
>>>>>>> associated
>>>>>>> 
>>>>>>> with the generator files base on name patterns, and even based on
>>>>>>> 
>>>>>>> content
>>>>>>> 
>>>>>>> (schema usually). If the generator file is a template (so that's a
>>>>>>> positional @Parameter CLI argument that happens to be an *.ftl, and
>>>>>>> is
>>>>>>> 
>>>>>>> not
>>>>>>> 
>>>>>>> a template file specified after the "-t" option), then you just
>>>>>>> Template.process(...) it, and it prints what the output will be.
>>>>>>> 2. You also have a set of variables, the global data-model, that
>>>>>>> contains commonly useful stuff, like what you now call parameters
>>>>>>> (CLI
>>>>>>> -Pname=value), but also maybe data loaded from JSON, XML, etc.. Those
>>>>>>> 
>>>>>>> data
>>>>>>> 
>>>>>>> files aren't "generator files". Templates just use them if they need
>>>>>>> 
>>>>>>> them.
>>>>>>> 
>>>>>>> An important thing here is to reuse the same mechanism to read and
>>>>>>> 
>>>>>>> parse
>>>>>>> 
>>>>>>> those data files, which was used in templates when transforming
>>>>>>> 
>>>>>>> generator
>>>>>>> 
>>>>>>> files. So we need a common format for specifying how to load data
>>>>>>> 
>>>>>>> files.
>>>>>>> 
>>>>>>> That's maybe just FTL that #assigns to the variables, or maybe more
>>>>>>> declarative format.
>>>>>>> 
>>>>>>> What I have described in the original post here was a less generic
>>>>>>> form
>>>>>>> 
>>>>>>> of
>>>>>>> 
>>>>>>> this, as I tried to be true with the original approach. I though the
>>>>>>> proposal will be drastic enough as it is... :) There, the "main"
>>>>>>> document
>>>>>>> is the "generator file" from point 1, the "-t" template is the
>>>>>>> transform
>>>>>>> template for the "main" document, and the other named documents
>>>>>>> ("users",
>>>>>>> "groups") is a poor man's shared data-model from point 2 (together
>>>>>>> with
>>>>>>> with -PName=value).
>>>>>>> 
>>>>>>> There's further somewhat confusing thing to get right with the
>>>>>>> list-of-documents (`DocuentList`, `NamedDocumentLists`) thing though.
>>>>>>> In
>>>>>>> the model above, as per point 1, if you list multiple data files,
>>>>>>> each
>>>>>>> 
>>>>>>> will
>>>>>>> 
>>>>>>> generate a separate output file. So, if you need take in a list of
>>>>>>> files
>>>>>>> 
>>>>>>> to
>>>>>>> 
>>>>>>> transform it to a single output file (or at least with a single
>>>>>>> transform
>>>>>>> template execution), then you have to be explicit about that, as
>>>>>>> that's
>>>>>>> 
>>>>>>> not
>>>>>>> 
>>>>>>> the default behavior anymore. But it's still absolutely possible.
>>>>>>> Imagine
>>>>>>> it as a "list of XLSX-es" is itself like a file format. You need some
>>>>>>> CLI
>>>>>>> (and Maven config, etc.) syntax to express that, but that shouldn't
>>>>>>> be a
>>>>>>> big deal.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi Daniel,
>>>>>>> 
>>>>>>> Good timing - I was looking at a similar problem from different angle
>>>>>>> yesterday (see below)
>>>>>>> 
>>>>>>> Don't have enough time to answer your email in detail now - will do
>>>>>>> that
>>>>>>> tomorrow evening
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> 
>>>>>>> Siegfried Goeschl
>>>>>>> 
>>>>>>> 
>>>>>>> ===. START
>>>>>>> # FreeMarker CLI Improvement
>>>>>>> ## Support Of Multiple Template Files
>>>>>>> Currently we support the following combinations
>>>>>>> 
>>>>>>> * Single template and no data files
>>>>>>> * Single template and one or more data files
>>>>>>> 
>>>>>>> But we can not support the following use case which is quite typical
>>>>>>> in
>>>>>>> the cloud
>>>>>>> 
>>>>>>> __Convert multiple templates with a single data file, e.g copying a
>>>>>>> directory of configuration files using a JSON configuration file__
>>>>>>> 
>>>>>>> ## Implementation notes
>>>>>>> * When we copy a directory we can remove the `ftl`extension on the
>>>>>>> fly
>>>>>>> * We might need an `exclude` filter for the copy operation
>>>>>>> * Initially resolve to a list of template files and process one after
>>>>>>> another
>>>>>>> * Need to calculate the output file location and extension
>>>>>>> * We need to rename the existing command line parameters (see below)
>>>>>>> * Do we need multiple include and exclude filter?
>>>>>>> * Do we need file versus directory filters?
>>>>>>> 
>>>>>>> ### Command Line Options
>>>>>>> ```
>>>>>>> --input-encoding : Encoding of the documents
>>>>>>> --output-encoding : Encoding of the rendered template
>>>>>>> --template-encoding : Encoding of the template
>>>>>>> --output : Output file or directory
>>>>>>> --include-document : Include pattern for documents
>>>>>>> --exclude-document : Exclude pattern for documents
>>>>>>> --include-template: Include pattern for templates
>>>>>>> --exclude-template : Exclude pattern for templates
>>>>>>> ```
>>>>>>> 
>>>>>>> ### Command Line Examples
>>>>>>> ```text
>>>>>>> # Copy all FTL templates found in "ext/config" to the "/config"
>>>>>>> 
>>>>>>> directory
>>>>>>> 
>>>>>>> using the data from "config.json"
>>>>>>> 
>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl --o /config
>>>>>>> 
>>>>>>> config.json
>>>>>>> 
>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl
>>>>>>> 
>>>>>>> --output
>>>>>>> 
>>>>>>> /config config.json
>>>>>>> 
>>>>>>> # Bascically the same using a named document "configuration"
>>>>>>> # It might make sense to expose "conf" directly in the FreeMarker
>>>>>>> data
>>>>>>> model
>>>>>>> # It might make sens to allow URIs for loading documents
>>>>>>> 
>>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d
>>>>>>> 
>>>>>>> configuration=config.json
>>>>>>> 
>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl
>>>>>>> 
>>>>>>> --output
>>>>>>> 
>>>>>>> /config --document configuration=config.json
>>>>>>> 
>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl
>>>>>>> 
>>>>>>> --output
>>>>>>> 
>>>>>>> /config --document configuration=file:///config.json
>>>>>>> 
>>>>>>> # Bascically the same using an environment variable as named document
>>>>>>> 
>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d
>>>>>>> 
>>>>>>> configuration=env:///CONFIGURATION
>>>>>>> 
>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl
>>>>>>> 
>>>>>>> --output
>>>>>>> 
>>>>>>> /config --document configuration=env:///CONFIGURATION
>>>>>>> ```
>>>>>>> === END
>>>>>>> 
>>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org> wrote:
>>>>>>> 
>>>>>>> Input documents is a fundamental concept in freemarker-generator, so
>>>>>>> we
>>>>>>> should think about that more, and probably refine/rework how it's
>>>>>>> done.
>>>>>>> 
>>>>>>> Currently it works like this, with CLI at least.
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> -t access-report.ftl
>>>>>>> somewhere/foo-access-log.csv
>>>>>>> 
>>>>>>> Then in access-report.ftl you have to do something like this:
>>>>>>> 
>>>>>>> <#assign doc = Documents.get(0)>
>>>>>>> ... process doc here
>>>>>>> 
>>>>>>> (The more idiomatic Documents[0] won't work. Actually, that lead to a
>>>>>>> 
>>>>>>> funny
>>>>>>> 
>>>>>>> chain of coincidences: It returned the string "D", then
>>>>>>> 
>>>>>>> CSVTool.parse(...)
>>>>>>> 
>>>>>>> happily parsed that to a table with the single column "D", and 0
>>>>>>> rows,
>>>>>>> 
>>>>>>> and
>>>>>>> 
>>>>>>> as there were 0 rows, the template didn't run into an error because
>>>>>>> row.myExpectedColumn refers to a missing column either, so the
>>>>>>> process
>>>>>>> finished with success. (: Pretty unlucky for sure. The root was
>>>>>>> unintentionally breaking a FreeMarker idiom though; eventually we
>>>>>>> will
>>>>>>> 
>>>>>>> have
>>>>>>> 
>>>>>>> to work on those too, but, different topic.)
>>>>>>> 
>>>>>>> However, actually multiple input documents can be passed in:
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> -t access-report.ftl
>>>>>>> somewhere/foo-access-log.csv
>>>>>>> somewhere/bar-access-log.csv
>>>>>>> 
>>>>>>> Above template will still work, though then you ignored all but the
>>>>>>> 
>>>>>>> first
>>>>>>> 
>>>>>>> document. So if you expect any number of input documents, you
>>>>>>> probably
>>>>>>> 
>>>>>>> will
>>>>>>> 
>>>>>>> have to do this:
>>>>>>> 
>>>>>>> <#list Documents.list as doc>
>>>>>>> ... process doc here
>>>>>>> </#list>
>>>>>>> 
>>>>>>> (The more idiomatic <#list Documents as doc> won't work; but again,
>>>>>>> 
>>>>>>> those
>>>>>>> 
>>>>>>> we will work out in a different thread.)
>>>>>>> 
>>>>>>> 
>>>>>>> So, what would be better, in my opinion. I start out from what I
>>>>>>> think
>>>>>>> 
>>>>>>> are
>>>>>>> 
>>>>>>> the common uses cases, in decreasing order of frequency. Goal is to
>>>>>>> 
>>>>>>> make
>>>>>>> 
>>>>>>> those less error prone for the users, and simpler to express.
>>>>>>> 
>>>>>>> USE CASE 1
>>>>>>> 
>>>>>>> You have exactly 1 input documents, which is therefore simply "the"
>>>>>>> document in the mind of the user. This is probably the typical use
>>>>>>> 
>>>>>>> case,
>>>>>>> 
>>>>>>> but at least the use case users typically start out from when
>>>>>>> starting
>>>>>>> 
>>>>>>> the
>>>>>>> 
>>>>>>> work.
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> -t access-report.ftl
>>>>>>> somewhere/foo-access-log.csv
>>>>>>> 
>>>>>>> Then `Documents.get(0)` is not very fitting. Most importantly it's
>>>>>>> 
>>>>>>> error
>>>>>>> 
>>>>>>> prone, because if the user passed in more than 1 documents (can even
>>>>>>> 
>>>>>>> happen
>>>>>>> 
>>>>>>> totally accidentally, like if the user was lazy and used a wildcard
>>>>>>> 
>>>>>>> that
>>>>>>> 
>>>>>>> the shell exploded), the template will silently ignore the rest of
>>>>>>> the
>>>>>>> documents, and the singe document processed will be practically
>>>>>>> picked
>>>>>>> randomly. The user might won't notice that and submits a bad report
>>>>>>> or
>>>>>>> 
>>>>>>> such.
>>>>>>> 
>>>>>>> I think that in this use case the document should be simply referred
>>>>>>> as
>>>>>>> `Document` in the template. When you have multiple documents there,
>>>>>>> referring to `Document` should be an error, saying that the template
>>>>>>> 
>>>>>>> was
>>>>>>> 
>>>>>>> made to process a single document only.
>>>>>>> 
>>>>>>> 
>>>>>>> USE CASE 2
>>>>>>> 
>>>>>>> You have multiple input documents, but each has different role
>>>>>>> 
>>>>>>> (different
>>>>>>> 
>>>>>>> schema, maybe different file type). Like, you pass in users.csv and
>>>>>>> groups.csv. Each has difference schema, and so you want to access
>>>>>>> them
>>>>>>> differently, but in the same template.
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> [...]
>>>>>>> --named-document users somewhere/foo-users.csv
>>>>>>> --named-document groups somewhere/foo-groups.csv
>>>>>>> 
>>>>>>> Then in the template you could refer to them as:
>>>>>>> 
>>>>>>> `NamedDocuments.users`,
>>>>>>> 
>>>>>>> and `NamedDocuments.groups`.
>>>>>>> 
>>>>>>> Use Case 1, and 2 can be unified into a coherent concept, where
>>>>>>> 
>>>>>>> `Document`
>>>>>>> 
>>>>>>> is just a shorthand for `NamedDocuments.main`. It's called "main"
>>>>>>> 
>>>>>>> because
>>>>>>> 
>>>>>>> that's "the" document the template is about, but then you have to
>>>>>>> added
>>>>>>> some helper documents, with symbolic names representing their role.
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> -t access-report.ftl
>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>> --document-name=groups somewhere/foo-groups.csv
>>>>>>> 
>>>>>>> Here, `Document` still works in the template, and it refers to
>>>>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
>>>>>>> 
>>>>>>> above
>>>>>>> 
>>>>>>> would be cleaner, I couldn't figure out how to do that with Picocli.
>>>>>>> Anyway, for now the point is the concept, which is not specific to
>>>>>>> 
>>>>>>> CLI.)
>>>>>>> 
>>>>>>> USE CASE 3
>>>>>>> 
>>>>>>> Here you have several of the same kind of documents. That has a more
>>>>>>> generic sub-use-case, when you have explicitly named documents (like
>>>>>>> "users" above), and for some you expect multiple input files.
>>>>>>> 
>>>>>>> freemarker-cli
>>>>>>> -t access-report.ftl
>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>> somewhere/bar-access-log.csv
>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>> somewhere/bar-users.csv
>>>>>>> --document-name=groups somewhere/global-groups.csv
>>>>>>> 
>>>>>>> The template must to be written with this use case in mind, as now it
>>>>>>> 
>>>>>>> has
>>>>>>> 
>>>>>>> #list some of the documents. (I think in practice you hardly ever
>>>>>>> want
>>>>>>> 
>>>>>>> to
>>>>>>> 
>>>>>>> get a document by hard coded index. Either you don't know how many
>>>>>>> documents you have, so you can't use hard coded indexes, or you do,
>>>>>>> and
>>>>>>> each index has a specific meaning, but then you should name the
>>>>>>> 
>>>>>>> documents
>>>>>>> 
>>>>>>> instead, as using indexes is error prone, and hard to read.)
>>>>>>> Accessing that list of documents in the template, maybe could be done
>>>>>>> 
>>>>>>> like
>>>>>>> 
>>>>>>> this:
>>>>>>> - For the "main" documents: `DocumentList`
>>>>>>> - For explicitly named documents, like "users":
>>>>>>> 
>>>>>>> `NamedDocumentLists.users`
>>>>>>> 
>>>>>>> SUMMING UP
>>>>>>> 
>>>>>>> To unify all 3 use cases into a coherent concept:
>>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you
>>>>>>> 
>>>>>>> can
>>>>>>> 
>>>>>>> achieve everything with it, using it requires your template to handle
>>>>>>> 
>>>>>>> the
>>>>>>> 
>>>>>>> most generic case too. So, I think it would be rarely used.
>>>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
>>>>>>> 
>>>>>>> It's
>>>>>>> 
>>>>>>> used if you only have one kind of documents (single format and
>>>>>>> schema),
>>>>>>> 
>>>>>>> but
>>>>>>> 
>>>>>>> potentially multiple of them.
>>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1
>>>>>>> document
>>>>>>> 
>>>>>>> of
>>>>>>> 
>>>>>>> the given name.
>>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is
>>>>>>> for
>>>>>>> 
>>>>>>> the
>>>>>>> 
>>>>>>> most natural/frequent use case.
>>>>>>> 
>>>>>>> That's 4 possible ways of accessing your documents, which is a
>>>>>>> 
>>>>>>> trade-off
>>>>>>> 
>>>>>>> for the sake of these:
>>>>>>> - Catching CLI (or Maven, etc.) input where the template output
>>>>>>> likely
>>>>>>> 
>>>>>>> will
>>>>>>> 
>>>>>>> be wrong. That's only possible if the user can communicate its intent
>>>>>>> 
>>>>>>> in
>>>>>>> 
>>>>>>> the template.
>>>>>>> - Users don't need to deal with concepts that are irrelevant in their
>>>>>>> concrete use case. Just start with the trivial, `Document`, and later
>>>>>>> 
>>>>>>> if
>>>>>>> 
>>>>>>> the need arises, generalize to named documents, document lists, or
>>>>>>> 
>>>>>>> both.
>>>>>>> 
>>>>>>> What do guys think?
>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Best regards,
>>>> Daniel Dekany
>>>> 
>>> 
>>> 
>>> --
>>> Best regards,
>>> Daniel Dekany
>> 
>> 
> 
> -- 
> Best regards,
> Daniel Dekany

Re: freemarker-generator: Improving the input documents concept

Posted by Daniel Dekany <da...@gmail.com>.

FREEMARKER-135 freemarker-generator-cli: Support user-supplied names for
datasources

So, I can do this to have both a name and a group associated with a data
source:
  --datasource someName:someGroup=somewhere/something

Or if I only want a name, but not a group (or a "" group actually -
bug?), then:
  --datasource someName=somewhere/something

Or if I only want a group, but not a name (or a "" name actually), then:
  --datasource :someGroup=somewhere/something

A name must identify exactly 1 data source, while a group identifies a list
of data sources.
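
To make the three forms above concrete, here is a minimal sketch of how
such an argument could be split. It is in Python purely for illustration
and is not the project's code; `parse_datasource` and `DataSourceSpec` are
made-up names, and treating a missing part as "" is an assumption that
mirrors the behavior described above.

```python
from typing import NamedTuple


class DataSourceSpec(NamedTuple):
    name: str      # "" when no name was given
    group: str     # "" when no group was given
    location: str  # file path or URI, kept exactly as given


def parse_datasource(arg: str) -> DataSourceSpec:
    """Split a '[name][:group]=location' argument into its three parts."""
    # Only the first '=' separates the key from the location.
    key, sep, location = arg.partition("=")
    if not sep:
        # No '=' at all: an anonymous data source without name or group.
        return DataSourceSpec("", "", arg)
    # Only the first ':' in the key separates the name from the group.
    name, _, group = key.partition(":")
    return DataSourceSpec(name, group, location)
```

With this sketch, `parse_datasource(":someGroup=somewhere/something")`
gives an empty name, the group "someGroup", and the location unchanged.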

Does this idea, that a data source can be part of a group and is then also
possibly identifiable by a name, come from a use case? I mean, it's
possibly important somewhere, but if so, then it's strange that you can put
something into only a single group. If we need this kind of thing, then
perhaps you should just be allowed to associate the data source with a list
of names (kind of like tagging), and then when the template wants to get
something by name, it tells there whether it expects exactly one data
source or a list of them. Then you don't need to introduce two terms in the
documentation either (names and groups). Again, that's if we want this at
all, instead of just going with a data source that itself gives a list.
(And if not, how will we handle a data source that loads from a non-file
source?)
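
The tagging alternative can be sketched roughly like this. It is a Python
illustration only; `DataSources`, `get_one` and `get_list` are hypothetical
names, not an existing API. The point is that the caller, not the command
line, states whether it expects exactly one data source or a list.

```python
from collections import defaultdict


class DataSources:
    """Hypothetical registry where each data source carries several names (tags)."""

    def __init__(self):
        self._by_tag = defaultdict(list)

    def add(self, location, tags):
        # The same data source may be registered under any number of tags.
        for tag in tags:
            self._by_tag[tag].append(location)

    def get_list(self, tag):
        """The caller expects any number of data sources, possibly none."""
        return list(self._by_tag.get(tag, []))

    def get_one(self, tag):
        """The caller expects exactly one data source; anything else is an error."""
        found = self.get_list(tag)
        if len(found) != 1:
            raise LookupError(
                f"expected exactly 1 data source tagged {tag!r}, found {len(found)}"
            )
        return found[0]
```

Here a log file tagged with both "logs" and "main" is returned by
`get_one("main")` and is also part of `get_list("logs")`, so a separate
"group" term is not needed.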

Note that the current command line syntax doesn't work well with shell
wildcard expansion. Like this:
  --datasource :someGroup=logs/*.log
Here the shell will try to expand ":someGroup=logs/*.log" as a whole, and
because it finds nothing (and because the rules of sh and the like are a
mess), you will get the parameter value as is, without * expanded.
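
One possible way around this, sketched below under assumptions, is for the
application to expand the wildcard itself once the prefix has been split
off. This is a Python illustration, not the project's code;
`expand_datasource` is a made-up name, and returning an empty list for an
unmatched pattern is only one of several reasonable choices.

```python
import glob
from typing import List


def expand_datasource(arg: str) -> List[str]:
    """Expand wildcards in the location part of '[name][:group]=location'."""
    key, sep, pattern = arg.partition("=")
    if not sep:
        # No '=' present: the whole argument is the location pattern.
        pattern = arg
    # Sort for a stable order; an unmatched pattern yields an empty list.
    matches = sorted(glob.glob(pattern))
    prefix = key + "=" if sep else ""
    # Re-attach the original prefix to every expanded location.
    return [prefix + match for match in matches]
```

The user would then have to quote the argument so that the shell passes it
through untouched, for example `--datasource ':someGroup=logs/*.log'`.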

Also, I think the syntax with the colon should be flipped, because in other
places foo:bar usually means that foo is the bigger unit (the container),
and bar is the smaller unit (the child).

On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
siegfried.goeschl@gmail.com> wrote:

> Hi Daniel,
>
> I'm an enterprise developer - bad habits die hard :-)
>
> So I closed the following tickets and merged the branches
>
> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
> "freemarker-generator"
> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
> for datasources
>
> Thanks in advance,
>
> Siegfried Goeschl
>
>
> > On 29.02.2020, at 12:19, Daniel Dekany <da...@gmail.com> wrote:
> >
> > Yeah, and of course, you can merge that branch. You can even work on the
> > master directly after all.
> >
> > On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <da...@gmail.com>
> > wrote:
> >
> >> But, I do recognize the cattle use case (several "faceless" files with
> >> common format/schema). Only, my idea is to push that complexity on the
> data
> >> source. The "data source" concept shields the rest of the application
> from
> >> the details of how the data is stored or retrieved. So, a data source
> might
> >> loads a bunch of log files from a directory, and present them as a
> single
> >> big table, or like a list of tables, etc. So I want to deal with the
> cattle
> >> use case, but the question is what part of the of architecture will deal
> >> with this complication, with other words, how do you box things. Why my
> >> initial bet is to stuff that complication into the "data source"
> >> implementation(s) is that data sources are inherently varied. Some
> returns
> >> a table-like thing, some have multiple named tables (worksheets in
> Excel),
> >> some returns tree of nodes (XML), etc. So then, some might returns a
> >> list-of-list-of log records, or just a single list of log-records (put
> >> together from daily log files). That way cattles don't add to conceptual
> >> complexity. Now, you might be aware of cases where the cattle concept
> must
> >> be more exposed than this, and the we can't box things like this. But
> this
> >> is what I tried to express.
> >>
> >> Regarding "output generators", and how that applies on the command line:
> >> I think it's important that the common core between Maven and the
> >> command line is as fat as possible. Ideally, they are just two syntaxes
> >> for setting up the same thing. Mostly at least. So, if you specify a
> >> template file to the CLI application, in a way that causes it to process
> >> that template to generate a single output, then you have just defined an
> >> "output generator" there (even if it wasn't explicitly called that on the
> >> command line). If you specify 3 csv files to the CLI application, in a way
> >> that causes it to generate 3 output files, then you have just defined 3
> >> "output generators" there (there's at least one template specified there
> >> too, but that wasn't an "output generator" itself, it was just an
> >> attribute of the 3 output generators). If you specify 1 template and 3 csv
> >> files, in a way that will yield 4 output files (1 for the template, 3 for
> >> the csv-s), then you have defined 4 output generators there. If you have a
> >> data source that loads a list of 3 entities (say, 3 csv files, so it's a
> >> list of tables then), and you have 2 templates, and you tell the CLI to
> >> execute each template for each item in said data source, then you have
> >> just defined 6 "output generators".
> >>
> >> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
> >> siegfried.goeschl@gmail.com> wrote:
> >>
> >>> Hi Daniel,
> >>>
> >>> That all depends on your mental model and work you do, expectations,
> >>> experience :-)
> >>>
> >>>
> >>> __Document Handling__
> >>>
> >>> *"But I think actually we have no good use case for list of documents
> >>> that's passed at once to a single template run, so, we can just ignore
> >>> that complication"*
> >>>
> >>> In my case that's not a complication but my daily business - I'm
> >>> regularly wading through access logs - yesterday probably a couple of
> >>> hundred access logs across two staging sites to help track down some
> >>> strange API gateway issues :-)
> >>>
> >>> My gut feeling is (borrowing from
> >>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
> >>> )
> >>>
> >>> 1. You have a few lovely named documents / templates - `pets`
> >>> 2. You have tons of anonymous documents / templates to process -
> >>> `cattle`
> >>> 3. The "grey area" comes into play when mixing `pets & cattle`
> >>>
> >>> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
> >>> it is equally important and common.
> >>>
> >>>
> >>> __Template And Document Processing Modes__
> >>>
> >>> IMHO it is important to answer the following question: "How many
> >>> outputs do you get when rendering 2 templates and 3 datasources? Two,
> >>> Three or Six?"
> >>>
> >>> Your answer is influenced by your mental model / experience
> >>>
> >>> * When wading through tons of CSV files, access logs, etc. the answer is "2"
> >>> * When doing source code generation the obvious answer is "6"
> >>> * Can't imagine a use case which results in "3" but I'm pretty sure we
> >>> will encounter one
> >>>
> >>> __Template and document mode probably shouldn't exist__
> >>>
> >>> That's hard for me to fully understand - I definitely lack your
> >>> insights & experience writing such tools :-)
> >>>
> >>> Defining the `Output Generator` is the underlying model for the Maven
> >>> plugin (and probably FMPP).
> >>>
> >>> I'm not sure if this applies to command lines, at least not in the way
> >>> I use them (or would like to use them)
> >>>
> >>>
> >>> Thanks in advance,
> >>>
> >>> Siegfried Goeschl
> >>>
> >>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
> >>>
> >>>
> >>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
> >>>
> >>>> Yeah, "data source" is surely a too popular name, but for a reason.
> >>>> Does anyone have other ideas?
> >>>>
> >>>> As for naming data sources and such: one thing I was wondering about
> >>>> back then is how to deal with a list of documents given to a template,
> >>>> versus exactly 1 document given to a template. But I think actually we
> >>>> have no good use case for list of documents that's passed at once to a
> >>>> single template run, so, we can just ignore that complication. A
> >>>> document has a name, and that's always just a single document, not a
> >>>> collection, as far as the template is concerned. (We can have multiple
> >>>> documents per run, but those normally yield separate output generators,
> >>>> so it's still only one document per template.) However, we can have
> >>>> data source types (document types in the old terminology) that collect
> >>>> together multiple data files. So then that complexity is encapsulated
> >>>> in the data source type, and doesn't complicate the overall
> >>>> architecture. That's another case where a data source is not just a
> >>>> file. Like maybe there's a data source type that loads all the CSV-s
> >>>> from a directory into a single big table (I had such a case), or even
> >>>> into a list of tables. Or, as I mentioned already, a data source is
> >>>> maybe an SQL query on a JDBC data source (and we got the first term
> >>>> clash... JDBC also calls them data sources).
> >>>>
> >>>> Template and document mode probably shouldn't exist from the user's
> >>>> perspective either, at least not as a global option that must apply to
> >>>> everything in a run. They could just give the files that define the
> >>>> "output generators", and some of them will be templates, some of them
> >>>> data files, in which case a template needs to be associated with them
> >>>> (and there can be a couple of ways of doing that). And then again,
> >>>> there are the cases where you want to create one output generator per
> >>>> entity from some data source.
> >>>>
> >>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
> >>>> siegfried.goeschl@gmail.com> wrote:
> >>>>
> >>>>> Hi Daniel,
> >>>>>
> >>>>> See my comments below - and thanks for your patience and input :-)
> >>>>>
> >>>>> *Renaming Document To DataSource*
> >>>>>
> >>>>> Yes, makes sense. I tried to avoid it since I'm using javax.activation
> >>>>> and its DataSource.
> >>>>>
> >>>>> *Template And Document Mode*
> >>>>>
> >>>>> Agreed - I think it is a valuable abstraction for the user but it is
> >>>>> not an implementation concept :-)
> >>>>>
> >>>>> *Document Without Symbolic Names*
> >>>>>
> >>>>> Also agreed and it is going to change but I have not made up my mind
> >>>>> yet about what exactly to implement.
> >>>>>
> >>>>> Thanks in advance,
> >>>>>
> >>>>> Siegfried Goeschl
> >>>>>
> >>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
> >>>>>
> >>>>> A few quick thoughts on that:
> >>>>>
> >>>>> - We should replace the "document" term with something more
> >>>>> descriptive. It doesn't tell that it's some kind of input. Also, most
> >>>>> of these inputs aren't something that people typically call documents.
> >>>>> Like a csv file, or a database table, which is not even a file (OK, we
> >>>>> don't support such a thing at the moment). I think maybe "data source"
> >>>>> is a safe enough term. (It also rhymes with data model.)
> >>>>> - You have separate "template" and "document" "modes" that apply to a
> >>>>> whole run. I think such specialization won't be helpful. We could just
> >>>>> say, on the conceptual level at least, that we need a set of "output
> >>>>> generators". An output generator is an object (in the API) that
> >>>>> specifies a template, a data-model (where the data-model is possibly
> >>>>> populated with "documents"), and an output "sink" (a file path, or
> >>>>> stdout), and can generate the output itself. A practical way of
> >>>>> defining the output generators in a CLI application is via a bunch of
> >>>>> files, each defining an output generator. Some of those files may be
> >>>>> templates (which you can even detect from the file extension), or data
> >>>>> files that we currently call "documents". They could freely mix inside
> >>>>> the same run. I have also met a use case where you have a single table
> >>>>> (single "document"), and each record in it yields an output file. That
> >>>>> can also be described in some file format, or really in any other way,
> >>>>> like directly in command line arguments, via API, etc.
> >>>>> - You have multiple documents without associated symbolic names in
> >>>>> some examples. Templates can't identify those in a maintainable way
> >>>>> then. The actual file name is often not a good identifier: it can
> >>>>> change over time, and you might not even have good control over it,
> >>>>> like when you receive it as a parameter from somewhere else, or someone
> >>>>> moves/renames the files that you need to read. The index is also not
> >>>>> very good, but I have written about that earlier.
> >>>>>
> >>>>>
> >>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
> >>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>
> >>>>> Hi folks,
> >>>>>
> >>>>> still wrapping my head around it but assembled some thoughts here -
> >>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
> >>>>>
> >>>>> Thanks in advance,
> >>>>>
> >>>>> Siegfried Goeschl
> >>>>>
> >>>>>
> >>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org> wrote:
> >>>>>
> >>>>> What you are describing is more like the angle that FMPP took
> >>>>> initially, where templates drive things: they generate the output for
> >>>>> themselves (even multiple output files if they wish). By default the
> >>>>> output file name (and relative path) is deduced from the template name.
> >>>>> There was also a global data-model, built in a configuration file (or
> >>>>> equally, built via command line arguments, or both mixed), from which
> >>>>> templates get whatever data they are interested in. Take a look at the
> >>>>> figures here: http://fmpp.sourceforge.net/qtour.html. Later, this
> >>>>> concept was generalized a bit more, because you could add XML files at
> >>>>> the same place where you have the templates, and then you could
> >>>>> associate transform templates with the XML files (based on path pattern
> >>>>> and/or the XML document element). Now that's like what
> >>>>> freemarker-generator had initially (data files drive output, and the
> >>>>> template is there to transform it).
> >>>>>
> >>>>> So I think the generic mental model would be like this:
> >>>>>
> >>>>> 1. You got files that drive the process, let's call them *generator
> >>>>> files* for now. Usually, each generator file yields an output file (but
> >>>>> maybe even multiple output files, as you might have seen in the last
> >>>>> figure). These generator files can be of many types, like XML, JSON,
> >>>>> XLSX (as in the original freemarker-generator), and even templates (as
> >>>>> is the norm in FMPP). If the file is not a template, then you got a set
> >>>>> of transformer templates (-t CLI option) in a separate directory, which
> >>>>> can be associated with the generator files based on name patterns, and
> >>>>> even based on content (schema usually). If the generator file is a
> >>>>> template (so that's a positional @Parameter CLI argument that happens
> >>>>> to be an *.ftl, and is not a template file specified after the "-t"
> >>>>> option), then you just Template.process(...) it, and it prints what the
> >>>>> output will be.
> >>>>> 2. You also have a set of variables, the global data-model, that
> >>>>> contains commonly useful stuff, like what you now call parameters (CLI
> >>>>> -Pname=value), but also maybe data loaded from JSON, XML, etc. Those
> >>>>> data files aren't "generator files". Templates just use them if they
> >>>>> need them. An important thing here is to reuse the same mechanism to
> >>>>> read and parse those data files that was used in templates when
> >>>>> transforming generator files. So we need a common format for specifying
> >>>>> how to load data files. That's maybe just FTL that #assigns to the
> >>>>> variables, or maybe a more declarative format.
> >>>>>
> >>>>> What I have described in the original post here was a less generic
> >>>>> form of this, as I tried to stay true to the original approach. I
> >>>>> thought the proposal would be drastic enough as it is... :) There, the
> >>>>> "main" document is the "generator file" from point 1, the "-t" template
> >>>>> is the transform template for the "main" document, and the other named
> >>>>> documents ("users", "groups") are a poor man's shared data-model from
> >>>>> point 2 (together with -PName=value).
> >>>>>
> >>>>> There's a further, somewhat confusing thing to get right with the
> >>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing though.
> >>>>> In the model above, as per point 1, if you list multiple data files,
> >>>>> each will generate a separate output file. So, if you need to take in a
> >>>>> list of files to transform it to a single output file (or at least with
> >>>>> a single transform template execution), then you have to be explicit
> >>>>> about that, as that's not the default behavior anymore. But it's still
> >>>>> absolutely possible. Imagine that a "list of XLSX-es" is itself like a
> >>>>> file format. You need some CLI (and Maven config, etc.) syntax to
> >>>>> express that, but that shouldn't be a big deal.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
> >>>>> siegfried.goeschl@gmail.com> wrote:
> >>>>>
> >>>>> Hi Daniel,
> >>>>>
> >>>>> Good timing - I was looking at a similar problem from different angle
> >>>>> yesterday (see below)
> >>>>>
> >>>>> Don't have enough time to answer your email in detail now - will do
> >>>>> that
> >>>>> tomorrow evening
> >>>>>
> >>>>> Thanks in advance,
> >>>>>
> >>>>> Siegfried Goeschl
> >>>>>
> >>>>>
> >>>>> ===. START
> >>>>> # FreeMarker CLI Improvement
> >>>>> ## Support Of Multiple Template Files
> >>>>> Currently we support the following combinations
> >>>>>
> >>>>> * Single template and no data files
> >>>>> * Single template and one or more data files
> >>>>>
> >>>>> But we cannot support the following use case, which is quite typical
> >>>>> in the cloud
> >>>>>
> >>>>> __Convert multiple templates with a single data file, e.g. copying a
> >>>>> directory of configuration files using a JSON configuration file__
> >>>>>
> >>>>> ## Implementation notes
> >>>>> * When we copy a directory we can remove the `ftl` extension on the fly
> >>>>> * We might need an `exclude` filter for the copy operation
> >>>>> * Initially resolve to a list of template files and process one after
> >>>>> another
> >>>>> * Need to calculate the output file location and extension
> >>>>> * We need to rename the existing command line parameters (see below)
> >>>>> * Do we need multiple include and exclude filters?
> >>>>> * Do we need file versus directory filters?
> >>>>>
> >>>>> ### Command Line Options
> >>>>> ```
> >>>>> --input-encoding : Encoding of the documents
> >>>>> --output-encoding : Encoding of the rendered template
> >>>>> --template-encoding : Encoding of the template
> >>>>> --output : Output file or directory
> >>>>> --include-document : Include pattern for documents
> >>>>> --exclude-document : Exclude pattern for documents
> >>>>> --include-template : Include pattern for templates
> >>>>> --exclude-template : Exclude pattern for templates
> >>>>> ```
> >>>>>
> >>>>> ### Command Line Examples
> >>>>> ```text
> >>>>> # Copy all FTL templates found in "ext/config" to the "/config"
> >>>>> # directory using the data from "config.json"
> >>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
> >>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
> >>>>>
> >>>>> # Basically the same using a named document "configuration"
> >>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
> >>>>> # It might make sense to allow URIs for loading documents
> >>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
> >>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
> >>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
> >>>>>
> >>>>> # Basically the same using an environment variable as named document
> >>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
> >>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
> >>>>> ```
> >>>>> === END
> >>>>>
> >>>>> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org> wrote:
> >>>>>
> >>>>> Input documents is a fundamental concept in freemarker-generator, so
> >>>>> we should think about that more, and probably refine/rework how it's
> >>>>> done.
> >>>>>
> >>>>> Currently it works like this, with CLI at least.
> >>>>>
> >>>>>     freemarker-cli
> >>>>>         -t access-report.ftl
> >>>>>         somewhere/foo-access-log.csv
> >>>>>
> >>>>> Then in access-report.ftl you have to do something like this:
> >>>>>
> >>>>>     <#assign doc = Documents.get(0)>
> >>>>>     ... process doc here
> >>>>>
> >>>>> (The more idiomatic Documents[0] won't work. Actually, that led to a
> >>>>> funny chain of coincidences: it returned the string "D", then
> >>>>> CSVTool.parse(...) happily parsed that to a table with the single
> >>>>> column "D" and 0 rows, and as there were 0 rows, the template didn't
> >>>>> run into an error about row.myExpectedColumn referring to a missing
> >>>>> column either, so the process finished with success. (: Pretty unlucky
> >>>>> for sure. The root cause was unintentionally breaking a FreeMarker
> >>>>> idiom though; eventually we will have to work on those too, but that's
> >>>>> a different topic.)
> >>>>>
> >>>>> However, actually multiple input documents can be passed in:
> >>>>>
> >>>>>     freemarker-cli
> >>>>>         -t access-report.ftl
> >>>>>         somewhere/foo-access-log.csv
> >>>>>         somewhere/bar-access-log.csv
> >>>>>
> >>>>> The above template will still work, though then you ignore all but the
> >>>>> first document. So if you expect any number of input documents, you
> >>>>> will probably have to do this:
> >>>>>
> >>>>>     <#list Documents.list as doc>
> >>>>>         ... process doc here
> >>>>>     </#list>
> >>>>>
> >>>>> (The more idiomatic <#list Documents as doc> won't work; but again,
> >>>>> those we will work out in a different thread.)
> >>>>>
> >>>>>
> >>>>> So, what would be better, in my opinion? I start out from what I think
> >>>>> are the common use cases, in decreasing order of frequency. The goal is
> >>>>> to make those less error prone for the users, and simpler to express.
> >>>>>
> >>>>> USE CASE 1
> >>>>>
> >>>>> You have exactly 1 input document, which is therefore simply "the"
> >>>>> document in the mind of the user. This is probably the typical use
> >>>>> case, or at least the use case users typically start out from when
> >>>>> starting the work.
> >>>>>
> >>>>>     freemarker-cli
> >>>>>         -t access-report.ftl
> >>>>>         somewhere/foo-access-log.csv
> >>>>>
> >>>>> Then `Documents.get(0)` is not very fitting. Most importantly it's
> >>>>> error prone, because if the user passed in more than 1 document (which
> >>>>> can even happen totally accidentally, like if the user was lazy and
> >>>>> used a wildcard that the shell expanded), the template will silently
> >>>>> ignore the rest of the documents, and the single document processed
> >>>>> will be picked practically at random. The user might not notice that
> >>>>> and submit a bad report or such.
> >>>>>
> >>>>> I think that in this use case the document should be simply referred
> >>>>> to as `Document` in the template. When you have multiple documents
> >>>>> there, referring to `Document` should be an error, saying that the
> >>>>> template was made to process a single document only.
> >>>>>
> >>>>>
> >>>>> USE CASE 2
> >>>>>
> >>>>> You have multiple input documents, but each has a different role
> >>>>> (different schema, maybe different file type). Like, you pass in
> >>>>> users.csv and groups.csv. Each has a different schema, and so you want
> >>>>> to access them differently, but in the same template.
> >>>>>
> >>>>>     freemarker-cli
> >>>>>         [...]
> >>>>>         --named-document users somewhere/foo-users.csv
> >>>>>         --named-document groups somewhere/foo-groups.csv
> >>>>>
> >>>>> Then in the template you could refer to them as
> >>>>> `NamedDocuments.users` and `NamedDocuments.groups`.
> >>>>>
> >>>>> Use cases 1 and 2 can be unified into a coherent concept, where
> >>>>> `Document` is just a shorthand for `NamedDocuments.main`. It's called
> >>>>> "main" because that's "the" document the template is about, but then
> >>>>> you had to add some helper documents, with symbolic names representing
> >>>>> their role.
> >>>>>
> >>>>>     freemarker-cli
> >>>>>         -t access-report.ftl
> >>>>>         --document-name=main somewhere/foo-access-log.csv
> >>>>>         --document-name=users somewhere/foo-users.csv
> >>>>>         --document-name=groups somewhere/foo-groups.csv
> >>>>>
> >>>>> Here, `Document` still works in the template, and it refers to
> >>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
> >>>>> above would be cleaner, I couldn't figure out how to do that with
> >>>>> Picocli. Anyway, for now the point is the concept, which is not
> >>>>> specific to CLI.)
> >>>>>
> >>>>> USE CASE 3
> >>>>>
> >>>>> Here you have several of the same kind of documents. That has a more
> >>>>> generic sub-use-case, when you have explicitly named documents (like
> >>>>> "users" above), and for some you expect multiple input files.
> >>>>>
> >>>>>     freemarker-cli
> >>>>>         -t access-report.ftl
> >>>>>         --document-name=main somewhere/foo-access-log.csv somewhere/bar-access-log.csv
> >>>>>         --document-name=users somewhere/foo-users.csv somewhere/bar-users.csv
> >>>>>         --document-name=groups somewhere/global-groups.csv
> >>>>>
> >>>>> The template must be written with this use case in mind, as now it has
> >>>>> to #list some of the documents. (I think in practice you hardly ever
> >>>>> want to get a document by hard coded index. Either you don't know how
> >>>>> many documents you have, so you can't use hard coded indexes, or you
> >>>>> do, and each index has a specific meaning, but then you should name the
> >>>>> documents instead, as using indexes is error prone, and hard to read.)
> >>>>> Accessing that list of documents in the template could maybe be done
> >>>>> like this:
> >>>>> - For the "main" documents: `DocumentList`
> >>>>> - For explicitly named documents, like "users":
> >>>>> `NamedDocumentLists.users`
> >>>>>
> >>>>> SUMMING UP
> >>>>>
> >>>>> To unify all 3 use cases into a coherent concept:
> >>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you
> >>>>> can achieve everything with it, using it requires your template to
> >>>>> handle the most generic case too. So, I think it would be rarely used.
> >>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
> >>>>> It's used if you only have one kind of document (single format and
> >>>>> schema), but potentially multiple of them.
> >>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1 document
> >>>>> of the given name.
> >>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is for
> >>>>> the most natural/frequent use case.
> >>>>>
> >>>>> That's 4 possible ways of accessing your documents, which is a
> >>>>> trade-off for the sake of these:
> >>>>> - Catching CLI (or Maven, etc.) input where the template output will
> >>>>> likely be wrong. That's only possible if the user can communicate their
> >>>>> intent in the template.
> >>>>> - Users don't need to deal with concepts that are irrelevant in their
> >>>>> concrete use case. Just start with the trivial, `Document`, and later,
> >>>>> if the need arises, generalize to named documents, document lists, or
> >>>>> both.
> >>>>>
> >>>>> What do you guys think?
> >>>>>
> >>>
> >>
> >>
> >> --
> >> Best regards,
> >> Daniel Dekany
> >>
> >
> >
> > --
> > Best regards,
> > Daniel Dekany
>
>

-- 
Best regards,
Daniel Dekany

Re: freemarker-generator: Improving the input documents concept

Posted by Siegfried Goeschl <si...@gmail.com>.

Hi Daniel,

I'm an enterprise developer - bad habits die hard :-)

So I closed the following tickets and merged the branches

1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into "freemarker-generator"
2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names for datasources
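
For readers skimming the thread, here is a rough sketch of what user-supplied names are meant to buy: the four access forms from the original proposal quoted below (`Document`, `NamedDocuments.<name>`, `DocumentList`, `NamedDocumentLists.<name>`). It is written in Python only to keep it short. The class and method names are made up for illustration, and this is not the code that was merged (the real tool is Java/FTL and may well behave differently).

```python
from collections import defaultdict


class DataSources:
    """Hypothetical container mirroring the four access forms of the proposal."""

    def __init__(self):
        # symbolic name -> list of file paths registered under that name
        self._by_name = defaultdict(list)

    def add(self, path, name="main"):
        """Register a file; unnamed files go to the "main" group."""
        self._by_name[name].append(path)

    def named_list(self, name):
        """NamedDocumentLists.<name>: zero or more documents."""
        return list(self._by_name.get(name, []))

    def named(self, name):
        """NamedDocuments.<name>: exactly one document, otherwise fail loudly."""
        docs = self.named_list(name)
        if len(docs) != 1:
            raise ValueError(
                f"expected exactly 1 document named {name!r}, got {len(docs)}")
        return docs[0]

    def document(self):
        """Document: shorthand for NamedDocuments.main."""
        return self.named("main")

    def document_list(self):
        """DocumentList: shorthand for NamedDocumentLists.main."""
        return self.named_list("main")
```

The point of the sketch is the error in `named`: a template written for a single document fails instead of silently picking one when a wildcard expands to several files.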

Thanks in advance, 

Siegfried Goeschl
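
The "two, three or six outputs?" question in the quoted discussion below comes down to how templates and data sources are paired into output generators. A minimal Python sketch of the three readings follows. The function names and the tuple representation are assumptions made for illustration and are not part of freemarker-generator.

```python
from itertools import product


def one_per_data_source(template, data_sources):
    """One output generator per data source; the template is just an attribute."""
    return [(template, ds) for ds in data_sources]


def each_template_for_each_item(templates, items):
    """Every template is executed once for every item of a data source."""
    return list(product(templates, items))


def aggregate(templates, data_sources):
    """Each template sees all data sources at once (the "cattle" reading)."""
    return [(t, tuple(data_sources)) for t in templates]
```

With 2 templates and 3 CSV files, `aggregate` yields 2 generators and `each_template_for_each_item` yields 6, while `one_per_data_source` with a single template yields 3. Which pairing a given command line means is exactly what the thread is trying to settle.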


> On 29.02.2020, at 12:19, Daniel Dekany <da...@gmail.com> wrote:
> 
> Yeah, and of course, you can merge that branch. You can even work on the
> master directly after all.
> 
> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <da...@gmail.com>
> wrote:
> 
>> But, I do recognize the cattle use case (several "faceless" files with
>> common format/schema). Only, my idea is to push that complexity onto the
>> data source. The "data source" concept shields the rest of the application
>> from the details of how the data is stored or retrieved. So, a data source
>> might load a bunch of log files from a directory, and present them as a
>> single big table, or as a list of tables, etc. So I want to deal with the
>> cattle use case, but the question is which part of the architecture will
>> deal with this complication, in other words, how do you box things. The
>> reason my initial bet is to stuff that complication into the "data source"
>> implementation(s) is that data sources are inherently varied. Some return
>> a table-like thing, some have multiple named tables (worksheets in Excel),
>> some return a tree of nodes (XML), etc. So then, some might return a
>> list of lists of log records, or just a single list of log records (put
>> together from daily log files). That way cattle don't add to the
>> conceptual complexity. Now, you might be aware of cases where the cattle
>> concept must be more exposed than this, and then we can't box things like
>> this. But this is what I tried to express.
>> 
>> Regarding "output generators", and how that applies on the command line:
>> I think it's important that the common core between Maven and the
>> command line is as fat as possible. Ideally, they are just two syntaxes
>> for setting up the same thing. Mostly at least. So, if you specify a
>> template file to the CLI application, in a way that causes it to process
>> that template to generate a single output, then you have just defined an
>> "output generator" there (even if it wasn't explicitly called that on the
>> command line). If you specify 3 csv files to the CLI application, in a way
>> that causes it to generate 3 output files, then you have just defined 3
>> "output generators" there (there's at least one template specified there
>> too, but that wasn't an "output generator" itself, it was just an
>> attribute of the 3 output generators). If you specify 1 template and 3 csv
>> files, in a way that will yield 4 output files (1 for the template, 3 for
>> the csv-s), then you have defined 4 output generators there. If you have a
>> data source that loads a list of 3 entities (say, 3 csv files, so it's a
>> list of tables then), and you have 2 templates, and you tell the CLI to
>> execute each template for each item in said data source, then you have
>> just defined 6 "output generators".
>> 
>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
>> siegfried.goeschl@gmail.com> wrote:
>> 
>>> Hi Daniel,
>>> 
>>> That all depends on your mental model and work you do, expectations,
>>> experience :-)
>>> 
>>> 
>>> __Document Handling__
>>> 
>>> *"But I think actually we have no good use case for list of documents
>>> that's passed at once to a single template run, so, we can just ignore
>>> that complication"*
>>> 
>>> In my case that's not a complication but my daily business - I'm
>>> regularly wading through access logs - yesterday probably a couple of
>>> hundred access logs across two staging sites to help track down some
>>> strange API gateway issues :-)
>>> 
>>> My gut feeling is (borrowing from
>>> 
>>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
>>> )
>>> 
>>> 1. You have a few lovely named documents / templates - `pets`
>>> 2. You have tons of anonymous documents / templates to process -
>>> `cattle`
>>> 3. The "grey area" comes into play when mixing `pets & cattle`
>>> 
>>> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
>>> it is equally important and common.
>>> 
>>> 
>>> __Template And Document Processing Modes__
>>> 
>>> IMHO it is important to answer the following question: "How many
>>> outputs do you get when rendering 2 templates and 3 datasources? Two,
>>> Three or Six?"
>>>
>>> Your answer is influenced by your mental model / experience
>>>
>>> * When wading through tons of CSV files, access logs, etc. the answer is
>>> "2"
>>> * When doing source code generation the obvious answer is "6"
>>> * Can't imagine a use case which results in "3" but I'm pretty sure we
>>> will encounter one
>>> 
>>> __Template and document mode probably shouldn't exist__
>>> 
>>> That's hard for me to fully understand - I definitely lack your insights
>>> & experience writing such tools :-)
>>> 
>>> Defining the `Output Generator` is the underlying model for the Maven
>>> plugin (and probably FMPP).
>>> 
>>> I'm not sure if this applies to command lines, at least not in the way
>>> I use them (or would like to use them)
>>> 
>>> 
>>> Thanks in advance,
>>> 
>>> Siegfried Goeschl
>>> 
>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>>> 
>>> 
>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>>> 
>>>> Yeah, "data source" is surely a too popular name, but for a reason.
>>>> Does anyone have other ideas?
>>>>
>>>> As for naming data sources and such: one thing I was wondering about
>>>> back then is how to deal with a list of documents given to a template,
>>>> versus exactly 1 document given to a template. But I think actually we
>>>> have no good use case for list of documents that's passed at once to a
>>>> single template run, so, we can just ignore that complication. A
>>>> document has a name, and that's always just a single document, not a
>>>> collection, as far as the template is concerned. (We can have multiple
>>>> documents per run, but those normally yield separate output generators,
>>>> so it's still only one document per template.) However, we can have
>>>> data source types (document types in the old terminology) that collect
>>>> together multiple data files. So then that complexity is encapsulated
>>>> in the data source type, and doesn't complicate the overall
>>>> architecture. That's another case where a data source is not just a
>>>> file. Like maybe there's a data source type that loads all the CSV-s
>>>> from a directory into a single big table (I had such a case), or even
>>>> into a list of tables. Or, as I mentioned already, a data source is
>>>> maybe an SQL query on a JDBC data source (and we got the first term
>>>> clash... JDBC also calls them data sources).
>>>> 
>>>> Template and document mode probably shouldn't exist from the user's
>>>> perspective either, at least not as a global option that must apply to
>>>> everything in a run. Users could just give the files that define the
>>>> "output generators"; some of them will be templates, and some of them
>>>> will be data files, in which case a template needs to be associated
>>>> with them (and there can be a couple of ways of doing that). And then
>>>> again, there are the cases where you want to create one output
>>>> generator per entity from some data source.
>>>> 
>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
>>>> siegfried.goeschl@gmail.com> wrote:
>>>> 
>>>>> Hi Daniel,
>>>>> 
>>>>> See my comments below - and thanks for your patience and input :-)
>>>>> 
>>>>> *Renaming Document To DataSource*
>>>>> 
>>>>> Yes, makes sense. I tried to avoid since I'm using javax.activation
>>>>> and
>>>>> its DataSource.
>>>>> 
>>>>> *Template And Document Mode*
>>>>> 
>>>>> Agreed - I think it is a valuable abstraction for the user but it is
>>>>> not
>>>>> an implementation concept :-)
>>>>> 
>>>>> *Document Without Symbolic Names*
>>>>> 
>>>>> Also agreed and it is going to change but I have not settled my mind
>>>>> yet
>>>>> what exactly to implement.
>>>>> 
>>>>> Thanks in advance,
>>>>> 
>>>>> Siegfried Goeschl
>>>>> 
>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>>>>> 
>>>>> A few quick thoughts on that:
>>>>> 
>>>>> - We should replace the "document" term with something more telling.
>>>>> It doesn't tell that it's some kind of input. Also, most of these
>>>>> inputs aren't something that people typically call documents, like a
>>>>> CSV file, or a database table, which is not even a file (OK, we don't
>>>>> support such a thing at the moment). I think maybe "data source" is a
>>>>> safe enough term. (It also rhymes with data model.)
>>>>> - You have separate "template" and "document" "modes" that apply to a
>>>>> whole run. I think such specialization won't be helpful. We could just
>>>>> say, on the conceptual level at least, that we need a set of "output
>>>>> generators". An output generator is an object (in the API) that
>>>>> specifies a template, a data-model (where the data-model is possibly
>>>>> populated with "documents"), and an output "sink" (a file path, or
>>>>> stdout), and can generate the output itself. A practical way of
>>>>> defining the output generators in a CLI application is via a bunch of
>>>>> files, each defining an output generator. Some of those files may be
>>>>> templates (which you can even detect from the file extension), others
>>>>> data files that we currently call "documents". They could freely mix
>>>>> inside the same run. I have also met a use case where you have a
>>>>> single table (single "document"), and each record in it yields an
>>>>> output file. That can also be described in some file format, or really
>>>>> in any other way, like directly in command line arguments, via API,
>>>>> etc.
>>>>> - You have multiple documents without an associated symbolic name in
>>>>> some examples. Templates can't identify those in a maintainable way.
>>>>> The actual file name is often not a good identifier: it can change
>>>>> over time, and you might not even have good control over it, like when
>>>>> you receive it as a parameter from somewhere else, or someone
>>>>> moves/renames the files that you need to read. An index is also not
>>>>> very good, but I have written about that earlier.
>>>>> 
>>>>> 
>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>> 
>>>>> Hi folks,
>>>>> 
>>>>> still wrapping my head around it but assembled some thoughts here -
>>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>>>>> 
>>>>> Thanks in advance,
>>>>> 
>>>>> Siegfried Goeschl
>>>>> 
>>>>> 
>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org> wrote:
>>>>> 
>>>>> What you are describing is more like the angle that FMPP took
>>>>> initially, where templates drive things: they generate the output for
>>>>> themselves (even multiple output files if they wish). By default the
>>>>> output file name (and relative path) is deduced from the template name.
>>>>> There was also a global data-model, built in a configuration file (or
>>>>> equally, built via command line arguments, or both mixed), from which
>>>>> templates get whatever data they are interested in. Take a look at the
>>>>> figures here: http://fmpp.sourceforge.net/qtour.html. Later, this
>>>>> concept was generalized a bit more, because you could add XML files at
>>>>> the same place where you have the templates, and then you could
>>>>> associate transform templates to the XML files (based on path pattern
>>>>> and/or the XML document element). Now that's like what
>>>>> freemarker-generator had initially (data files drive output, and the
>>>>> template is there to transform it).
>>>>> 
>>>>> So I think the generic mental model would be like this:
>>>>> 
>>>>> 1. You have files that drive the process, let's call them *generator
>>>>> files* for now. Usually, each generator file yields an output file
>>>>> (but maybe even multiple output files, as you might have seen in the
>>>>> last figure). These generator files can be of many types, like XML,
>>>>> JSON, XLSX (as in the original freemarker-generator), and even
>>>>> templates (as is the norm in FMPP). If the file is not a template,
>>>>> then you have a set of transformer templates (-t CLI option) in a
>>>>> separate directory, which can be associated with the generator files
>>>>> based on name patterns, and even based on content (schema usually). If
>>>>> the generator file is a template (so that's a positional @Parameter
>>>>> CLI argument that happens to be an *.ftl, and is not a template file
>>>>> specified after the "-t" option), then you just Template.process(...)
>>>>> it, and it prints what the output will be.
>>>>> 2. You also have a set of variables, the global data-model, that
>>>>> contains commonly useful stuff, like what you now call parameters (CLI
>>>>> -Pname=value), but also maybe data loaded from JSON, XML, etc. Those
>>>>> data files aren't "generator files". Templates just use them if they
>>>>> need them. An important thing here is to reuse the same mechanism to
>>>>> read and parse those data files that was used in templates when
>>>>> transforming generator files. So we need a common format for
>>>>> specifying how to load data files. That's maybe just FTL that #assigns
>>>>> to the variables, or maybe a more declarative format.
>>>>> 
>>>>> What I have described in the original post here was a less generic
>>>>> form of this, as I tried to stay true to the original approach. I
>>>>> thought the proposal would be drastic enough as it is... :) There, the
>>>>> "main" document is the "generator file" from point 1, the "-t" template
>>>>> is the transform template for the "main" document, and the other named
>>>>> documents ("users", "groups") are a poor man's shared data-model from
>>>>> point 2 (together with -PName=value).
>>>>> 
>>>>> There's a further, somewhat confusing thing to get right with the
>>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing though.
>>>>> In the model above, as per point 1, if you list multiple data files,
>>>>> each will generate a separate output file. So, if you need to take in
>>>>> a list of files to transform it to a single output file (or at least
>>>>> with a single transform template execution), then you have to be
>>>>> explicit about that, as that's not the default behavior anymore. But
>>>>> it's still absolutely possible. Imagine that a "list of XLSX-es" is
>>>>> itself like a file format. You need some CLI (and Maven config, etc.)
>>>>> syntax to express that, but that shouldn't be a big deal.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>> 
>>>>> Hi Daniel,
>>>>> 
>>>>> Good timing - I was looking at a similar problem from different angle
>>>>> yesterday (see below)
>>>>> 
>>>>> Don't have enough time to answer your email in detail now - will do
>>>>> that
>>>>> tomorrow evening
>>>>> 
>>>>> Thanks in advance,
>>>>> 
>>>>> Siegfried Goeschl
>>>>> 
>>>>> 
>>>>> === START
>>>>> # FreeMarker CLI Improvement
>>>>> ## Support Of Multiple Template Files
>>>>> Currently we support the following combinations
>>>>> 
>>>>> * Single template and no data files
>>>>> * Single template and one or more data files
>>>>> 
>>>>> But we can not support the following use case, which is quite typical
>>>>> in the cloud
>>>>> 
>>>>> __Convert multiple templates with a single data file, e.g. copying a
>>>>> directory of configuration files using a JSON configuration file__
>>>>> 
>>>>> ## Implementation notes
>>>>> * When we copy a directory we can remove the `ftl` extension on the
>>>>> fly
>>>>> * We might need an `exclude` filter for the copy operation
>>>>> * Initially resolve to a list of template files and process one after
>>>>> another
>>>>> * Need to calculate the output file location and extension
>>>>> * We need to rename the existing command line parameters (see below)
>>>>> * Do we need multiple include and exclude filter?
>>>>> * Do we need file versus directory filters?
>>>>> 
>>>>> ### Command Line Options
>>>>> ```
>>>>> --input-encoding : Encoding of the documents
>>>>> --output-encoding : Encoding of the rendered template
>>>>> --template-encoding : Encoding of the template
>>>>> --output : Output file or directory
>>>>> --include-document : Include pattern for documents
>>>>> --exclude-document : Exclude pattern for documents
>>>>> --include-template : Include pattern for templates
>>>>> --exclude-template : Exclude pattern for templates
>>>>> ```
>>>>> 
>>>>> ### Command Line Examples
>>>>> ```text
>>>>> # Copy all FTL templates found in "ext/config" to the "/config" directory
>>>>> # using the data from "config.json"
>>>>> 
>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
>>>>> 
>>>>> # Basically the same using a named document "configuration"
>>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
>>>>> # It might make sense to allow URIs for loading documents
>>>>> 
>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
>>>>> 
>>>>> # Basically the same using an environment variable as named document
>>>>> 
>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
>>>>> ```
>>>>> === END
>>>>> 
>>>>> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org> wrote:
>>>>> 
>>>>> Input documents is a fundamental concept in freemarker-generator, so
>>>>> we
>>>>> should think about that more, and probably refine/rework how it's
>>>>> done.
>>>>> 
>>>>> Currently it works like this, with CLI at least.
>>>>> 
>>>>> freemarker-cli
>>>>> -t access-report.ftl
>>>>> somewhere/foo-access-log.csv
>>>>> 
>>>>> Then in access-report.ftl you have to do something like this:
>>>>> 
>>>>> <#assign doc = Documents.get(0)>
>>>>> ... process doc here
>>>>> 
>>>>> (The more idiomatic Documents[0] won't work. Actually, that lead to a
>>>>> 
>>>>> funny
>>>>> 
>>>>> chain of coincidences: It returned the string "D", then
>>>>> 
>>>>> CSVTool.parse(...)
>>>>> 
>>>>> happily parsed that to a table with the single column "D", and 0
>>>>> rows,
>>>>> 
>>>>> and
>>>>> 
>>>>> as there were 0 rows, the template didn't run into an error because
>>>>> row.myExpectedColumn refers to a missing column either, so the
>>>>> process
>>>>> finished with success. (: Pretty unlucky for sure. The root was
>>>>> unintentionally breaking a FreeMarker idiom though; eventually we
>>>>> will
>>>>> 
>>>>> have
>>>>> 
>>>>> to work on those too, but, different topic.)
>>>>> 
>>>>> However, actually multiple input documents can be passed in:
>>>>> 
>>>>> freemarker-cli
>>>>> -t access-report.ftl
>>>>> somewhere/foo-access-log.csv
>>>>> somewhere/bar-access-log.csv
>>>>> 
>>>>> Above template will still work, though then you ignored all but the
>>>>> 
>>>>> first
>>>>> 
>>>>> document. So if you expect any number of input documents, you
>>>>> probably
>>>>> 
>>>>> will
>>>>> 
>>>>> have to do this:
>>>>> 
>>>>> <#list Documents.list as doc>
>>>>> ... process doc here
>>>>> </#list>
>>>>> 
>>>>> (The more idiomatic <#list Documents as doc> won't work; but again,
>>>>> 
>>>>> those
>>>>> 
>>>>> we will work out in a different thread.)
>>>>> 
>>>>> 
>>>>> So, what would be better, in my opinion. I start out from what I
>>>>> think
>>>>> 
>>>>> are
>>>>> 
>>>>> the common uses cases, in decreasing order of frequency. Goal is to
>>>>> 
>>>>> make
>>>>> 
>>>>> those less error prone for the users, and simpler to express.
>>>>> 
>>>>> USE CASE 1
>>>>> 
>>>>> You have exactly 1 input documents, which is therefore simply "the"
>>>>> document in the mind of the user. This is probably the typical use
>>>>> 
>>>>> case,
>>>>> 
>>>>> but at least the use case users typically start out from when
>>>>> starting
>>>>> 
>>>>> the
>>>>> 
>>>>> work.
>>>>> 
>>>>> freemarker-cli
>>>>> -t access-report.ftl
>>>>> somewhere/foo-access-log.csv
>>>>> 
>>>>> Then `Documents.get(0)` is not very fitting. Most importantly it's
>>>>> error prone, because if the user passed in more than 1 document (that
>>>>> can even happen totally accidentally, like if the user was lazy and
>>>>> used a wildcard that the shell exploded), the template will silently
>>>>> ignore the rest of the documents, and the single document processed
>>>>> will be practically picked randomly. The user might not notice that
>>>>> and submit a bad report or such.
>>>>> 
>>>>> I think that in this use case the document should be simply referred
>>>>> to as `Document` in the template. When you have multiple documents
>>>>> there, referring to `Document` should be an error, saying that the
>>>>> template was made to process a single document only.
>>>>> 
>>>>> 
>>>>> USE CASE 2
>>>>> 
>>>>> You have multiple input documents, but each has different role
>>>>> 
>>>>> (different
>>>>> 
>>>>> schema, maybe different file type). Like, you pass in users.csv and
>>>>> groups.csv. Each has difference schema, and so you want to access
>>>>> them
>>>>> differently, but in the same template.
>>>>> 
>>>>> freemarker-cli
>>>>> [...]
>>>>> --named-document users somewhere/foo-users.csv
>>>>> --named-document groups somewhere/foo-groups.csv
>>>>> 
>>>>> Then in the template you could refer to them as:
>>>>> 
>>>>> `NamedDocuments.users`,
>>>>> 
>>>>> and `NamedDocuments.groups`.
>>>>> 
>>>>> Use Case 1 and 2 can be unified into a coherent concept, where
>>>>> `Document` is just a shorthand for `NamedDocuments.main`. It's called
>>>>> "main" because that's "the" document the template is about, but then
>>>>> you have added some helper documents, with symbolic names representing
>>>>> their role.
>>>>> 
>>>>> freemarker-cli
>>>>> -t access-report.ftl
>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>> --document-name=users somewhere/foo-users.csv
>>>>> --document-name=groups somewhere/foo-groups.csv
>>>>> 
>>>>> Here, `Document` still works in the template, and it refers to
>>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
>>>>> 
>>>>> above
>>>>> 
>>>>> would be cleaner, I couldn't figure out how to do that with Picocli.
>>>>> Anyway, for now the point is the concept, which is not specific to
>>>>> 
>>>>> CLI.)
>>>>> 
>>>>> USE CASE 3
>>>>> 
>>>>> Here you have several of the same kind of documents. That has a more
>>>>> generic sub-use-case, when you have explicitly named documents (like
>>>>> "users" above), and for some you expect multiple input files.
>>>>> 
>>>>> freemarker-cli
>>>>> -t access-report.ftl
>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>> somewhere/bar-access-log.csv
>>>>> --document-name=users somewhere/foo-users.csv
>>>>> somewhere/bar-users.csv
>>>>> --document-name=groups somewhere/global-groups.csv
>>>>> 
>>>>> The template must be written with this use case in mind, as now it has
>>>>> to #list some of the documents. (I think in practice you hardly ever
>>>>> want to get a document by hard coded index. Either you don't know how
>>>>> many documents you have, so you can't use hard coded indexes, or you
>>>>> do, and each index has a specific meaning, but then you should name
>>>>> the documents instead, as using indexes is error prone, and hard to
>>>>> read.)
>>>>> Accessing that list of documents in the template maybe could be done
>>>>> like this:
>>>>> - For the "main" documents: `DocumentList`
>>>>> - For explicitly named documents, like "users":
>>>>> `NamedDocumentLists.users`
>>>>> 
>>>>> SUMMING UP
>>>>> 
>>>>> To unify all 3 use cases into a coherent concept:
>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you
>>>>> 
>>>>> can
>>>>> 
>>>>> achieve everything with it, using it requires your template to handle
>>>>> 
>>>>> the
>>>>> 
>>>>> most generic case too. So, I think it would be rarely used.
>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
>>>>> 
>>>>> It's
>>>>> 
>>>>> used if you only have one kind of documents (single format and
>>>>> schema),
>>>>> 
>>>>> but
>>>>> 
>>>>> potentially multiple of them.
>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1
>>>>> document
>>>>> 
>>>>> of
>>>>> 
>>>>> the given name.
>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is
>>>>> for
>>>>> 
>>>>> the
>>>>> 
>>>>> most natural/frequent use case.
>>>>> 
>>>>> That's 4 possible ways of accessing your documents, which is a
>>>>> 
>>>>> trade-off
>>>>> 
>>>>> for the sake of these:
>>>>> - Catching CLI (or Maven, etc.) input where the template output
>>>>> likely
>>>>> 
>>>>> will
>>>>> 
>>>>> be wrong. That's only possible if the user can communicate its intent
>>>>> 
>>>>> in
>>>>> 
>>>>> the template.
>>>>> - Users don't need to deal with concepts that are irrelevant in their
>>>>> concrete use case. Just start with the trivial, `Document`, and later
>>>>> 
>>>>> if
>>>>> 
>>>>> the need arises, generalize to named documents, document lists, or
>>>>> 
>>>>> both.
>>>>> 
>>>>> What do you guys think?
>>>>> 
>>>>> 
>>> 
>> 
>> 
>> --
>> Best regards,
>> Daniel Dekany
>> 
> 
> 
> -- 
> Best regards,
> Daniel Dekany

Re: freemarker-generator: Improving the input documents concept

Posted by Daniel Dekany <da...@gmail.com>.

Yeah, and of course, you can merge that branch. You can even work on the
master directly after all.

On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <da...@gmail.com>
wrote:

> But, I do recognize the cattle use case (several "faceless" files with
> common format/schema). Only, my idea is to push that complexity onto the
> data source. The "data source" concept shields the rest of the application
> from the details of how the data is stored or retrieved. So, a data source
> might load a bunch of log files from a directory, and present them as a
> single big table, or as a list of tables, etc. So I want to deal with the
> cattle use case, but the question is what part of the architecture will
> deal with this complication, in other words, how do you box things. The
> reason my initial bet is to stuff that complication into the "data source"
> implementation(s) is that data sources are inherently varied. Some return
> a table-like thing, some have multiple named tables (worksheets in Excel),
> some return a tree of nodes (XML), etc. So then, some might return a
> list-of-list-of log records, or just a single list of log records (put
> together from daily log files). That way cattle don't add to conceptual
> complexity. Now, you might be aware of cases where the cattle concept must
> be more exposed than this, and then we can't box things like this. But
> this is what I tried to express.
>
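
A minimal Java sketch of the "box the cattle into the data source" idea
from the quoted paragraph above. All names here (`DataSource`, `ofLines`,
`merged`) are hypothetical and made up for illustration; they are not the
freemarker-generator API. The point is only that the rest of the
application sees one data source, whether it is backed by one file or many.

```java
import java.util.ArrayList;
import java.util.List;

public class CattleDataSourceSketch {

    /** Hypothetical abstraction: presents rows and hides where they came from. */
    interface DataSource {
        List<String> rows();
    }

    /** One "faceless" log file, represented here by its already-read lines. */
    static DataSource ofLines(List<String> lines) {
        return () -> lines;
    }

    /** Presents many data sources as one big table; callers see a single source. */
    static DataSource merged(List<DataSource> parts) {
        return () -> {
            List<String> all = new ArrayList<>();
            for (DataSource part : parts) {
                all.addAll(part.rows());
            }
            return all;
        };
    }

    public static void main(String[] args) {
        DataSource monday = ofLines(List.of("GET /a 200", "GET /b 404"));
        DataSource tuesday = ofLines(List.of("GET /a 500"));
        // The template would receive only this one merged data source.
        DataSource accessLog = merged(List.of(monday, tuesday));
        System.out.println(accessLog.rows().size() + " log records");
    }
}
```

A variant returning a list of tables instead of one merged table would keep
the same interface idea, just with a different return type.
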
> Regarding "output generators", and how that applies on the command line. I
> think it's important that the common core between Maven and command-line is
> as fat as possible. Ideally, they are just two syntaxes to set up the same
> thing. Mostly at least. So, if you specify a template file to the CLI
> application, in a way so that it causes it to process that template to
> generate a single output, then there you have just defined an "output
> generator" (even if it wasn't explicitly called like that in the command
> line). If you specify 3 csv files to the CLI application, in a way so that
> it causes it to generate 3 output files, then you have just defined 3
> "output generators" there (there's at least one template specified there
> too, but that wasn't an "output generator" itself, it was just an attribute
> of the 3 output generators). If you specify 1 template, and 3 csv files, in
> a way so that it will yield 4 output files (1 for the template, 3 for the
> csv-s), then you have defined 4 output generators there. If you have a data
> source that loads a list of 3 entities (say, 3 csv files, so it's a list of
> tables then), and you have 2 templates, and you tell the CLI to execute
> each template for each item in said data source, then you have just defined
> 6 "output generators".
>
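
The counting in the quoted paragraph above can be sketched in Java as
follows. This is only an illustration under assumptions: `OutputGenerator`,
`perItem` and `perTemplatePerItem` are hypothetical names, not existing
freemarker-generator classes, and a real output generator would also carry
a data-model and an output sink.

```java
import java.util.ArrayList;
import java.util.List;

public class OutputGeneratorSketch {

    /** Hypothetical value object: one template applied to one input yields one output. */
    static final class OutputGenerator {
        final String template;
        final String input;

        OutputGenerator(String template, String input) {
            this.template = template;
            this.input = input;
        }
    }

    /** N data files sharing one transform template define N output generators. */
    static List<OutputGenerator> perItem(String template, List<String> items) {
        List<OutputGenerator> result = new ArrayList<>();
        for (String item : items) {
            result.add(new OutputGenerator(template, item));
        }
        return result;
    }

    /** Each template executed for each item defines templates x items generators. */
    static List<OutputGenerator> perTemplatePerItem(List<String> templates, List<String> items) {
        List<OutputGenerator> result = new ArrayList<>();
        for (String template : templates) {
            result.addAll(perItem(template, items));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> csvFiles = List.of("a.csv", "b.csv", "c.csv");
        System.out.println(perItem("report.ftl", csvFiles).size());
        System.out.println(perTemplatePerItem(List.of("x.ftl", "y.ftl"), csvFiles).size());
    }
}
```
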
> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
> siegfried.goeschl@gmail.com> wrote:
>
>> Hi Daniel,
>>
>> That all depends on your mental model and work you do, expectations,
>> experience :-)
>>
>>
>> __Document Handling__
>>
>> *"But I think actually we have no good use case for list of documents
>> that's passed at once to a single template run, so, we can just ignore
>> that complication"*
>>
>> In my case that's not a complication but my daily business - I'm
>> regularly wading through access logs - yesterday probably a couple of
>> hundred access logs across two staging sites to help track down some
>> strange API gateway issues :-)
>>
>> My gut feeling is (borrowing from
>>
>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
>> )
>>
>> 1. You have a few lovely named documents / templates - `pets`
>> 2. You have tons of anonymous documents / templates to process -
>> `cattle`
>> 3. The "grey area" comes into play when mixing `pets & cattle`
>>
>> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
>> it is equally important and common.
>>
>>
>> __Template And Document Processing Modes__
>>
>> IMHO it is important to answer the following question: "How many
>> outputs do you get when rendering 2 templates and 3 datasources? Two,
>> three or six?"
>>
>> Your answer is influenced by your mental model / experience
>>
>> * When wading through tons of CSV files, access logs, etc. the answer is
>> "2"
>> * When doing source code generation the obvious answer is "6"
>> * Can't imagine a use case which results in "3" but I'm pretty sure we
>> will encounter one
>>
>> __Template and document mode probably shouldn't exist__
>>
>> That's hard for me to fully understand - I definitely lack your insights
>> & experience writing such tools :-)
>>
>> Defining the `Output Generator` is the underlying model for the Maven
>> plugin (and probably FMPP).
>>
>> I'm not sure if this applies for command lines at least not in the way I
>> use them (or would like to use them)
>>
>>
>> Thanks in advance,
>>
>> Siegfried Goeschl
>>
>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>>
>>
>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>>
>> > Yeah, "data source" is surely a too popular name, but for a reason.
>> > Anyone has other ideas?
>> >
>> > As for naming data sources and such: one thing I was wondering about
>> > back then is how to deal with a list of documents given to a template,
>> > versus exactly 1 document given to a template. But I think we actually
>> > have no good use case for a list of documents that's passed at once to
>> > a single template run, so we can just ignore that complication. A
>> > document has a name, and that's always just a single document, not a
>> > collection, as far as the template is concerned. (We can have multiple
>> > documents per run, but those normally yield separate output
>> > generators, so it's still only one document per template.) However, we
>> > can have data source types (document types in the old terminology)
>> > that collect together multiple data files. So then that complexity is
>> > encapsulated into the data source type, and doesn't complicate the
>> > overall architecture. That's another case where a data source is not
>> > just a file. Like maybe there's a data source type that loads all the
>> > CSV-s from a directory into a single big table (I had such a case), or
>> > even into a list of tables. Or, as I mentioned already, a data source
>> > is maybe an SQL query on a JDBC data source (and we got the first term
>> > clash... JDBC also calls them data sources).
>> >
>> > Template and document mode probably shouldn't exist from the user's
>> > perspective either, at least not as a global option that must apply to
>> > everything in a run. Users could just give the files that define the
>> > "output generators"; some of them will be templates, and some of them
>> > will be data files, in which case a template needs to be associated
>> > with them (and there can be a couple of ways of doing that). And then
>> > again, there are the cases where you want to create one output
>> > generator per entity from some data source.
>> >
>> > On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
>> > siegfried.goeschl@gmail.com> wrote:
>> >
>> >> Hi Daniel,
>> >>
>> >> See my comments below - and thanks for your patience and input :-)
>> >>
>> >> *Renaming Document To DataSource*
>> >>
>> >> Yes, makes sense. I tried to avoid since I'm using javax.activation
>> >> and
>> >> its DataSource.
>> >>
>> >> *Template And Document Mode*
>> >>
>> >> Agreed - I think it is a valuable abstraction for the user but it is
>> >> not
>> >> an implementation concept :-)
>> >>
>> >> *Document Without Symbolic Names*
>> >>
>> >> Also agreed and it is going to change but I have not settled my mind
>> >> yet
>> >> what exactly to implement.
>> >>
>> >> Thanks in advance,
>> >>
>> >> Siegfried Goeschl
>> >>
>> >> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>> >>
>> >> A few quick thoughts on that:
>> >>
>> >> - We should replace the "document" term with something more telling.
>> >> It doesn't tell that it's some kind of input. Also, most of these
>> >> inputs aren't something that people typically call documents, like a
>> >> CSV file, or a database table, which is not even a file (OK, we don't
>> >> support such a thing at the moment). I think maybe "data source" is a
>> >> safe enough term. (It also rhymes with data model.)
>> >> - You have separate "template" and "document" "modes" that apply to a
>> >> whole run. I think such specialization won't be helpful. We could just
>> >> say, on the conceptual level at least, that we need a set of "output
>> >> generators". An output generator is an object (in the API) that
>> >> specifies a template, a data-model (where the data-model is possibly
>> >> populated with "documents"), and an output "sink" (a file path, or
>> >> stdout), and can generate the output itself. A practical way of
>> >> defining the output generators in a CLI application is via a bunch of
>> >> files, each defining an output generator. Some of those files may be
>> >> templates (which you can even detect from the file extension), others
>> >> data files that we currently call "documents". They could freely mix
>> >> inside the same run. I have also met a use case where you have a
>> >> single table (single "document"), and each record in it yields an
>> >> output file. That can also be described in some file format, or really
>> >> in any other way, like directly in command line arguments, via API,
>> >> etc.
>> >> - You have multiple documents without an associated symbolic name in
>> >> some examples. Templates can't identify those in a maintainable way.
>> >> The actual file name is often not a good identifier: it can change
>> >> over time, and you might not even have good control over it, like when
>> >> you receive it as a parameter from somewhere else, or someone
>> >> moves/renames the files that you need to read. An index is also not
>> >> very good, but I have written about that earlier.
>> >>
>> >>
>> >> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
>> >> siegfried.goeschl@gmail.com> wrote:
>> >>
>> >> Hi folks,
>> >>
>> >> still wrapping my head around it but assembled some thoughts here -
>> >> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>> >>
>> >> Thanks in advance,
>> >>
>> >> Siegfried Goeschl
>> >>
>> >>
>> >> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org> wrote:
>> >>
>> >> What you are describing is more like the angle that FMPP took
>> >> initially, where templates drive things: they generate the output for
>> >> themselves (even multiple output files if they wish). By default the
>> >> output file name (and relative path) is deduced from the template name.
>> >> There was also a global data-model, built in a configuration file (or
>> >> equally, built via command line arguments, or both mixed), from which
>> >> templates get whatever data they are interested in. Take a look at the
>> >> figures here: http://fmpp.sourceforge.net/qtour.html. Later, this
>> >> concept was generalized a bit more, because you could add XML files at
>> >> the same place where you have the templates, and then you could
>> >> associate transform templates to the XML files (based on path pattern
>> >> and/or the XML document element). Now that's like what
>> >> freemarker-generator had initially (data files drive output, and the
>> >> template is there to transform it).
>> >>
>> >> So I think the generic mental model would like this:
>> >>
>> >> 1. You got files that drive the process, let's call them *generator
>> >> files* for now. Usually, each generator file yields an output file
>> >> (but
>> >> maybe even multiple output files, as you might saw in the last
>> >> figure).
>> >> These generator files can be of many types, like XML, JSON, XLSX (as
>> >>
>> >> in the
>> >>
>> >> original freemarker-generator), and even templates (as is the norm in
>> >> FMPP). If the file is not a template, then you got a set of
>> >> transformer
>> >> templates (-t CLI option) in a separate directory, which can be
>> >>
>> >> associated
>> >>
>> >> with the generator files base on name patterns, and even based on
>> >>
>> >> content
>> >>
>> >> (schema usually). If the generator file is a template (so that's a
>> >> positional @Parameter CLI argument that happens to be an *.ftl, and
>> >> is
>> >>
>> >> not
>> >>
>> >> a template file specified after the "-t" option), then you just
>> >> Template.process(...) it, and it prints what the output will be.
>> >> 2. You also have a set of variables, the global data-model, that
>> >> contains commonly useful stuff, like what you now call parameters
>> >> (CLI
>> >> -Pname=value), but also maybe data loaded from JSON, XML, etc.. Those
>> >>
>> >> data
>> >>
>> >> files aren't "generator files". Templates just use them if they need
>> >>
>> >> them.
>> >>
>> >> An important thing here is to reuse the same mechanism to read and
>> >>
>> >> parse
>> >>
>> >> those data files, which was used in templates when transforming
>> >>
>> >> generator
>> >>
>> >> files. So we need a common format for specifying how to load data
>> >>
>> >> files.
>> >>
>> >> That's maybe just FTL that #assigns to the variables, or maybe more
>> >> declarative format.
>> >>
>> >> What I have described in the original post here was a less generic
>> >> form
>> >>
>> >> of
>> >>
>> >> this, as I tried to be true with the original approach. I though the
>> >> proposal will be drastic enough as it is... :) There, the "main"
>> >> document
>> >> is the "generator file" from point 1, the "-t" template is the
>> >> transform
>> >> template for the "main" document, and the other named documents
>> >> ("users",
>> >> "groups") is a poor man's shared data-model from point 2 (together
>> >> with
>> >> with -PName=value).
>> >>
>> >> There's further somewhat confusing thing to get right with the
>> >> list-of-documents (`DocuentList`, `NamedDocumentLists`) thing though.
>> >> In
>> >> the model above, as per point 1, if you list multiple data files,
>> >> each
>> >>
>> >> will
>> >>
>> >> generate a separate output file. So, if you need take in a list of
>> >> files
>> >>
>> >> to
>> >>
>> >> transform it to a single output file (or at least with a single
>> >> transform
>> >> template execution), then you have to be explicit about that, as
>> >> that's
>> >>
>> >> not
>> >>
>> >> the default behavior anymore. But it's still absolutely possible.
>> >> Imagine
>> >> it as a "list of XLSX-es" is itself like a file format. You need some
>> >> CLI
>> >> (and Maven config, etc.) syntax to express that, but that shouldn't
>> >> be a
>> >> big deal.
>> >>
>> >>
>> >>
>> >> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
>> >> siegfried.goeschl@gmail.com> wrote:
>> >>
>> >> Hi Daniel,
>> >>
>> >> Good timing - I was looking at a similar problem from different angle
>> >> yesterday (see below)
>> >>
>> >> Don't have enough time to answer your email in detail now - will do
>> >> that
>> >> tomorrow evening
>> >>
>> >> Thanks in advance,
>> >>
>> >> Siegfried Goeschl
>> >>
>> >>
>> >> ===. START
>> >> # FreeMarker CLI Improvement
>> >> ## Support Of Multiple Template Files
>> >> Currently we support the following combinations
>> >>
>> >> * Single template and no data files
>> >> * Single template and one or more data files
>> >>
>> >> But we can not support the following use case which is quite typical
>> >> in
>> >> the cloud
>> >>
>> >> __Convert multiple templates with a single data file, e.g copying a
>> >> directory of configuration files using a JSON configuration file__
>> >>
>> >> ## Implementation notes
>> >> * When we copy a directory we can remove the `ftl`extension on the
>> >> fly
>> >> * We might need an `exclude` filter for the copy operation
>> >> * Initially resolve to a list of template files and process one after
>> >> another
>> >> * Need to calculate the output file location and extension
>> >> * We need to rename the existing command line parameters (see below)
>> >> * Do we need multiple include and exclude filter?
>> >> * Do we need file versus directory filters?
>> >>
>> >> ### Command Line Options
>> >> ```
>> >> --input-encoding : Encoding of the documents
>> >> --output-encoding : Encoding of the rendered template
>> >> --template-encoding : Encoding of the template
>> >> --output : Output file or directory
>> >> --include-document : Include pattern for documents
>> >> --exclude-document : Exclude pattern for documents
>> >> --include-template: Include pattern for templates
>> >> --exclude-template : Exclude pattern for templates
>> >> ```
>> >>
>> >> ### Command Line Examples
>> >> ```text
>> >> # Copy all FTL templates found in "ext/config" to the "/config"
>> >>
>> >> directory
>> >>
>> >> using the data from "config.json"
>> >>
>> >> freemarker-cli -t ./ext/config --include-template *.ftl --o /config
>> >>
>> >> config.json
>> >>
>> >> freemarker-cli --template ./ext/config --include-template *.ftl
>> >>
>> >> --output
>> >>
>> >> /config config.json
>> >>
>> >> # Bascically the same using a named document "configuration"
>> >> # It might make sense to expose "conf" directly in the FreeMarker
>> >> data
>> >> model
>> >> # It might make sens to allow URIs for loading documents
>> >>
>> >> freemarker-cli -t ./ext/config/*.ftl -o /config -d
>> >>
>> >> configuration=config.json
>> >>
>> >> freemarker-cli --template ./ext/config --include-template *.ftl
>> >>
>> >> --output
>> >>
>> >> /config --document configuration=config.json
>> >>
>> >> freemarker-cli --template ./ext/config --include-template *.ftl
>> >>
>> >> --output
>> >>
>> >> /config --document configuration=file:///config.json
>> >>
>> >> # Bascically the same using an environment variable as named document
>> >>
>> >> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d
>> >>
>> >> configuration=env:///CONFIGURATION
>> >>
>> >> freemarker-cli --template ./ext/config --include-template *.ftl
>> >>
>> >> --output
>> >>
>> >> /config --document configuration=env:///CONFIGURATION
>> >> ```
>> >> === END
>> >>
>> >> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org> wrote:
>> >>
>> >> Input documents is a fundamental concept in freemarker-generator, so
>> >> we
>> >> should think about that more, and probably refine/rework how it's
>> >> done.
>> >>
>> >> Currently it works like this, with CLI at least.
>> >>
>> >> freemarker-cli
>> >> -t access-report.ftl
>> >> somewhere/foo-access-log.csv
>> >>
>> >> Then in access-report.ftl you have to do something like this:
>> >>
>> >> <#assign doc = Documents.get(0)>
>> >> ... process doc here
>> >>
>> >> (The more idiomatic Documents[0] won't work. Actually, that lead to a
>> >>
>> >> funny
>> >>
>> >> chain of coincidences: It returned the string "D", then
>> >>
>> >> CSVTool.parse(...)
>> >>
>> >> happily parsed that to a table with the single column "D", and 0
>> >> rows,
>> >>
>> >> and
>> >>
>> >> as there were 0 rows, the template didn't run into an error because
>> >> row.myExpectedColumn refers to a missing column either, so the
>> >> process
>> >> finished with success. (: Pretty unlucky for sure. The root was
>> >> unintentionally breaking a FreeMarker idiom though; eventually we
>> >> will
>> >>
>> >> have
>> >>
>> >> to work on those too, but, different topic.)
>> >>
>> >> However, actually multiple input documents can be passed in:
>> >>
>> >> freemarker-cli
>> >> -t access-report.ftl
>> >> somewhere/foo-access-log.csv
>> >> somewhere/bar-access-log.csv
>> >>
>> >> Above template will still work, though then you ignored all but the
>> >>
>> >> first
>> >>
>> >> document. So if you expect any number of input documents, you
>> >> probably
>> >>
>> >> will
>> >>
>> >> have to do this:
>> >>
>> >> <#list Documents.list as doc>
>> >> ... process doc here
>> >> </#list>
>> >>
>> >> (The more idiomatic <#list Documents as doc> won't work; but again,
>> >>
>> >> those
>> >>
>> >> we will work out in a different thread.)
>> >>
>> >>
>> >> So, what would be better, in my opinion. I start out from what I
>> >> think
>> >>
>> >> are
>> >>
>> >> the common uses cases, in decreasing order of frequency. Goal is to
>> >>
>> >> make
>> >>
>> >> those less error prone for the users, and simpler to express.
>> >>
>> >> USE CASE 1
>> >>
>> >> You have exactly 1 input documents, which is therefore simply "the"
>> >> document in the mind of the user. This is probably the typical use
>> >>
>> >> case,
>> >>
>> >> but at least the use case users typically start out from when
>> >> starting
>> >>
>> >> the
>> >>
>> >> work.
>> >>
>> >> freemarker-cli
>> >> -t access-report.ftl
>> >> somewhere/foo-access-log.csv
>> >>
>> >> Then `Documents.get(0)` is not very fitting. Most importantly it's
>> >>
>> >> error
>> >>
>> >> prone, because if the user passed in more than 1 documents (can even
>> >>
>> >> happen
>> >>
>> >> totally accidentally, like if the user was lazy and used a wildcard
>> >>
>> >> that
>> >>
>> >> the shell exploded), the template will silently ignore the rest of
>> >> the
>> >> documents, and the singe document processed will be practically
>> >> picked
>> >> randomly. The user might won't notice that and submits a bad report
>> >> or
>> >>
>> >> such.
>> >>
>> >> I think that in this use case the document should be simply referred
>> >> as
>> >> `Document` in the template. When you have multiple documents there,
>> >> referring to `Document` should be an error, saying that the template
>> >>
>> >> was
>> >>
>> >> made to process a single document only.
>> >>
>> >>
>> >> USE CASE 2
>> >>
>> >> You have multiple input documents, but each has different role
>> >>
>> >> (different
>> >>
>> >> schema, maybe different file type). Like, you pass in users.csv and
>> >> groups.csv. Each has difference schema, and so you want to access
>> >> them
>> >> differently, but in the same template.
>> >>
>> >> freemarker-cli
>> >> [...]
>> >> --named-document users somewhere/foo-users.csv
>> >> --named-document groups somewhere/foo-groups.csv
>> >>
>> >> Then in the template you could refer to them as:
>> >>
>> >> `NamedDocuments.users`,
>> >>
>> >> and `NamedDocuments.groups`.
>> >>
>> >> Use Case 1, and 2 can be unified into a coherent concept, where
>> >>
>> >> `Document`
>> >>
>> >> is just a shorthand for `NamedDocuments.main`. It's called "main"
>> >>
>> >> because
>> >>
>> >> that's "the" document the template is about, but then you have to
>> >> added
>> >> some helper documents, with symbolic names representing their role.
>> >>
>> >> freemarker-cli
>> >> -t access-report.ftl
>> >> --document-name=main somewhere/foo-access-log.csv
>> >> --document-name=users somewhere/foo-users.csv
>> >> --document-name=groups somewhere/foo-groups.csv
>> >>
>> >> Here, `Document` still works in the template, and it refers to
>> >> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
>> >>
>> >> above
>> >>
>> >> would be cleaner, I couldn't figure out how to do that with Picocli.
>> >> Anyway, for now the point is the concept, which is not specific to
>> >>
>> >> CLI.)
>> >>
>> >> USE CASE 3
>> >>
>> >> Here you have several of the same kind of documents. That has a more
>> >> generic sub-use-case, when you have explicitly named documents (like
>> >> "users" above), and for some you expect multiple input files.
>> >>
>> >> freemarker-cli
>> >> -t access-report.ftl
>> >> --document-name=main somewhere/foo-access-log.csv
>> >> somewhere/bar-access-log.csv
>> >> --document-name=users somewhere/foo-users.csv
>> >> somewhere/bar-users.csv
>> >> --document-name=groups somewhere/global-groups.csv
>> >>
>> >> The template must to be written with this use case in mind, as now it
>> >>
>> >> has
>> >>
>> >> #list some of the documents. (I think in practice you hardly ever
>> >> want
>> >>
>> >> to
>> >>
>> >> get a document by hard coded index. Either you don't know how many
>> >> documents you have, so you can't use hard coded indexes, or you do,
>> >> and
>> >> each index has a specific meaning, but then you should name the
>> >>
>> >> documents
>> >>
>> >> instead, as using indexes is error prone, and hard to read.)
>> >> Accessing that list of documents in the template, maybe could be done
>> >>
>> >> like
>> >>
>> >> this:
>> >> - For the "main" documents: `DocumentList`
>> >> - For explicitly named documents, like "users":
>> >>
>> >> `NamedDocumentLists.users`
>> >>
>> >> SUMMING UP
>> >>
>> >> To unify all 3 use cases into a coherent concept:
>> >> - `NamedDocumentLists.<name>` is the most generic form, and while you
>> >>
>> >> can
>> >>
>> >> achieve everything with it, using it requires your template to handle
>> >>
>> >> the
>> >>
>> >> most generic case too. So, I think it would be rarely used.
>> >> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
>> >>
>> >> It's
>> >>
>> >> used if you only have one kind of documents (single format and
>> >> schema),
>> >>
>> >> but
>> >>
>> >> potentially multiple of them.
>> >> - `NamedDocuments.<name>` expresses that you expect exactly 1
>> >> document
>> >>
>> >> of
>> >>
>> >> the given name.
>> >> - `Document` is just a shorthand for `NamedDocuments.main`. This is
>> >> for
>> >>
>> >> the
>> >>
>> >> most natural/frequent use case.
>> >>
>> >> That's 4 possible ways of accessing your documents, which is a
>> >>
>> >> trade-off
>> >>
>> >> for the sake of these:
>> >> - Catching CLI (or Maven, etc.) input where the template output
>> >> likely
>> >>
>> >> will
>> >>
>> >> be wrong. That's only possible if the user can communicate its intent
>> >>
>> >> in
>> >>
>> >> the template.
>> >> - Users don't need to deal with concepts that are irrelevant in their
>> >> concrete use case. Just start with the trivial, `Document`, and later
>> >>
>> >> if
>> >>
>> >> the need arises, generalize to named documents, document lists, or
>> >>
>> >> both.
>> >>
>> >> What do guys think?
>> >>
>> >>
>>
>
>
> --
> Best regards,
> Daniel Dekany
>


-- 
Best regards,
Daniel Dekany

Re: freemarker-generator: Improving the input documents concept

Posted by Daniel Dekany <da...@gmail.com>.

But, I do recognize the cattle use case (several "faceless" files with a
common format/schema). Only, my idea is to push that complexity onto the
data source. The "data source" concept shields the rest of the application
from the details of how the data is stored or retrieved. So, a data source
might load a bunch of log files from a directory, and present them as a
single big table, or as a list of tables, etc. So I do want to deal with
the cattle use case, but the question is which part of the architecture
will deal with this complication, in other words, how you box things. The
reason my initial bet is to stuff that complication into the "data source"
implementation(s) is that data sources are inherently varied. Some return a
table-like thing, some have multiple named tables (worksheets in Excel),
some return a tree of nodes (XML), etc. So then, some might return a
list-of-lists of log records, or just a single list of log records (put
together from daily log files). That way cattle don't add to the conceptual
complexity. Now, you might be aware of cases where the cattle concept must
be more exposed than this, and then we can't box things like this. But this
is what I tried to express.
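
To make the boxing idea above a bit more concrete, here is a minimal Java
sketch. It is only an illustration under assumptions: none of these types
exist in freemarker-generator, all names (DataSourceSketch, LogFileSketch,
MergedLogsSketch, LogsPerFileSketch) are hypothetical, and the log files
are replaced by in-memory records so that parsing stays out of the picture.
The point is only that the same bunch of files can be presented either as
one big list or as a list of lists, and the rest of the application sees
just "a data source" either way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical abstraction: the only thing the rest of the application sees. */
interface DataSourceSketch<T> {
    T load();
}

/** Stand-in for one log file whose records were already parsed. */
final class LogFileSketch {
    final String name;
    final List<String> records;

    LogFileSketch(String name, String... records) {
        this.name = name;
        this.records = Collections.unmodifiableList(Arrays.asList(records));
    }
}

/** Presents many "faceless" log files as a single list of records. */
final class MergedLogsSketch implements DataSourceSketch<List<String>> {
    private final List<LogFileSketch> files;

    MergedLogsSketch(List<LogFileSketch> files) {
        this.files = files;
    }

    @Override
    public List<String> load() {
        List<String> all = new ArrayList<>();
        for (LogFileSketch file : files) {
            all.addAll(file.records);
        }
        return all;
    }
}

/** Presents the same files as a list of lists, one inner list per file. */
final class LogsPerFileSketch implements DataSourceSketch<List<List<String>>> {
    private final List<LogFileSketch> files;

    LogsPerFileSketch(List<LogFileSketch> files) {
        this.files = files;
    }

    @Override
    public List<List<String>> load() {
        List<List<String>> perFile = new ArrayList<>();
        for (LogFileSketch file : files) {
            perFile.add(file.records);
        }
        return perFile;
    }
}

final class DataSourceSketchDemo {
    public static void main(String[] args) {
        List<LogFileSketch> files = Arrays.asList(
                new LogFileSketch("monday.log", "r1", "r2"),
                new LogFileSketch("tuesday.log", "r3"));
        // The template would receive either shape as one named data source.
        System.out.println(new MergedLogsSketch(files).load().size());  // 3 records
        System.out.println(new LogsPerFileSketch(files).load().size()); // 2 files
    }
}
```

In both variants the "how many files were there" question is answered
inside the data source, not by the template or by the CLI option parsing.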

Regarding "output generators", and how that applies to the command line: I
think it's important that the common core between Maven and command-line is
as fat as possible. Ideally, they are just two syntaxes for setting up the
same thing. Mostly at least. So, if you specify a template file to the CLI
application, in a way that causes it to process that template to generate a
single output, then you have just defined an "output generator" (even if it
wasn't explicitly called that on the command line). If you specify 3 csv
files to the CLI application, in a way that causes it to generate 3 output
files, then you have just defined 3 "output generators" (there's at least
one template specified there too, but that wasn't an "output generator"
itself, it was just an attribute of the 3 output generators). If you
specify 1 template and 3 csv files, in a way that yields 4 output files (1
for the template, 3 for the csv-s), then you have defined 4 output
generators. If you have a data source that loads a list of 3 entities (say,
3 csv files, so it's a list of tables then), and you have 2 templates, and
you tell the CLI to execute each template for each item in said data
source, then you have just defined 6 "output generators".
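
The counting above can be sketched in a few lines of Java. Again, this is
only an illustration under assumptions, not the actual or planned API: the
OutputGeneratorSketch and OutputGeneratorPlanner names, and the three
factory methods, are made up for this mail. Each method corresponds to one
of the ways of specifying files described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical value object: one template run that yields one output. */
final class OutputGeneratorSketch {
    final String template;
    final String dataSource; // null when the template itself drives the output

    OutputGeneratorSketch(String template, String dataSource) {
        this.template = template;
        this.dataSource = dataSource;
    }
}

final class OutputGeneratorPlanner {
    /** Templates given as generator files: one generator per template. */
    static List<OutputGeneratorSketch> forTemplates(List<String> templates) {
        List<OutputGeneratorSketch> result = new ArrayList<>();
        for (String template : templates) {
            result.add(new OutputGeneratorSketch(template, null));
        }
        return result;
    }

    /** Data files with an associated transform template: one generator per data file. */
    static List<OutputGeneratorSketch> forDataFiles(String transformTemplate,
                                                    List<String> dataFiles) {
        List<OutputGeneratorSketch> result = new ArrayList<>();
        for (String dataFile : dataFiles) {
            result.add(new OutputGeneratorSketch(transformTemplate, dataFile));
        }
        return result;
    }

    /** Each template executed for each item of a data source: templates x items. */
    static List<OutputGeneratorSketch> forEachItem(List<String> templates,
                                                   List<String> items) {
        List<OutputGeneratorSketch> result = new ArrayList<>();
        for (String template : templates) {
            result.addAll(forDataFiles(template, items));
        }
        return result;
    }
}

final class OutputGeneratorDemo {
    public static void main(String[] args) {
        List<String> csvs = Arrays.asList("a.csv", "b.csv", "c.csv");

        // 3 csv files transformed by one template: 3 generators.
        List<OutputGeneratorSketch> mixed =
                new ArrayList<>(OutputGeneratorPlanner.forDataFiles("row.ftl", csvs));
        // Plus 1 template that generates its own output: 4 generators in total.
        mixed.addAll(OutputGeneratorPlanner.forTemplates(Arrays.asList("index.ftl")));
        System.out.println(mixed.size()); // 4

        // 2 templates, each executed for each of the 3 items: 6 generators.
        System.out.println(OutputGeneratorPlanner
                .forEachItem(Arrays.asList("x.ftl", "y.ftl"), csvs).size()); // 6
    }
}
```

The CLI and the Maven plugin would then only differ in how they build this
list; whatever consumes it doesn't need to know which front end it came from.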

On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
siegfried.goeschl@gmail.com> wrote:

> Hi Daniel,
>
> That all depends on your mental model and work you do, expectations,
> experience :-)
>
>
> __Document Handling__
>
> *"But I think actually we have no good use case for list of documents
> that's passed at once to a single template run, so, we can just ignore
> that complication"*
>
> In my case that's not a complication but my daily business - I'm
> regularly wading through access logs - yesterday probably a couple of
> hundreds access logs across two staging sites to help tracking some
> strange API gateway issues :-)
>
> My gut feeling is (borrowing from
>
> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
> )
>
> 1. You have a few lovely named documents / templates - `pets`
> 2. You have tons of anonymous documents / templates to process -
> `cattle`
> 3. The "grey area" comes into play when mixing `pets & cattle`
>
> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
> it is equally important and common.
>
>
> __Template And Document Processing Modes__
>
> IMHO it is important to answer the following question: "How many
> outputs do you get when rendering 2 templates and 3 data sources? Two,
> three or six?"
>
> Your answer is influenced by your mental model / experience
>
> * When wading through tons of CSV files, access logs, etc. the answer is
> "2"
> * When doing source code generation the obvious answer is "6"
> * Can't imagine a use case which results in "3" but I'm pretty sure we
> will encounter one
>
> __Template and document mode probably shouldn't exist__
>
> That's hard for me to fully understand - I definitely lack your insights
> & experience writing such tools :-)
>
> Defining the `Output Generator` is the underlying model for the Maven
> plugin (and probably FMPP).
>
> I'm not sure if this applies for command lines at least not in the way I
> use them (or would like to use them)
>
>
> Thanks in advance,
>
> Siegfried Goeschl
>
> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>
>
> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>
> > Yeah, "data source" is surely too popular a name, but for a reason.
> > Does anyone have other ideas?
> >
> > As of naming data sources and such. One thing I was wondering about
> > back
> > then is how to deal with list of documents given to a template, versus
> > exactly 1 document given to a template. But I think actually we have
> > no
> > good use case for list of documents that's passed at once to a single
> > template run, so, we can just ignore that complication. A document has
> > a
> > name, and that's always just a single document, not a collection, as
> > far as
> > the template is concerned. (We can have multiple documents per run,
> > but
> > those normally yield separate output generators, so it's still only
> > one
> > document per template.) However, we can have data source types
> > (document
> > types with old terminology) that collect together multiple data files.
> > So
> > then that complexity is encapsulated into the data source type, and
> > doesn't
> > complicate the overall architecture. That's another case when a data
> > source
> > is not just a file. Like maybe there's a data source type that loads
> > all
> > the CSV-s from a directory, into a single big table (I had such case),
> > or
> > even into a list of tables. Or, as I mentioned already, a data source
> > is
> > maybe an SQL query on a JDBC data source (and we got the first term
> > clash... JDBC also call them data sources).
> >
> > Template and document mode probably shouldn't exist from user
> > perspective
> > either, at least not as a global option that must apply to everything
> > in a
> > run. They could just give the files that define the "output
> > generators",
> > and some of them will be templates, some of them are data files, in
> > which
> > case a template need to be associated with them (and there can be a
> > couple
> > of ways of doing that). And then again, there are the cases where you
> > want
> > to create one output generator per entity from some data source.
> >
> > On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
> > siegfried.goeschl@gmail.com> wrote:
> >
> >> Hi Daniel,
> >>
> >> See my comments below - and thanks for your patience and input :-)
> >>
> >> *Renaming Document To DataSource*
> >>
> >> Yes, makes sense. I tried to avoid since I'm using javax.activation
> >> and
> >> its DataSource.
> >>
> >> *Template And Document Mode*
> >>
> >> Agreed - I think it is a valuable abstraction for the user but it is
> >> not
> >> an implementation concept :-)
> >>
> >> *Document Without Symbolic Names*
> >>
> >> Also agreed and it is going to change but I have not settled my mind
> >> yet
> >> what exactly to implement.
> >>
> >> Thanks in advance,
> >>
> >> Siegfried Goeschl
> >>
> >> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
> >>
> >> A few quick thoughts on that:
> >>
> >> - We should replace the "document" term with something more speaking.
> >> It
> >> doesn't tell that it's some kind of input. Also, most of these inputs
> >> aren't something that people typically call documents. Like a csv
> >> file, or
> >> a database table, which is not even a file (OK we don't support such
> >> thing
> >> at the moment). I think, maybe "data source" is a safe enough term.
> >> (It
> >> also rhymes with data model.)
> >> - You have separate "template" and "document" "mode", that applies to
> >> a
> >> whole run. I think such specialization won't be helpful. We could
> >> just say,
> >> on the conceptual level at lest, that we need a set of "outputs
> >> generators". An output generator is an object (in the API) that
> >> specifies a
> >> template, a data-model (where the data-model is possibly populated
> >> with
> >> "documents"), and an output "sink" (a file path, or stdout), and can
> >> generate the output itself. A practical way of defining the output
> >> generators in a CLI application is via a bunch of files, each
> >> defining an
> >> output generator. Some of those files is maybe a template (that you
> >> can
> >> even detect from the file extension), or a data file that we
> >> currently call
> >> a "document". They could freely mix inside the same run. I have also
> >> met
> >> use case when you have a single table (single "document"), and each
> >> record
> >> in it yields an output file. That can also be described in some file
> >> format, or really in any other way, like directly in command line
> >> argument,
> >> via API, etc.
> >> - You have multiple documents without associated symbolical name in
> >> some
> >> examples. Templates can't identify those then in a well maintainable
> >> way.
> >> The actual file name is often not a good identifier, can change over
> >> time,
> >> and you might don't even have good control over it, like you already
> >> receive it as a parameter from somewhere else, or someone
> >> moves/renames
> >> that files that you need to read. Index is also not very good, but I
> >> have
> >> written about that earlier.
> >>
> >>
> >> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
> >> siegfried.goeschl@gmail.com> wrote:
> >>
> >> Hi folks,
> >>
> >> still wrapping my side around but assembled some thoughts here -
> >> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
> >>
> >> Thanks in advance,
> >>
> >> Siegfried Goeschl
> >>
> >>
> >> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org> wrote:
> >>
> >> What you are describing is more like the angle that FMPP took
> >> initially,
> >> where templates drive things, they generate the output for themselves
> >>
> >> (even
> >>
> >> multiple output files if they wish). By default output files name
> >> (and
> >> relative path) is deduced from template name. There was also a global
> >> data-model, built in a configuration file (or equally, built via
> >> command
> >> line arguments, or both mixed), from which templates get whatever
> >> data
> >>
> >> they
> >>
> >> are interested in. Take a look at the figures here:
> >> http://fmpp.sourceforge.net/qtour.html. Later, this concept was
> >>
> >> generalized
> >>
> >> a bit more, because you could add XML files at the same place where
> >> you
> >> have the templates, and then you could associate transform templates
> >> to
> >>
> >> the
> >>
> >> XML files (based on path pattern and/or the XML document element).
> >> Now
> >> that's like what freemarker-generator had initially (data files drive
> >> output, and the template is there to transform it).
> >>
> >> So I think the generic mental model would like this:
> >>
> >> 1. You got files that drive the process, let's call them *generator
> >> files* for now. Usually, each generator file yields an output file
> >> (but
> >> maybe even multiple output files, as you might saw in the last
> >> figure).
> >> These generator files can be of many types, like XML, JSON, XLSX (as
> >>
> >> in the
> >>
> >> original freemarker-generator), and even templates (as is the norm in
> >> FMPP). If the file is not a template, then you got a set of
> >> transformer
> >> templates (-t CLI option) in a separate directory, which can be
> >>
> >> associated
> >>
> >> with the generator files base on name patterns, and even based on
> >>
> >> content
> >>
> >> (schema usually). If the generator file is a template (so that's a
> >> positional @Parameter CLI argument that happens to be an *.ftl, and
> >> is
> >>
> >> not
> >>
> >> a template file specified after the "-t" option), then you just
> >> Template.process(...) it, and it prints what the output will be.
> >> 2. You also have a set of variables, the global data-model, that
> >> contains commonly useful stuff, like what you now call parameters
> >> (CLI
> >> -Pname=value), but also maybe data loaded from JSON, XML, etc.. Those
> >>
> >> data
> >>
> >> files aren't "generator files". Templates just use them if they need
> >>
> >> them.
> >>
> >> An important thing here is to reuse the same mechanism to read and
> >>
> >> parse
> >>
> >> those data files, which was used in templates when transforming
> >>
> >> generator
> >>
> >> files. So we need a common format for specifying how to load data
> >>
> >> files.
> >>
> >> That's maybe just FTL that #assigns to the variables, or maybe more
> >> declarative format.
> >>
> >> What I have described in the original post here was a less generic
> >> form
> >>
> >> of
> >>
> >> this, as I tried to be true with the original approach. I though the
> >> proposal will be drastic enough as it is... :) There, the "main"
> >> document
> >> is the "generator file" from point 1, the "-t" template is the
> >> transform
> >> template for the "main" document, and the other named documents
> >> ("users",
> >> "groups") is a poor man's shared data-model from point 2 (together
> >> with
> >> with -PName=value).
> >>
> >> There's further somewhat confusing thing to get right with the
> >> list-of-documents (`DocuentList`, `NamedDocumentLists`) thing though.
> >> In
> >> the model above, as per point 1, if you list multiple data files,
> >> each
> >>
> >> will
> >>
> >> generate a separate output file. So, if you need take in a list of
> >> files
> >>
> >> to
> >>
> >> transform it to a single output file (or at least with a single
> >> transform
> >> template execution), then you have to be explicit about that, as
> >> that's
> >>
> >> not
> >>
> >> the default behavior anymore. But it's still absolutely possible.
> >> Imagine
> >> it as a "list of XLSX-es" is itself like a file format. You need some
> >> CLI
> >> (and Maven config, etc.) syntax to express that, but that shouldn't
> >> be a
> >> big deal.
> >>
> >>
> >>
> >> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
> >> siegfried.goeschl@gmail.com> wrote:
> >>
> >> Hi Daniel,
> >>
> >> Good timing - I was looking at a similar problem from different angle
> >> yesterday (see below)
> >>
> >> Don't have enough time to answer your email in detail now - will do
> >> that
> >> tomorrow evening
> >>
> >> Thanks in advance,
> >>
> >> Siegfried Goeschl
> >>
> >>
> >> ===. START
> >> # FreeMarker CLI Improvement
> >> ## Support Of Multiple Template Files
> >> Currently we support the following combinations
> >>
> >> * Single template and no data files
> >> * Single template and one or more data files
> >>
> >> But we can not support the following use case which is quite typical
> >> in
> >> the cloud
> >>
> >> __Convert multiple templates with a single data file, e.g copying a
> >> directory of configuration files using a JSON configuration file__
> >>
> >> ## Implementation notes
> >> * When we copy a directory we can remove the `ftl`extension on the
> >> fly
> >> * We might need an `exclude` filter for the copy operation
> >> * Initially resolve to a list of template files and process one after
> >> another
> >> * Need to calculate the output file location and extension
> >> * We need to rename the existing command line parameters (see below)
> >> * Do we need multiple include and exclude filter?
> >> * Do we need file versus directory filters?
> >>
> >> ### Command Line Options
> >> ```
> >> --input-encoding : Encoding of the documents
> >> --output-encoding : Encoding of the rendered template
> >> --template-encoding : Encoding of the template
> >> --output : Output file or directory
> >> --include-document : Include pattern for documents
> >> --exclude-document : Exclude pattern for documents
> >> --include-template: Include pattern for templates
> >> --exclude-template : Exclude pattern for templates
> >> ```
> >>
> >> ### Command Line Examples
> >> ```text
> >> # Copy all FTL templates found in "ext/config" to the "/config" directory
> >> # using the data from "config.json"
> >>
> >> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
> >> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
> >>
> >> # Basically the same using a named document "configuration"
> >> # It might make sense to expose "conf" directly in the FreeMarker data model
> >> # It might make sense to allow URIs for loading documents
> >>
> >> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
> >> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
> >> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
> >>
> >> # Basically the same using an environment variable as named document
> >>
> >> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
> >> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
> >> ```
> >> === END
> >>
>


-- 
Best regards,
Daniel Dekany

Re: freemarker-generator: Improving the input documents concept

Posted by Siegfried Goeschl <si...@gmail.com>.

Hi Daniel,

That all depends on your mental model, the work you do, your expectations 
and experience :-)


__Document Handling__

*"But I think actually we have no good use case for list of documents 
that's passed at once to a single template run, so, we can just ignore 
that complication"*

In my case that's not a complication but my daily business - I'm 
regularly wading through access logs - yesterday probably a couple of 
hundred access logs across two staging sites to help track down some 
strange API gateway issues :-)

My gut feeling is (borrowing from 
https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313)

1. You have a few lovely named documents / templates - `pets`
2. You have tons of anonymous documents / templates to process - 
`cattle`
3. The "grey area" comes into play when mixing `pets & cattle`

`freemarker-cli` was built with 2) in mind and I want to cover 1) since 
it is equally important and common.
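
The pets/cattle split can be made concrete with a small sketch. This is 
illustrative Python, not `freemarker-cli` code; the `DataSource` and 
`DataSources` names and their methods are assumptions made up for this 
example. Named sources are looked up by name, anonymous ones are only 
ever processed in bulk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DataSource:
    location: str
    name: Optional[str] = None  # None marks an anonymous ("cattle") source


@dataclass
class DataSources:
    """Hypothetical container mixing named and anonymous data sources."""
    sources: List[DataSource] = field(default_factory=list)

    def named(self) -> Dict[str, DataSource]:
        # "Pets": addressed individually by their symbolic name.
        return {s.name: s for s in self.sources if s.name is not None}

    def anonymous(self) -> List[DataSource]:
        # "Cattle": no individual identity, only processed in bulk.
        return [s for s in self.sources if s.name is None]


sources = DataSources([
    DataSource("somewhere/foo-users.csv", name="users"),
    DataSource("somewhere/foo-access-log.csv"),
    DataSource("somewhere/bar-access-log.csv"),
])
users = sources.named()["users"]  # the one "pet"
logs = sources.anonymous()        # the two "cattle" access logs
```

The "grey area" is exactly the case shown here, where one run carries 
both kinds of sources.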


__Template And Document Processing Modes__

IMHO it is important to answer the following question: "How many 
outputs do you get when rendering 2 templates and 3 data sources? Two, 
three or six?"

Your answer is influenced by your mental model / experience

* When wading through tons of CSV files, access logs, etc. the answer is 
"2"
* When doing source code generation the obvious answer is "6"
* I can't imagine a use case which results in "3" but I'm pretty sure we 
will encounter one
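
To make the "2" and "6" answers concrete, here is a minimal Python 
sketch. It is not how `freemarker-cli` is implemented; the function 
names and the file names are assumptions chosen for illustration. The 
"3" case is left out because nothing above defines which template each 
data source would be paired with.

```python
from itertools import product


def aggregate_plan(templates, data_sources):
    """One output per template; each template sees every data source."""
    return [(template, list(data_sources)) for template in templates]


def pairwise_plan(templates, data_sources):
    """One output per (template, data source) pair."""
    return [(template, [source])
            for template, source in product(templates, data_sources)]


templates = ["report.ftl", "summary.ftl"]
data_sources = ["a.csv", "b.csv", "c.csv"]

aggregated = aggregate_plan(templates, data_sources)  # log-wading style
pairwise = pairwise_plan(templates, data_sources)     # code-generation style
print(len(aggregated), len(pairwise))
```

With 2 templates and 3 data sources the first plan has 2 entries and 
the second has 6.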

__Template and document mode probably shouldn't exist__

That's hard for me to fully understand - I definitely lack your insights 
& experience writing such tools :-)

Defining the `Output Generator` is the underlying model for the Maven 
plugin (and probably FMPP).

I'm not sure this applies to command-line tools, at least not in the way I 
use them (or would like to use them)


Thanks in advance,

Siegfried Goeschl

PS: Can/shall I merge the PR to bring in `freemarker-cli`?


On 28 Feb 2020, at 9:14, Daniel Dekany wrote:

> Yeah, "data source" is surely too popular a name, but for a reason. Does
> anyone have other ideas?
>
> As for naming data sources and such: one thing I was wondering about back
> then is how to deal with a list of documents given to a template, versus
> exactly 1 document given to a template. But I think we actually have no
> good use case for a list of documents that's passed at once to a single
> template run, so we can just ignore that complication. A document has a
> name, and that's always just a single document, not a collection, as far as
> the template is concerned. (We can have multiple documents per run, but
> those normally yield separate output generators, so it's still only one
> document per template.) However, we can have data source types (document
> types in the old terminology) that collect together multiple data files. So
> then that complexity is encapsulated in the data source type, and doesn't
> complicate the overall architecture. That's another case where a data source
> is not just a file. Like maybe there's a data source type that loads all
> the CSVs from a directory into a single big table (I had such a case), or
> even into a list of tables. Or, as I mentioned already, a data source is
> maybe an SQL query on a JDBC data source (and we got the first term
> clash... JDBC also calls them data sources).
>
> Template and document mode probably shouldn't exist from the user's
> perspective either, at least not as a global option that must apply to
> everything in a run. Users could just give the files that define the
> "output generators"; some of them will be templates, and some will be data
> files, in which case a template needs to be associated with them (and there
> can be a couple of ways of doing that). And then again, there are the cases
> where you want to create one output generator per entity from some data
> source.
>
> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
> siegfried.goeschl@gmail.com> wrote:
>
>> Hi Daniel,
>>
>> See my comments below - and thanks for your patience and input :-)
>>
>> *Renaming Document To DataSource*
>>
>> Yes, makes sense. I tried to avoid it since I'm using javax.activation and
>> its DataSource.
>>
>> *Template And Document Mode*
>>
>> Agreed - I think it is a valuable abstraction for the user but it is not
>> an implementation concept :-)
>>
>> *Document Without Symbolic Names*
>>
>> Also agreed, and it is going to change, but I have not made up my mind yet
>> about what exactly to implement.
>>
>> Thanks in advance,
>>
>> Siegfried Goeschl
>>
>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>>
>> A few quick thoughts on that:
>>
>> - We should replace the "document" term with something more descriptive. It
>> doesn't tell that it's some kind of input. Also, most of these inputs
>> aren't something that people typically call documents. Like a CSV file, or
>> a database table, which is not even a file (OK, we don't support such a
>> thing at the moment). I think maybe "data source" is a safe enough term.
>> (It also rhymes with data model.)
>> - You have separate "template" and "document" "modes" that apply to a
>> whole run. I think such specialization won't be helpful. We could just say,
>> on the conceptual level at least, that we need a set of "output
>> generators". An output generator is an object (in the API) that specifies a
>> template, a data-model (where the data-model is possibly populated with
>> "documents"), and an output "sink" (a file path, or stdout), and can
>> generate the output itself. A practical way of defining the output
>> generators in a CLI application is via a bunch of files, each defining an
>> output generator. Some of those files may be templates (which you can
>> even detect from the file extension), others data files that we currently
>> call "documents". They could freely mix inside the same run. I have also met
>> a use case where you have a single table (single "document"), and each
>> record in it yields an output file. That can also be described in some file
>> format, or really in any other way, like directly in command line
>> arguments, via API, etc.
>> - You have multiple documents without an associated symbolic name in some
>> examples. Templates can't identify those in a maintainable way then.
>> The actual file name is often not a good identifier: it can change over
>> time, and you might not even have good control over it, like you already
>> receive it as a parameter from somewhere else, or someone moves/renames
>> the files that you need to read. An index is also not very good, but I have
>> written about that earlier.
>>
>>
>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
>> siegfried.goeschl@gmail.com> wrote:
>>
>> Hi folks,
>>
>> I'm still wrapping my head around it, but I assembled some thoughts here -
>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>>
>> Thanks in advance,
>>
>> Siegfried Goeschl
>>
>>
>> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org> wrote:
>>
>> What you are describing is more like the angle that FMPP took initially,
>> where templates drive things; they generate the output for themselves (even
>> multiple output files if they wish). By default the output file name (and
>> relative path) is deduced from the template name. There was also a global
>> data-model, built in a configuration file (or equally, built via command
>> line arguments, or both mixed), from which templates get whatever data they
>> are interested in. Take a look at the figures here:
>> http://fmpp.sourceforge.net/qtour.html. Later, this concept was generalized
>> a bit more, because you could add XML files at the same place where you
>> have the templates, and then you could associate transform templates with
>> the XML files (based on path pattern and/or the XML document element). Now
>> that's like what freemarker-generator had initially (data files drive
>> output, and the template is there to transform it).
>>
>> So I think the generic mental model would be like this:
>>
>> 1. You've got files that drive the process, let's call them *generator
>> files* for now. Usually, each generator file yields an output file (but
>> maybe even multiple output files, as you might have seen in the last
>> figure). These generator files can be of many types, like XML, JSON, XLSX
>> (as in the original freemarker-generator), and even templates (as is the
>> norm in FMPP). If the file is not a template, then you've got a set of
>> transformer templates (-t CLI option) in a separate directory, which can be
>> associated with the generator files based on name patterns, and even based
>> on content (schema usually). If the generator file is a template (so that's
>> a positional @Parameter CLI argument that happens to be an *.ftl, and is
>> not a template file specified after the "-t" option), then you just
>> Template.process(...) it, and it prints what the output will be.
>> 2. You also have a set of variables, the global data-model, that
>> contains commonly useful stuff, like what you now call parameters (CLI
>> -Pname=value), but also maybe data loaded from JSON, XML, etc. Those data
>> files aren't "generator files". Templates just use them if they need them.
>> An important thing here is to reuse the same mechanism to read and parse
>> those data files that was used in templates when transforming generator
>> files. So we need a common format for specifying how to load data files.
>> That's maybe just FTL that #assigns to the variables, or maybe a more
>> declarative format.
>>
>> What I have described in the original post here was a less generic form of
>> this, as I tried to be true to the original approach. I thought the
>> proposal would be drastic enough as it is... :) There, the "main" document
>> is the "generator file" from point 1, the "-t" template is the transform
>> template for the "main" document, and the other named documents ("users",
>> "groups") are a poor man's shared data-model from point 2 (together with
>> -PName=value).
>>
>> There's a further somewhat confusing thing to get right with the
>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing though. In
>> the model above, as per point 1, if you list multiple data files, each will
>> generate a separate output file. So, if you need to take in a list of files
>> to transform it to a single output file (or at least with a single
>> transform template execution), then you have to be explicit about that, as
>> that's not the default behavior anymore. But it's still absolutely
>> possible. Imagine that a "list of XLSX-es" is itself like a file format.
>> You need some CLI (and Maven config, etc.) syntax to express that, but that
>> shouldn't be a big deal.
>>
>>
>>
>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
>> siegfried.goeschl@gmail.com> wrote:
>>
>> Hi Daniel,
>>
>> Good timing - I was looking at a similar problem from a different angle
>> yesterday (see below)
>>
>> Don't have enough time to answer your email in detail now - will do 
>> that
>> tomorrow evening
>>
>> Thanks in advance,
>>
>> Siegfried Goeschl
>>
>>
>> ===. START
>> # FreeMarker CLI Improvement
>> ## Support Of Multiple Template Files
>> Currently we support the following combinations
>>
>> * Single template and no data files
>> * Single template and one or more data files
>>
>> But we cannot support the following use case, which is quite typical in
>> the cloud
>>
>> __Convert multiple templates with a single data file, e.g copying a
>> directory of configuration files using a JSON configuration file__
>>
>> ## Implementation notes
>> * When we copy a directory we can remove the `ftl` extension on the fly
>> * We might need an `exclude` filter for the copy operation
>> * Initially resolve to a list of template files and process one after
>> another
>> * Need to calculate the output file location and extension
>> * We need to rename the existing command line parameters (see below)
>> * Do we need multiple include and exclude filter?
>> * Do we need file versus directory filters?
>>
>> ### Command Line Options
>> ```
>> --input-encoding : Encoding of the documents
>> --output-encoding : Encoding of the rendered template
>> --template-encoding : Encoding of the template
>> --output : Output file or directory
>> --include-document : Include pattern for documents
>> --exclude-document : Exclude pattern for documents
>> --include-template : Include pattern for templates
>> --exclude-template : Exclude pattern for templates
>> ```
>>
>> ### Command Line Examples
>> ```text
>> # Copy all FTL templates found in "ext/config" to the "/config" directory
>> # using the data from "config.json"
>>
>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
>>
>> # Basically the same using a named document "configuration"
>> # It might make sense to expose "conf" directly in the FreeMarker data model
>> # It might make sense to allow URIs for loading documents
>>
>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
>>
>> # Basically the same using an environment variable as named document
>>
>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
>> ```
>> === END
>>

Re: freemarker-generator: Improving the input documents concept

Posted by Daniel Dekany <dd...@apache.org>.

Yeah, "data source" is surely too popular a name, but for a reason. Does
anyone have other ideas?

As for naming data sources and such: one thing I was wondering about back
then is how to deal with a list of documents given to a template, versus
exactly 1 document given to a template. But I think we actually have no
good use case for a list of documents that's passed at once to a single
template run, so we can just ignore that complication. A document has a
name, and that's always just a single document, not a collection, as far as
the template is concerned. (We can have multiple documents per run, but
those normally yield separate output generators, so it's still only one
document per template.) However, we can have data source types (document
types in the old terminology) that collect together multiple data files. So
then that complexity is encapsulated in the data source type, and doesn't
complicate the overall architecture. That's another case where a data source
is not just a file. Like maybe there's a data source type that loads all
the CSVs from a directory into a single big table (I had such a case), or
even into a list of tables. Or, as I mentioned already, a data source is
maybe an SQL query on a JDBC data source (and we got the first term
clash... JDBC also calls them data sources).
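
A data source type that hides "many files" behind "one table" could look
roughly like the following. This is a minimal Python sketch of the idea
only, not freemarker-generator code; the function name and the `_source`
marker column are assumptions invented for the example.

```python
import csv
from pathlib import Path


def load_csv_directory(directory):
    """Load every *.csv file in `directory` into one list of row dicts.

    The template would then see a single table, no matter how many
    files were behind it.
    """
    rows = []
    # Sort so that the row order does not depend on directory listing order.
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["_source"] = path.name  # hypothetical marker column
                rows.append(row)
    return rows
```

The point of the design is that the caller never deals with a list of
files; only the data source type knows about it.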

Template and document mode probably shouldn't exist from the user's
perspective either, at least not as a global option that must apply to
everything in a run. Users could just give the files that define the
"output generators"; some of them will be templates, and some will be data
files, in which case a template needs to be associated with them (and there
can be a couple of ways of doing that). And then again, there are the cases
where you want to create one output generator per entity from some data
source.
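
As a rough illustration of the "output generator" idea (a template, a
data-model, and an output sink bundled into one object), here is a small
Python sketch. It is only a sketch under assumptions: Python's
`string.Template` stands in for a FreeMarker template, and the
`OutputGenerator` class, `generators_per_entity` function and
`sink_factory` parameter are hypothetical names, not an existing API.

```python
import io
from dataclasses import dataclass
from string import Template
from typing import Any, Mapping, TextIO


@dataclass
class OutputGenerator:
    template: Template             # stand-in for a FreeMarker template
    data_model: Mapping[str, Any]  # what the template can read
    sink: TextIO                   # where the output goes (file, stdout, ...)

    def generate(self) -> None:
        self.sink.write(self.template.substitute(self.data_model))


def generators_per_entity(template, rows, sink_factory):
    """One output generator per record of a table-like data source."""
    return [OutputGenerator(template, row, sink_factory(row)) for row in rows]


# Usage: two records yield two generators, each with its own sink.
sinks = []


def new_sink(row):
    sink = io.StringIO()
    sinks.append(sink)
    return sink


rows = [{"name": "ann"}, {"name": "bob"}]
for generator in generators_per_entity(Template("Hello $name"), rows, new_sink):
    generator.generate()
```

Whether a generator comes from a template file, a data file, or one record
of a table then only affects how the list of generators is built, not how
each one runs.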

On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
siegfried.goeschl@gmail.com> wrote:

> Hi Daniel,
>
> See my comments below - and thanks for your patience and input :-)
>
> *Renaming Document To DataSource*
>
> Yes, makes sense. I tried to avoid it since I'm using javax.activation and
> its DataSource.
>
> *Template And Document Mode*
>
> Agreed - I think it is a valuable abstraction for the user but it is not
> an implementation concept :-)
>
> *Document Without Symbolic Names*
>
> Also agreed, and it is going to change, but I have not made up my mind yet
> about what exactly to implement.
>
> Thanks in advance,
>
> Siegfried Goeschl
>

Re: freemarker-generator: Improving the input documents concept

Posted by Siegfried Goeschl <si...@gmail.com>.

Hi Daniel,

See my comments below - and thanks for your patience and input :-)

__Renaming Document To DataSource__

Yes, makes sense. I tried to avoid it since I'm using `javax.activation`
and its `DataSource`.

__Template And Document Mode__

Agreed - I think it is a valuable abstraction for the user but it is not 
an implementation concept :-)

__Document Without Symbolic Names__

Also agreed, and it is going to change, but I have not made up my mind yet
about what exactly to implement.


Thanks in advance,

Siegfried Goeschl


On 28 Feb 2020, at 1:05, Daniel Dekany wrote:

> A few quick thoughts on that:
>
>    - We should replace the "document" term with something more
>    descriptive. It doesn't tell that it's some kind of input. Also, most
>    of these inputs aren't something that people typically call documents.
>    Like a CSV file, or a database table, which is not even a file (OK, we
>    don't support such a thing at the moment). I think maybe "data source"
>    is a safe enough term. (It also rhymes with data model.)
>    - You have separate "template" and "document" "modes", which apply to a
>    whole run. I think such specialization won't be helpful. We could just
>    say, on the conceptual level at least, that we need a set of "output
>    generators". An output generator is an object (in the API) that
>    specifies a template, a data-model (where the data-model is possibly
>    populated with "documents"), and an output "sink" (a file path, or
>    stdout), and can generate the output itself. A practical way of
>    defining the output generators in a CLI application is via a bunch of
>    files, each defining an output generator. Some of those files may be
>    templates (which you can even detect from the file extension), others
>    data files that we currently call "documents". They could freely mix
>    inside the same run. I have also met a use case where you have a single
>    table (single "document"), and each record in it yields an output file.
>    That can also be described in some file format, or really in any other
>    way, like directly in a command line argument, via the API, etc.
>    - You have multiple documents without an associated symbolic name in
>    some examples. Templates can't identify those in a maintainable way
>    then. The actual file name is often not a good identifier: it can
>    change over time, and you might not even have good control over it,
>    like when you receive it as a parameter from somewhere else, or someone
>    moves/renames the files that you need to read. An index is also not
>    very good, but I have written about that earlier.
>
>
> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
> siegfried.goeschl@gmail.com> wrote:
>
>> Hi folks,
>>
>> still wrapping my head around it, but assembled some thoughts here -
>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>>
>> Thanks in advance,
>>
>> Siegfried Goeschl
>>
>>
>>
>>> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org> wrote:
>>>
>>> What you are describing is more like the angle that FMPP took initially,
>>> where templates drive things, they generate the output for themselves
>>> (even multiple output files if they wish). By default, the output file's
>>> name (and relative path) is deduced from the template name. There was
>>> also a global data-model, built in a configuration file (or equally,
>>> built via command line arguments, or both mixed), from which templates
>>> get whatever data they are interested in. Take a look at the figures
>>> here: http://fmpp.sourceforge.net/qtour.html. Later, this concept was
>>> generalized a bit more, because you could add XML files at the same
>>> place where you have the templates, and then you could associate
>>> transform templates to the XML files (based on path pattern and/or the
>>> XML document element). Now that's like what freemarker-generator had
>>> initially (data files drive output, and the template is there to
>>> transform it).
>>>
>>> So I think the generic mental model would be like this:
>>>
>>>   1. You have files that drive the process; let's call them *generator
>>>   files* for now. Usually, each generator file yields an output file (but
>>>   maybe even multiple output files, as you might have seen in the last
>>>   figure). These generator files can be of many types, like XML, JSON,
>>>   XLSX (as in the original freemarker-generator), and even templates (as
>>>   is the norm in FMPP). If the file is not a template, then you have a
>>>   set of transformer templates (-t CLI option) in a separate directory,
>>>   which can be associated with the generator files based on name
>>>   patterns, and even based on content (schema usually). If the generator
>>>   file is a template (so that's a positional @Parameter CLI argument that
>>>   happens to be an *.ftl, and is not a template file specified after the
>>>   "-t" option), then you just Template.process(...) it, and it prints
>>>   what the output will be.
>>>   2. You also have a set of variables, the global data-model, that
>>>   contains commonly useful stuff, like what you now call parameters (CLI
>>>   -Pname=value), but also maybe data loaded from JSON, XML, etc. Those
>>>   data files aren't "generator files". Templates just use them if they
>>>   need them. An important thing here is to reuse the same mechanism to
>>>   read and parse those data files that was used in templates when
>>>   transforming generator files. So we need a common format for specifying
>>>   how to load data files. That's maybe just FTL that #assigns to the
>>>   variables, or maybe a more declarative format.
>>>
>>> What I have described in the original post here was a less generic form
>>> of this, as I tried to stay true to the original approach. I thought the
>>> proposal would be drastic enough as it is... :) There, the "main"
>>> document is the "generator file" from point 1, the "-t" template is the
>>> transform template for the "main" document, and the other named
>>> documents ("users", "groups") are a poor man's shared data-model from
>>> point 2 (together with -PName=value).
>>>
>>> There's a further, somewhat confusing thing to get right with the
>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing though. In
>>> the model above, as per point 1, if you list multiple data files, each
>>> will generate a separate output file. So, if you need to take in a list
>>> of files to transform them to a single output file (or at least with a
>>> single transform template execution), then you have to be explicit about
>>> that, as that's not the default behavior anymore. But it's still
>>> absolutely possible. Imagine that a "list of XLSX-es" is itself like a
>>> file format. You need some CLI (and Maven config, etc.) syntax to express
>>> that, but that shouldn't be a big deal.
>>>
>>>
>>>
>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
>>> siegfried.goeschl@gmail.com> wrote:
>>>
>>>> Hi Daniel,
>>>>
>>>> Good timing - I was looking at a similar problem from a different angle
>>>> yesterday (see below)
>>>>
>>>> Don't have enough time to answer your email in detail now - will do that
>>>> tomorrow evening
>>>>
>>>> Thanks in advance,
>>>>
>>>> Siegfried Goeschl
>>>>
>>>>
>>>> === START
>>>> # FreeMarker CLI Improvement
>>>> ## Support Of Multiple Template Files
>>>> Currently we support the following combinations
>>>>
>>>> * Single template and no data files
>>>> * Single template and one or more data files
>>>>
>>>> But we can not support the following use case which is quite typical in
>>>> the cloud
>>>>
>>>> __Convert multiple templates with a single data file, e.g. copying a
>>>> directory of configuration files using a JSON configuration file__
>>>>
>>>> ## Implementation notes
>>>> * When we copy a directory we can remove the `ftl` extension on the fly
>>>> * We might need an `exclude` filter for the copy operation
>>>> * Initially resolve to a list of template files and process one after
>>>> another
>>>> * Need to calculate the output file location and extension
>>>> * We need to rename the existing command line parameters (see below)
>>>> * Do we need multiple include and exclude filters?
>>>> * Do we need file versus directory filters?
>>>>
>>>> ### Command Line Options
>>>> ```
>>>> --input-encoding : Encoding of the documents
>>>> --output-encoding : Encoding of the rendered template
>>>> --template-encoding : Encoding of the template
>>>> --output : Output file or directory
>>>> --include-document : Include pattern for documents
>>>> --exclude-document : Exclude pattern for documents
>>>> --include-template : Include pattern for templates
>>>> --exclude-template : Exclude pattern for templates
>>>> ```
>>>>
>>>> ### Command Line Examples
>>>> ```text
>>>> # Copy all FTL templates found in "ext/config" to the "/config" directory
>>>> # using the data from "config.json"
>>>> > freemarker-cli -t ./ext/config --include-template *.ftl --o /config config.json
>>>> > freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
>>>>
>>>> # Basically the same using a named document "configuration"
>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
>>>> # It might make sense to allow URIs for loading documents
>>>> > freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
>>>> > freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
>>>> > freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
>>>>
>>>> # Basically the same using an environment variable as named document
>>>> > freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
>>>> > freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
>>>> ```
>>>> === END
>>>>
>>>>
>>>>> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org> wrote:
>>>>>
>>>>> Input documents are a fundamental concept in freemarker-generator, so
>>>>> we should think about that more, and probably refine/rework how it's
>>>>> done.
>>>>>
>>>>> Currently it works like this, with CLI at least.
>>>>>
>>>>>   freemarker-cli
>>>>>       -t access-report.ftl
>>>>>       somewhere/foo-access-log.csv
>>>>>
>>>>> Then in access-report.ftl you have to do something like this:
>>>>>
>>>>>   <#assign doc = Documents.get(0)>
>>>>>   ... process doc here
>>>>>
>>>>> (The more idiomatic Documents[0] won't work. Actually, that led to a
>>>>> funny chain of coincidences: It returned the string "D", then
>>>>> CSVTool.parse(...) happily parsed that to a table with the single
>>>>> column "D", and 0 rows, and as there were 0 rows, the template didn't
>>>>> run into an error for row.myExpectedColumn referring to a missing
>>>>> column either, so the process finished with success. (: Pretty unlucky
>>>>> for sure. The root was unintentionally breaking a FreeMarker idiom
>>>>> though; eventually we will have to work on those too, but, different
>>>>> topic.)
>>>>>
>>>>> However, actually multiple input documents can be passed in:
>>>>>
>>>>>   freemarker-cli
>>>>>       -t access-report.ftl
>>>>>       somewhere/foo-access-log.csv
>>>>>       somewhere/bar-access-log.csv
>>>>>
>>>>> The above template will still work, though then you ignore all but the
>>>>> first document. So if you expect any number of input documents, you
>>>>> probably will have to do this:
>>>>>
>>>>>   <#list Documents.list as doc>
>>>>>         ... process doc here
>>>>>   </#list>
>>>>>
>>>>> (The more idiomatic <#list Documents as doc> won't work; but again,
>>>>> those we will work out in a different thread.)
>>>>>
>>>>>
>>>>> So, what would be better, in my opinion? I start out from what I think
>>>>> are the common use cases, in decreasing order of frequency. The goal is
>>>>> to make those less error prone for the users, and simpler to express.
>>>>>
>>>>> USE CASE 1
>>>>>
>>>>> You have exactly 1 input document, which is therefore simply "the"
>>>>> document in the mind of the user. This is probably the typical use
>>>>> case, or at least the use case users typically start out from when
>>>>> starting the work.
>>>>>
>>>>>   freemarker-cli
>>>>>       -t access-report.ftl
>>>>>       somewhere/foo-access-log.csv
>>>>>
>>>>> Then `Documents.get(0)` is not very fitting. Most importantly, it's
>>>>> error prone, because if the user passed in more than 1 document (which
>>>>> can even happen totally accidentally, like if the user was lazy and
>>>>> used a wildcard that the shell expanded), the template will silently
>>>>> ignore the rest of the documents, and the single document processed
>>>>> will be practically picked randomly. The user might not notice that,
>>>>> and submit a bad report or such.
>>>>>
>>>>> I think that in this use case the document should be simply referred to
>>>>> as `Document` in the template. When you have multiple documents there,
>>>>> referring to `Document` should be an error, saying that the template
>>>>> was made to process a single document only.
>>>>>
>>>>>
>>>>> USE CASE 2
>>>>>
>>>>> You have multiple input documents, but each has a different role
>>>>> (different schema, maybe different file type). Like, you pass in
>>>>> users.csv and groups.csv. Each has a different schema, and so you want
>>>>> to access them differently, but in the same template.
>>>>>
>>>>>   freemarker-cli
>>>>>       [...]
>>>>>       --named-document users somewhere/foo-users.csv
>>>>>       --named-document groups somewhere/foo-groups.csv
>>>>>
>>>>> Then in the template you could refer to them as `NamedDocuments.users`
>>>>> and `NamedDocuments.groups`.
>>>>>
>>>>> Use cases 1 and 2 can be unified into a coherent concept, where
>>>>> `Document` is just a shorthand for `NamedDocuments.main`. It's called
>>>>> "main" because that's "the" document the template is about, but then
>>>>> you have to add some helper documents, with symbolic names representing
>>>>> their role.
>>>>>
>>>>>   freemarker-cli
>>>>>       -t access-report.ftl
>>>>>       --document-name=main somewhere/foo-access-log.csv
>>>>>       --document-name=users somewhere/foo-users.csv
>>>>>       --document-name=groups somewhere/foo-groups.csv
>>>>>
>>>>> Here, `Document` still works in the template, and it refers to
>>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
>>>>> above would be cleaner, I couldn't figure out how to do that with
>>>>> Picocli. Anyway, for now the point is the concept, which is not
>>>>> specific to CLI.)
>>>>>
>>>>>
>>>>> USE CASE 3
>>>>>
>>>>> Here you have several of the same kind of documents. That has a more
>>>>> generic sub-use-case, when you have explicitly named documents (like
>>>>> "users" above), and for some you expect multiple input files.
>>>>>
>>>>>   freemarker-cli
>>>>>       -t access-report.ftl
>>>>>       --document-name=main somewhere/foo-access-log.csv somewhere/bar-access-log.csv
>>>>>       --document-name=users somewhere/foo-users.csv somewhere/bar-users.csv
>>>>>       --document-name=groups somewhere/global-groups.csv
>>>>>
>>>>> The template must be written with this use case in mind, as now it has
>>>>> to #list some of the documents. (I think in practice you hardly ever
>>>>> want to get a document by hard coded index. Either you don't know how
>>>>> many documents you have, so you can't use hard coded indexes, or you
>>>>> do, and each index has a specific meaning, but then you should name the
>>>>> documents instead, as using indexes is error prone, and hard to read.)
>>>>> Accessing that list of documents in the template maybe could be done
>>>>> like this:
>>>>> - For the "main" documents: `DocumentList`
>>>>> - For explicitly named documents, like "users":
>>>>> `NamedDocumentLists.users`
>>>>>
>>>>>
>>>>> SUMMING UP
>>>>>
>>>>> To unify all 3 use cases into a coherent concept:
>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you can
>>>>> achieve everything with it, using it requires your template to handle the
>>>>> most generic case too. So, I think it would be rarely used.
>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's
>>>>> used if you only have one kind of documents (single format and schema), but
>>>>> potentially multiple of them.
>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1 document of
>>>>> the given name.
>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is for the
>>>>> most natural/frequent use case.
>>>>>
>>>>> That's 4 possible ways of accessing your documents, which is a trade-off
>>>>> for the sake of these:
>>>>> - Catching CLI (or Maven, etc.) input where the template output likely will
>>>>> be wrong. That's only possible if the user can communicate its intent in
>>>>> the template.
>>>>> - Users don't need to deal with concepts that are irrelevant in their
>>>>> concrete use case. Just start with the trivial, `Document`, and later if
>>>>> the need arises, generalize to named documents, document lists, or both.
>>>>>
>>>>>
>>>>> What do guys think?
>>>>
>>>>
>>
>>

Re: freemarker-generator: Improving the input documents concept

Posted by Daniel Dekany <dd...@apache.org>.

A few quick thoughts on that:

   - We should replace the term "document" with something more descriptive. It
   doesn't convey that it's some kind of input. Also, most of these inputs
   aren't something that people typically call documents, like a CSV file, or
   a database table, which is not even a file (OK, we don't support such a
   thing at the moment). I think "data source" may be a safe enough term. (It
   also rhymes with data model.)
   - You have separate "template" and "document" modes that apply to a
   whole run. I think such specialization won't be helpful. We could just say,
   on the conceptual level at least, that we need a set of "output
   generators". An output generator is an object (in the API) that specifies a
   template, a data-model (where the data-model is possibly populated with
   "documents"), and an output "sink" (a file path, or stdout), and can
   generate the output itself. A practical way of defining the output
   generators in a CLI application is via a bunch of files, each defining an
   output generator. Some of those files may be templates (which you can
   even detect from the file extension), others data files that we currently
   call "documents". They could mix freely inside the same run. I have also met
   a use case where you have a single table (a single "document"), and each
   record in it yields an output file. That can also be described in some file
   format, or really in any other way, like directly in command line arguments,
   via API, etc.
   - You have multiple documents without an associated symbolic name in some
   examples. Templates can't identify those in a maintainable way.
   The actual file name is often not a good identifier: it can change over
   time, and you might not even have good control over it, like when you
   receive it as a parameter from somewhere else, or someone moves/renames
   the files that you need to read. An index is also not very good, but I have
   written about that earlier.
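
To make the "output generator" idea from the second point a bit more concrete, here is a minimal Java sketch. It is only an illustration under assumptions: the names `OutputGenerator`, `TemplateRenderer` and `Sink` are hypothetical and don't exist in freemarker-generator, and `TemplateRenderer` merely stands in for a FreeMarker template so that the example runs without any dependency.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; none of these types exist in freemarker-generator. */
final class OutputGeneratorSketch {

    /** Stand-in for a template; a real one would delegate to FreeMarker. */
    interface TemplateRenderer {
        void render(Map<String, Object> dataModel, Writer out) throws IOException;
    }

    /** Stand-in for the output "sink" (a file path, or stdout). */
    interface Sink {
        Writer open() throws IOException;
    }

    /** One output generator: template + data-model + sink, able to generate its own output. */
    static final class OutputGenerator {
        private final TemplateRenderer template;
        private final Map<String, Object> dataModel;
        private final Sink sink;

        OutputGenerator(TemplateRenderer template, Map<String, Object> dataModel, Sink sink) {
            this.template = template;
            this.dataModel = dataModel;
            this.sink = sink;
        }

        void generate() throws IOException {
            // The writer is closed once the template has been rendered.
            try (Writer out = sink.open()) {
                template.render(dataModel, out);
            }
        }
    }

    /** Builds one generator whose data-model holds a data source under a symbolic name. */
    static String demo() {
        Map<String, Object> dataModel = new LinkedHashMap<>();
        dataModel.put("users", "somewhere/foo-users.csv");

        StringWriter captured = new StringWriter(); // in-memory sink for the demo
        OutputGenerator generator = new OutputGenerator(
                (model, out) -> out.write("users come from " + model.get("users")),
                dataModel,
                () -> captured);
        try {
            generator.generate();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return captured.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A run would then simply be a list of such generators executed one after another, regardless of whether each was defined by a template file, a data file, or a command line argument.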


On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
siegfried.goeschl@gmail.com> wrote:

> Hi folks,
>
> still wrapping my side around but assembled some thoughts here -
> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>
> Thanks in advance,
>
> Siegfried Goeschl
>
>
>
> > On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org> wrote:
> >
> > What you are describing is more like the angle that FMPP took initially,
> > where templates drive things, they generate the output for themselves
> (even
> > multiple output files if they wish). By default output files name (and
> > relative path) is deduced from template name. There was also a global
> > data-model, built in a configuration file (or equally, built via command
> > line arguments, or both mixed), from which templates get whatever data
> they
> > are interested in. Take a look at the figures here:
> > http://fmpp.sourceforge.net/qtour.html. Later, this concept was
> generalized
> > a bit more, because you could add XML files at the same place where you
> > have the templates, and then you could associate transform templates to
> the
> > XML files (based on path pattern and/or the XML document element). Now
> > that's like what freemarker-generator had initially (data files drive
> > output, and the template is there to transform it).
> >
> > So I think the generic mental model would like this:
> >
> >   1. You got files that drive the process, let's call them *generator
> >   files* for now. Usually, each generator file yields an output file (but
> >   maybe even multiple output files, as you might saw in the last figure).
> >   These generator files can be of many types, like XML, JSON, XLSX (as
> in the
> >   original freemarker-generator), and even templates (as is the norm in
> >   FMPP). If the file is not a template, then you got a set of transformer
> >   templates (-t CLI option) in a separate directory, which can be
> associated
> >   with the generator files base on name patterns, and even based on
> content
> >   (schema usually). If the generator file is a template (so that's a
> >   positional @Parameter CLI argument that happens to be an *.ftl, and is
> not
> >   a template file specified after the "-t" option), then you just
> >   Template.process(...) it, and it prints what the output will be.
> >   2. You also have a set of variables, the global data-model, that
> >   contains commonly useful stuff, like what you now call parameters (CLI
> >   -Pname=value), but also maybe data loaded from JSON, XML, etc.. Those
> data
> >   files aren't "generator files". Templates just use them if they need
> them.
> >   An important thing here is to reuse the same mechanism to read and
> parse
> >   those data files, which was used in templates when transforming
> generator
> >   files. So we need a common format for specifying how to load data
> files.
> >   That's maybe just FTL that #assigns to the variables, or maybe more
> >   declarative format.
> >
> > What I have described in the original post here was a less generic form
> of
> > this, as I tried to be true with the original approach. I though the
> > proposal will be drastic enough as it is... :) There, the "main" document
> > is the "generator file" from point 1, the "-t" template is the transform
> > template for the "main" document, and the other named documents ("users",
> > "groups") is a poor man's shared data-model from point 2 (together with
> > with -PName=value).
> >
> > There's further somewhat confusing thing to get right with the
> > list-of-documents (`DocuentList`, `NamedDocumentLists`) thing though. In
> > the model above, as per point 1, if you list multiple data files, each
> will
> > generate a separate output file. So, if you need take in a list of files
> to
> > transform it to a single output file (or at least with a single transform
> > template execution), then you have to be explicit about that, as that's
> not
> > the default behavior anymore. But it's still absolutely possible. Imagine
> > it as a "list of XLSX-es" is itself like a file format. You need some CLI
> > (and Maven config, etc.) syntax to express that, but that shouldn't be a
> > big deal.
> >
> >
> >
> > On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
> > siegfried.goeschl@gmail.com> wrote:
> >
> >> Hi Daniel,
> >>
> >> Good timing - I was looking at a similar problem from different angle
> >> yesterday (see below)
> >>
> >> Don't have enough time to answer your email in detail now - will do that
> >> tomorrow evening
> >>
> >> Thanks in advance,
> >>
> >> Siegfried Goeschl
> >>
> >>
> >> ===. START
> >> # FreeMarker CLI Improvement
> >> ## Support Of Multiple Template Files
> >> Currently we support the following combinations
> >>
> >> * Single template and no data files
> >> * Single template and one or more data files
> >>
> >> But we can not support the following use case which is quite typical in
> >> the cloud
> >>
> >> __Convert multiple templates with a single data file, e.g copying a
> >> directory of configuration files using a JSON configuration file__
> >>
> >> ## Implementation notes
> >> * When we copy a directory we can remove the `ftl`extension on the fly
> >> * We might need an `exclude` filter for the copy operation
> >> * Initially resolve to a list of template files and process one after
> >> another
> >> * Need to calculate the output file location and extension
> >> * We need to rename the existing command line parameters  (see below)
> >> * Do we need multiple include and exclude filter?
> >> * Do we need file versus directory filters?
> >>
> >> ### Command Line Options
> >> ```
> >> --input-encoding : Encoding of the documents
> >> --output-encoding : Encoding of the rendered template
> >> --template-encoding : Encoding of the template
> >> --output : Output file or directory
> >> --include-document : Include pattern for documents
> >> --exclude-document : Exclude pattern for documents
> >> --include-template: Include pattern for templates
> >> --exclude-template : Exclude pattern for templates
> >> ```
> >>
> >> ### Command Line Examples
> >> ```text
> >> # Copy all FTL templates found in "ext/config" to the "/config"
> directory
> >> using the data from "config.json"
> >>> freemarker-cli -t ./ext/config --include-template *.ftl --o /config
> >> config.json
> >>> freemarker-cli --template ./ext/config --include-template *.ftl
> --output
> >> /config config.json
> >>
> >> # Bascically the same using a named document "configuration"
> >> # It might make sense to expose "conf" directly in the FreeMarker data
> >> model
> >> # It might make sens to allow URIs for loading documents
> >>> freemarker-cli -t ./ext/config/*.ftl -o /config -d
> >> configuration=config.json
> >>> freemarker-cli --template ./ext/config --include-template *.ftl
> --output
> >> /config --document configuration=config.json
> >>> freemarker-cli --template ./ext/config --include-template *.ftl
> --output
> >> /config --document configuration=file:///config.json
> >>
> >> # Bascically the same using an environment variable as named document
> >>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d
> >> configuration=env:///CONFIGURATION
> >>> freemarker-cli --template ./ext/config --include-template *.ftl
> --output
> >> /config --document configuration=env:///CONFIGURATION
> >> ```
> >> === END
> >>
> >>
> >>> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org> wrote:
> >>>
> >>> Input documents is a fundamental concept in freemarker-generator, so we
> >>> should think about that more, and probably refine/rework how it's done.
> >>>
> >>> Currently it works like this, with CLI at least.
> >>>
> >>>   freemarker-cli
> >>>       -t access-report.ftl
> >>>       somewhere/foo-access-log.csv
> >>>
> >>> Then in access-report.ftl you have to do something like this:
> >>>
> >>>   <#assign doc = Documents.get(0)>
> >>>   ... process doc here
> >>>
> >>> (The more idiomatic Documents[0] won't work. Actually, that lead to a
> >> funny
> >>> chain of coincidences: It returned the string "D", then
> >> CSVTool.parse(...)
> >>> happily parsed that to a table with the single column "D", and 0 rows,
> >> and
> >>> as there were 0 rows, the template didn't run into an error because
> >>> row.myExpectedColumn refers to a missing column either, so the process
> >>> finished with success. (: Pretty unlucky for sure. The root was
> >>> unintentionally breaking a FreeMarker idiom though; eventually we will
> >> have
> >>> to work on those too, but, different topic.)
> >>>
> >>> However, actually multiple input documents can be passed in:
> >>>
> >>>   freemarker-cli
> >>>       -t access-report.ftl
> >>>       somewhere/foo-access-log.csv
> >>>       somewhere/bar-access-log.csv
> >>>
> >>> Above template will still work, though then you ignored all but the
> first
> >>> document. So if you expect any number of input documents, you probably
> >> will
> >>> have to do this:
> >>>
> >>>   <#list Documents.list as doc>
> >>>         ... process doc here
> >>>   </#list>
> >>>
> >>> (The more idiomatic <#list Documents as doc> won't work; but again,
> those
> >>> we will work out in a different thread.)
> >>>
> >>>
> >>> So, what would be better, in my opinion. I start out from what I think
> >> are
> >>> the common uses cases, in decreasing order of frequency. Goal is to
> make
> >>> those less error prone for the users, and simpler to express.
> >>>
> >>> USE CASE 1
> >>>
> >>> You have exactly 1 input documents, which is therefore simply "the"
> >>> document in the mind of the user. This is probably the typical use
> case,
> >>> but at least the use case users typically start out from when starting
> >> the
> >>> work.
> >>>
> >>>   freemarker-cli
> >>>       -t access-report.ftl
> >>>       somewhere/foo-access-log.csv
> >>>
> >>> Then `Documents.get(0)` is not very fitting. Most importantly it's
> error
> >>> prone, because if the user passed in more than 1 documents (can even
> >> happen
> >>> totally accidentally, like if the user was lazy and used a wildcard
> that
> >>> the shell exploded), the template will silently ignore the rest of the
> >>> documents, and the singe document processed will be practically picked
> >>> randomly. The user might won't notice that and submits a bad report or
> >> such.
> >>>
> >>> I think that in this use case the document should be simply referred as
> >>> `Document` in the template. When you have multiple documents there,
> >>> referring to `Document` should be an error, saying that the template
> was
> >>> made to process a single document only.
> >>>
> >>>
> >>> USE CASE 2
> >>>
> >>> You have multiple input documents, but each has different role
> (different
> >>> schema, maybe different file type). Like, you pass in users.csv and
> >>> groups.csv. Each has difference schema, and so you want to access them
> >>> differently, but in the same template.
> >>>
> >>>   freemarker-cli
> >>>       [...]
> >>>       --named-document users somewhere/foo-users.csv
> >>>       --named-document groups somewhere/foo-groups.csv
> >>>
> >>> Then in the template you could refer to them as:
> `NamedDocuments.users`,
> >>> and `NamedDocuments.groups`.
> >>>
> >>> Use Case 1, and 2 can be unified into a coherent concept, where
> >> `Document`
> >>> is just a shorthand for `NamedDocuments.main`. It's called "main"
> because
> >>> that's "the" document the template is about, but then you have to added
> >>> some helper documents, with symbolic names representing their role.
> >>>
> >>>   freemarker-cli
> >>>       -t access-report.ftl
> >>>       --document-name=main somewhere/foo-access-log.csv
> >>>       --document-name=users somewhere/foo-users.csv
> >>>       --document-name=groups somewhere/foo-groups.csv
> >>>
> >>> Here, `Document` still works in the template, and it refers to
> >>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
> >> above
> >>> would be cleaner, I couldn't figure out how to do that with Picocli.
> >>> Anyway, for now the point is the concept, which is not specific to
> CLI.)
> >>>
> >>>
> >>> USE CASE 3
> >>>
> >>> Here you have several of the same kind of documents. That has a more
> >>> generic sub-use-case, when you have explicitly named documents (like
> >>> "users" above), and for some you expect multiple input files.
> >>>
> >>>   freemarker-cli
> >>>       -t access-report.ftl
> >>>       --document-name=main somewhere/foo-access-log.csv somewhere/bar-access-log.csv
> >>>       --document-name=users somewhere/foo-users.csv somewhere/bar-users.csv
> >>>       --document-name=groups somewhere/global-groups.csv
> >>>
> >>> The template must to be written with this use case in mind, as now it
> has
> >>> #list some of the documents. (I think in practice you hardly ever want
> to
> >>> get a document by hard coded index. Either you don't know how many
> >>> documents you have, so you can't use hard coded indexes, or you do, and
> >>> each index has a specific meaning, but then you should name the
> documents
> >>> instead, as using indexes is error prone, and hard to read.)
> >>> Accessing that list of documents in the template, maybe could be done
> >> like
> >>> this:
> >>> - For the "main" documents: `DocumentList`
> >>> - For explicitly named documents, like "users":
> >> `NamedDocumentLists.users`
> >>>
> >>>
> >>> SUMMING UP
> >>>
> >>> To unify all 3 use cases into a coherent concept:
> >>> - `NamedDocumentLists.<name>` is the most generic form, and while you
> can
> >>> achieve everything with it, using it requires your template to handle
> the
> >>> most generic case too. So, I think it would be rarely used.
> >>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
> It's
> >>> used if you only have one kind of documents (single format and schema),
> >> but
> >>> potentially multiple of them.
> >>> - `NamedDocuments.<name>` expresses that you expect exactly 1 document
> of
> >>> the given name.
> >>> - `Document` is just a shorthand for `NamedDocuments.main`. This is for
> >> the
> >>> most natural/frequent use case.
> >>>
> >>> That's 4 possible ways of accessing your documents, which is a
> trade-off
> >>> for the sake of these:
> >>> - Catching CLI (or Maven, etc.) input where the template output likely
> >> will
> >>> be wrong. That's only possible if the user can communicate its intent
> in
> >>> the template.
> >>> - Users don't need to deal with concepts that are irrelevant in their
> >>> concrete use case. Just start with the trivial, `Document`, and later
> if
> >>> the need arises, generalize to named documents, document lists, or
> both.
> >>>
> >>>
> >>> What do guys think?
> >>
> >>
>
>

Re: freemarker-generator: Improving the input documents concept

Posted by Siegfried Goeschl <si...@gmail.com>.

Hi folks,

still wrapping my head around this, but I assembled some thoughts here - https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449

Thanks in advance, 

Siegfried Goeschl



> On 23 Feb 2020, at 23:14, Daniel Dekany <dd...@apache.org> wrote:
> 
> What you are describing is more like the angle that FMPP took initially,
> where templates drive things, they generate the output for themselves (even
> multiple output files if they wish). By default output files name (and
> relative path) is deduced from template name. There was also a global
> data-model, built in a configuration file (or equally, built via command
> line arguments, or both mixed), from which templates get whatever data they
> are interested in. Take a look at the figures here:
> http://fmpp.sourceforge.net/qtour.html. Later, this concept was generalized
> a bit more, because you could add XML files at the same place where you
> have the templates, and then you could associate transform templates to the
> XML files (based on path pattern and/or the XML document element). Now
> that's like what freemarker-generator had initially (data files drive
> output, and the template is there to transform it).
> 
> So I think the generic mental model would like this:
> 
>   1. You got files that drive the process, let's call them *generator
>   files* for now. Usually, each generator file yields an output file (but
>   maybe even multiple output files, as you might saw in the last figure).
>   These generator files can be of many types, like XML, JSON, XLSX (as in the
>   original freemarker-generator), and even templates (as is the norm in
>   FMPP). If the file is not a template, then you got a set of transformer
>   templates (-t CLI option) in a separate directory, which can be associated
>   with the generator files base on name patterns, and even based on content
>   (schema usually). If the generator file is a template (so that's a
>   positional @Parameter CLI argument that happens to be an *.ftl, and is not
>   a template file specified after the "-t" option), then you just
>   Template.process(...) it, and it prints what the output will be.
>   2. You also have a set of variables, the global data-model, that
>   contains commonly useful stuff, like what you now call parameters (CLI
>   -Pname=value), but also maybe data loaded from JSON, XML, etc.. Those data
>   files aren't "generator files". Templates just use them if they need them.
>   An important thing here is to reuse the same mechanism to read and parse
>   those data files, which was used in templates when transforming generator
>   files. So we need a common format for specifying how to load data files.
>   That's maybe just FTL that #assigns to the variables, or maybe more
>   declarative format.
> 
> What I have described in the original post here was a less generic form of
> this, as I tried to be true with the original approach. I though the
> proposal will be drastic enough as it is... :) There, the "main" document
> is the "generator file" from point 1, the "-t" template is the transform
> template for the "main" document, and the other named documents ("users",
> "groups") is a poor man's shared data-model from point 2 (together with
> with -PName=value).
> 
> There's further somewhat confusing thing to get right with the
> list-of-documents (`DocuentList`, `NamedDocumentLists`) thing though. In
> the model above, as per point 1, if you list multiple data files, each will
> generate a separate output file. So, if you need take in a list of files to
> transform it to a single output file (or at least with a single transform
> template execution), then you have to be explicit about that, as that's not
> the default behavior anymore. But it's still absolutely possible. Imagine
> it as a "list of XLSX-es" is itself like a file format. You need some CLI
> (and Maven config, etc.) syntax to express that, but that shouldn't be a
> big deal.
> 
> 
> 
> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
> siegfried.goeschl@gmail.com> wrote:
> 
>> Hi Daniel,
>> 
>> Good timing - I was looking at a similar problem from different angle
>> yesterday (see below)
>> 
>> Don't have enough time to answer your email in detail now - will do that
>> tomorrow evening
>> 
>> Thanks in advance,
>> 
>> Siegfried Goeschl
>> 
>> 
>> ===. START
>> # FreeMarker CLI Improvement
>> ## Support Of Multiple Template Files
>> Currently we support the following combinations
>> 
>> * Single template and no data files
>> * Single template and one or more data files
>> 
>> But we can not support the following use case which is quite typical in
>> the cloud
>> 
>> __Convert multiple templates with a single data file, e.g copying a
>> directory of configuration files using a JSON configuration file__
>> 
>> ## Implementation notes
>> * When we copy a directory we can remove the `ftl`extension on the fly
>> * We might need an `exclude` filter for the copy operation
>> * Initially resolve to a list of template files and process one after
>> another
>> * Need to calculate the output file location and extension
>> * We need to rename the existing command line parameters  (see below)
>> * Do we need multiple include and exclude filter?
>> * Do we need file versus directory filters?
>> 
>> ### Command Line Options
>> ```
>> --input-encoding : Encoding of the documents
>> --output-encoding : Encoding of the rendered template
>> --template-encoding : Encoding of the template
>> --output : Output file or directory
>> --include-document : Include pattern for documents
>> --exclude-document : Exclude pattern for documents
>> --include-template: Include pattern for templates
>> --exclude-template : Exclude pattern for templates
>> ```
>> 
>> ### Command Line Examples
>> ```text
>> # Copy all FTL templates found in "ext/config" to the "/config" directory
>> using the data from "config.json"
>>> freemarker-cli -t ./ext/config --include-template *.ftl --o /config
>> config.json
>>> freemarker-cli --template ./ext/config --include-template *.ftl --output
>> /config config.json
>> 
>> # Bascically the same using a named document "configuration"
>> # It might make sense to expose "conf" directly in the FreeMarker data
>> model
>> # It might make sens to allow URIs for loading documents
>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d
>> configuration=config.json
>>> freemarker-cli --template ./ext/config --include-template *.ftl --output
>> /config --document configuration=config.json
>>> freemarker-cli --template ./ext/config --include-template *.ftl --output
>> /config --document configuration=file:///config.json
>> 
>> # Bascically the same using an environment variable as named document
>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d
>> configuration=env:///CONFIGURATION
>>> freemarker-cli --template ./ext/config --include-template *.ftl --output
>> /config --document configuration=env:///CONFIGURATION
>> ```
>> === END
>> 
>> 
>>> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org> wrote:
>>> 
>>> Input documents is a fundamental concept in freemarker-generator, so we
>>> should think about that more, and probably refine/rework how it's done.
>>> 
>>> Currently it works like this, with CLI at least.
>>> 
>>>   freemarker-cli
>>>       -t access-report.ftl
>>>       somewhere/foo-access-log.csv
>>> 
>>> Then in access-report.ftl you have to do something like this:
>>> 
>>>   <#assign doc = Documents.get(0)>
>>>   ... process doc here
>>> 
>>> (The more idiomatic Documents[0] won't work. Actually, that lead to a
>> funny
>>> chain of coincidences: It returned the string "D", then
>> CSVTool.parse(...)
>>> happily parsed that to a table with the single column "D", and 0 rows,
>> and
>>> as there were 0 rows, the template didn't run into an error because
>>> row.myExpectedColumn refers to a missing column either, so the process
>>> finished with success. (: Pretty unlucky for sure. The root was
>>> unintentionally breaking a FreeMarker idiom though; eventually we will
>> have
>>> to work on those too, but, different topic.)
>>> 
>>> However, actually multiple input documents can be passed in:
>>> 
>>>   freemarker-cli
>>>       -t access-report.ftl
>>>       somewhere/foo-access-log.csv
>>>       somewhere/bar-access-log.csv
>>> 
>>> Above template will still work, though then you ignored all but the first
>>> document. So if you expect any number of input documents, you probably
>> will
>>> have to do this:
>>> 
>>>   <#list Documents.list as doc>
>>>         ... process doc here
>>>   </#list>
>>> 
>>> (The more idiomatic <#list Documents as doc> won't work; but again, those
>>> we will work out in a different thread.)
>>> 
>>> 
>>> So, what would be better, in my opinion. I start out from what I think
>> are
>>> the common uses cases, in decreasing order of frequency. Goal is to make
>>> those less error prone for the users, and simpler to express.
>>> 
>>> USE CASE 1
>>> 
>>> You have exactly 1 input documents, which is therefore simply "the"
>>> document in the mind of the user. This is probably the typical use case,
>>> but at least the use case users typically start out from when starting
>> the
>>> work.
>>> 
>>>   freemarker-cli
>>>       -t access-report.ftl
>>>       somewhere/foo-access-log.csv
>>> 
>>> Then `Documents.get(0)` is not very fitting. Most importantly it's error
>>> prone, because if the user passed in more than 1 documents (can even
>> happen
>>> totally accidentally, like if the user was lazy and used a wildcard that
>>> the shell exploded), the template will silently ignore the rest of the
>>> documents, and the singe document processed will be practically picked
>>> randomly. The user might won't notice that and submits a bad report or
>> such.
>>> 
>>> I think that in this use case the document should be simply referred as
>>> `Document` in the template. When you have multiple documents there,
>>> referring to `Document` should be an error, saying that the template was
>>> made to process a single document only.
>>> 
>>> 
>>> USE CASE 2
>>> 
>>> You have multiple input documents, but each has different role (different
>>> schema, maybe different file type). Like, you pass in users.csv and
>>> groups.csv. Each has difference schema, and so you want to access them
>>> differently, but in the same template.
>>> 
>>>   freemarker-cli
>>>       [...]
>>>       --named-document users somewhere/foo-users.csv
>>>       --named-document groups somewhere/foo-groups.csv
>>> 
>>> Then in the template you could refer to them as: `NamedDocuments.users`,
>>> and `NamedDocuments.groups`.
>>> 
>>> Use Case 1, and 2 can be unified into a coherent concept, where
>> `Document`
>>> is just a shorthand for `NamedDocuments.main`. It's called "main" because
>>> that's "the" document the template is about, but then you have to added
>>> some helper documents, with symbolic names representing their role.
>>> 
>>>   freemarker-cli
>>>       -t access-report.ftl
>>>       --document-name=main somewhere/foo-access-log.csv
>>>       --document-name=users somewhere/foo-users.csv
>>>       --document-name=groups somewhere/foo-groups.csv
>>> 
>>> Here, `Document` still works in the template, and it refers to
>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
>> above
>>> would be cleaner, I couldn't figure out how to do that with Picocli.
>>> Anyway, for now the point is the concept, which is not specific to CLI.)
>>> 
>>> 
>>> USE CASE 3
>>> 
>>> Here you have several of the same kind of documents. That has a more
>>> generic sub-use-case, when you have explicitly named documents (like
>>> "users" above), and for some you expect multiple input files.
>>> 
>>>   freemarker-cli
>>>       -t access-report.ftl
>>>       --document-name=main somewhere/foo-access-log.csv somewhere/bar-access-log.csv
>>>       --document-name=users somewhere/foo-users.csv somewhere/bar-users.csv
>>>       --document-name=groups somewhere/global-groups.csv
>>> 
>>> The template must to be written with this use case in mind, as now it has
>>> #list some of the documents. (I think in practice you hardly ever want to
>>> get a document by hard coded index. Either you don't know how many
>>> documents you have, so you can't use hard coded indexes, or you do, and
>>> each index has a specific meaning, but then you should name the documents
>>> instead, as using indexes is error prone, and hard to read.)
>>> Accessing that list of documents in the template, maybe could be done
>> like
>>> this:
>>> - For the "main" documents: `DocumentList`
>>> - For explicitly named documents, like "users":
>> `NamedDocumentLists.users`
>>> 
>>> 
>>> SUMMING UP
>>> 
>>> To unify all 3 use cases into a coherent concept:
>>> - `NamedDocumentLists.<name>` is the most generic form, and while you can
>>> achieve everything with it, using it requires your template to handle the
>>> most generic case too. So, I think it would be rarely used.
>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's
>>> used if you only have one kind of documents (single format and schema),
>> but
>>> potentially multiple of them.
>>> - `NamedDocuments.<name>` expresses that you expect exactly 1 document of
>>> the given name.
>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is for
>> the
>>> most natural/frequent use case.
>>> 
>>> That's 4 possible ways of accessing your documents, which is a trade-off
>>> for the sake of these:
>>> - Catching CLI (or Maven, etc.) input where the template output likely
>> will
>>> be wrong. That's only possible if the user can communicate its intent in
>>> the template.
>>> - Users don't need to deal with concepts that are irrelevant in their
>>> concrete use case. Just start with the trivial, `Document`, and later if
>>> the need arises, generalize to named documents, document lists, or both.
>>> 
>>> 
>>> What do guys think?
>> 
>>

Re: freemarker-generator: Improving the input documents concept

Posted by Daniel Dekany <dd...@apache.org>.

What you are describing is more like the angle that FMPP took initially,
where templates drive things: they generate the output themselves (even
multiple output files if they wish). By default, the output file's name (and
relative path) is deduced from the template name. There was also a global
data-model, built in a configuration file (or equally, built via command
line arguments, or both mixed), from which templates get whatever data they
are interested in. Take a look at the figures here:
http://fmpp.sourceforge.net/qtour.html. Later, this concept was generalized
a bit more, because you could add XML files at the same place where you
have the templates, and then you could associate transform templates with
the XML files (based on path pattern and/or the XML document element). Now
that's like what freemarker-generator had initially (data files drive
output, and the template is there to transform them).

So I think the generic mental model would be like this:

   1. You have files that drive the process, let's call them *generator
   files* for now. Usually, each generator file yields an output file (but
   maybe even multiple output files, as you might have seen in the last
   figure). These generator files can be of many types, like XML, JSON, XLSX
   (as in the original freemarker-generator), and even templates (as is the
   norm in FMPP). If the file is not a template, then you have a set of
   transformer templates (-t CLI option) in a separate directory, which can
   be associated with the generator files based on name patterns, and even
   based on content (schema usually). If the generator file is a template (so
   that's a positional @Parameter CLI argument that happens to be an *.ftl,
   and is not a template file specified after the "-t" option), then you just
   Template.process(...) it, and it prints what the output will be.
   2. You also have a set of variables, the global data-model, that
   contains commonly useful stuff, like what you now call parameters (CLI
   -Pname=value), but also maybe data loaded from JSON, XML, etc. Those data
   files aren't "generator files". Templates just use them if they need them.
   An important thing here is to reuse the same mechanism to read and parse
   those data files that is used in templates when transforming generator
   files. So we need a common format for specifying how to load data files.
   That's maybe just FTL that #assigns to the variables, or maybe a more
   declarative format.
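
To make point 1 a bit more concrete, here is a minimal sketch of how a
generator file could be routed to a template. It only covers the name-pattern
case (not the content/schema case), and everything in it is an assumption for
illustration: `GeneratorFileRouting`, `associate` and `templateFor` are
hypothetical names, not part of freemarker-generator or FMPP.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: decides which template renders a given generator file. */
final class GeneratorFileRouting {

    /** File-name glob -> transformer template name; insertion order is the priority. */
    private final Map<String, String> transformersByPattern = new LinkedHashMap<>();

    void associate(String fileNameGlob, String templateName) {
        transformersByPattern.put(fileNameGlob, templateName);
    }

    /** Returns the template to execute for the file, or empty if nothing is associated. */
    Optional<String> templateFor(Path generatorFile) {
        Path fileName = generatorFile.getFileName();
        if (fileName.toString().endsWith(".ftl")) {
            // The generator file is itself a template: it is processed directly.
            return Optional.of(generatorFile.toString());
        }
        for (Map.Entry<String, String> entry : transformersByPattern.entrySet()) {
            PathMatcher matcher =
                    FileSystems.getDefault().getPathMatcher("glob:" + entry.getKey());
            if (matcher.matches(fileName)) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        GeneratorFileRouting routing = new GeneratorFileRouting();
        routing.associate("*-access-log.csv", "access-report.ftl");
        System.out.println(routing.templateFor(Paths.get("somewhere", "foo-access-log.csv")));
        System.out.println(routing.templateFor(Paths.get("index.html.ftl")));
    }
}
```

The first matching pattern wins here, which is only one possible policy; a
real implementation would also have to decide what an unmatched data file
means (error, copy as-is, or ignore).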

What I have described in the original post here was a less generic form of
this, as I tried to stay true to the original approach. I thought the
proposal would be drastic enough as it is... :) There, the "main" document
is the "generator file" from point 1, the "-t" template is the transform
template for the "main" document, and the other named documents ("users",
"groups") are a poor man's shared data-model from point 2 (together with
-PName=value).

There's a further, somewhat confusing thing to get right with the
list-of-documents (`DocumentList`, `NamedDocumentLists`) thing though. In
the model above, as per point 1, if you list multiple data files, each will
generate a separate output file. So, if you need to take in a list of files
and transform it into a single output file (or at least with a single
transform template execution), then you have to be explicit about that, as
that's not the default behavior anymore. But it's still absolutely possible.
Imagine that a "list of XLSX-es" is itself like a file format. You need some
CLI (and Maven config, etc.) syntax to express that, but that shouldn't be a
big deal.
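
The difference between the two behaviors can be sketched as planning template
executions. This is only an illustration under assumptions: `ExecutionPlanning`
and the `treatAsSingleList` flag are hypothetical, standing in for whatever
CLI or Maven syntax would end up expressing "these files are one list".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: how many template executions a set of generator files causes. */
final class ExecutionPlanning {

    /**
     * Returns one inner list per template execution; each inner list holds the
     * generator files visible to that execution.
     */
    static List<List<String>> plan(List<String> generatorFiles, boolean treatAsSingleList) {
        List<List<String>> executions = new ArrayList<>();
        if (treatAsSingleList) {
            // Explicit opt-in: the whole list is one input, so there is one execution.
            executions.add(new ArrayList<>(generatorFiles));
        } else {
            // Default as per point 1: each generator file drives its own execution.
            for (String file : generatorFiles) {
                executions.add(Collections.singletonList(file));
            }
        }
        return executions;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("foo-access-log.csv", "bar-access-log.csv");
        System.out.println(plan(files, false));
        System.out.println(plan(files, true));
    }
}
```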



On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
siegfried.goeschl@gmail.com> wrote:

> Hi Daniel,
>
> Good timing - I was looking at a similar problem from different angle
> yesterday (see below)
>
> Don't have enough time to answer your email in detail now - will do that
> tomorrow evening
>
> Thanks in advance,
>
> Siegfried Goeschl
>
>
> ===. START
> # FreeMarker CLI Improvement
> ## Support Of Multiple Template Files
> Currently we support the following combinations
>
> * Single template and no data files
> * Single template and one or more data files
>
> But we can not support the following use case which is quite typical in
> the cloud
>
> __Convert multiple templates with a single data file, e.g copying a
> directory of configuration files using a JSON configuration file__
>
> ## Implementation notes
> * When we copy a directory we can remove the `ftl`extension on the fly
> * We might need an `exclude` filter for the copy operation
> * Initially resolve to a list of template files and process one after
> another
> * Need to calculate the output file location and extension
> * We need to rename the existing command line parameters  (see below)
> * Do we need multiple include and exclude filter?
> * Do we need file versus directory filters?
>
> ### Command Line Options
> ```
> --input-encoding : Encoding of the documents
> --output-encoding : Encoding of the rendered template
> --template-encoding : Encoding of the template
> --output : Output file or directory
> --include-document : Include pattern for documents
> --exclude-document : Exclude pattern for documents
> --include-template: Include pattern for templates
> --exclude-template : Exclude pattern for templates
> ```
>
> ### Command Line Examples
> ```text
> # Copy all FTL templates found in "ext/config" to the "/config" directory
> using the data from "config.json"
> > freemarker-cli -t ./ext/config --include-template *.ftl --o /config
> config.json
> > freemarker-cli --template ./ext/config --include-template *.ftl --output
> /config config.json
>
> # Bascically the same using a named document "configuration"
> # It might make sense to expose "conf" directly in the FreeMarker data
> model
> # It might make sens to allow URIs for loading documents
> > freemarker-cli -t ./ext/config/*.ftl -o /config -d
> configuration=config.json
> > freemarker-cli --template ./ext/config --include-template *.ftl --output
> /config --document configuration=config.json
> > freemarker-cli --template ./ext/config --include-template *.ftl --output
> /config --document configuration=file:///config.json
>
> # Bascically the same using an environment variable as named document
> > freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d
> configuration=env:///CONFIGURATION
> > freemarker-cli --template ./ext/config --include-template *.ftl --output
> /config --document configuration=env:///CONFIGURATION
> ```
> === END

Re: freemarker-generator: Improving the input documents concept

Posted by Siegfried Goeschl <si...@gmail.com>.

Hi Daniel,

Good timing - I was looking at a similar problem from a different angle yesterday (see below)

Don't have enough time to answer your email in detail now - will do that tomorrow evening

Thanks in advance, 

Siegfried Goeschl


=== START
# FreeMarker CLI Improvement
## Support Of Multiple Template Files
Currently we support the following combinations

* Single template and no data files
* Single template and one or more data files

But we cannot support the following use case, which is quite typical in the cloud

__Convert multiple templates with a single data file, e.g. copying a directory of configuration files using a JSON configuration file__

## Implementation notes
* When we copy a directory we can remove the `ftl` extension on the fly
* We might need an `exclude` filter for the copy operation
* Initially resolve to a list of template files and process one after another
* Need to calculate the output file location and extension
* We need to rename the existing command line parameters (see below)
* Do we need multiple include and exclude filters?
* Do we need file versus directory filters?
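
A minimal sketch of the first notes above (selecting templates with an include/exclude pattern and calculating the output location while dropping the `ftl` extension). The class and method names are hypothetical, and matching the patterns against the file name only is an assumption; the notes leave open whether directory filters are needed as well.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch: template selection and output path calculation. */
final class TemplateOutputMapping {

    private static boolean matches(String glob, Path fileName) {
        return FileSystems.getDefault().getPathMatcher("glob:" + glob).matches(fileName);
    }

    /** A template is selected if it matches the include glob and not the exclude glob. */
    static boolean isSelected(Path relativeTemplate, String includeGlob, String excludeGlob) {
        Path fileName = relativeTemplate.getFileName();
        boolean included = includeGlob == null || matches(includeGlob, fileName);
        boolean excluded = excludeGlob != null && matches(excludeGlob, fileName);
        return included && !excluded;
    }

    /** Keeps the template's relative directory and strips a trailing ".ftl". */
    static Path outputFor(Path outputDir, Path relativeTemplate) {
        String name = relativeTemplate.getFileName().toString();
        if (name.endsWith(".ftl")) {
            name = name.substring(0, name.length() - ".ftl".length());
        }
        Path parent = relativeTemplate.getParent();
        Path relativeOutput = parent == null ? Paths.get(name) : parent.resolve(name);
        return outputDir.resolve(relativeOutput);
    }

    public static void main(String[] args) {
        Path template = Paths.get("nginx", "nginx.conf.ftl");
        if (isSelected(template, "*.ftl", null)) {
            System.out.println(outputFor(Paths.get("/config"), template));
        }
    }
}
```

Resolving the list of selected templates first and then processing them one after another keeps the output calculation independent of the rendering step.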

### Command Line Options
```
--input-encoding : Encoding of the documents
--output-encoding : Encoding of the rendered template
--template-encoding : Encoding of the template
--output : Output file or directory
--include-document : Include pattern for documents
--exclude-document : Exclude pattern for documents
--include-template : Include pattern for templates
--exclude-template : Exclude pattern for templates
```

### Command Line Examples
```text
# Copy all FTL templates found in "ext/config" to the "/config" directory using the data from "config.json"
> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json

# Basically the same using a named document "configuration"
# It might make sense to expose "conf" directly in the FreeMarker data model
# It might make sense to allow URIs for loading documents
> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json

# Basically the same using an environment variable as named document
> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
```
=== END
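
The `name=location` values in the examples above could be parsed roughly as follows. This is a sketch under assumptions, not an existing freemarker-cli API: `NamedDocumentSpec` is a hypothetical name, a location without a scheme is treated as a plain file path, and locations that are not valid URIs (e.g. paths containing spaces) are not handled.

```java
import java.net.URI;

/** Hypothetical sketch: parses a "-d name=location" command line value. */
final class NamedDocumentSpec {

    final String name;   // null when no "name=" prefix was given
    final URI location;

    private NamedDocumentSpec(String name, URI location) {
        this.name = name;
        this.location = location;
    }

    static NamedDocumentSpec parse(String spec) {
        int eq = spec.indexOf('=');
        String name = eq < 0 ? null : spec.substring(0, eq);
        String location = eq < 0 ? spec : spec.substring(eq + 1);
        return new NamedDocumentSpec(name, URI.create(location));
    }

    /** Falls back to "file" so that plain paths like config.json keep working. */
    String scheme() {
        return location.getScheme() == null ? "file" : location.getScheme();
    }

    /** For env:///NAME, the variable name is the URI path without its leading slash. */
    String envVariableName() {
        if (!"env".equals(scheme())) {
            throw new IllegalStateException("Not an env: location: " + location);
        }
        return location.getPath().substring(1);
    }

    public static void main(String[] args) {
        NamedDocumentSpec spec = parse("configuration=env:///CONFIGURATION");
        System.out.println(spec.name + " <- $" + spec.envVariableName());
    }
}
```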


> On 23.02.2020, at 16:37, Daniel Dekany <dd...@apache.org> wrote:
> 
> Input documents is a fundamental concept in freemarker-generator, so we
> should think about that more, and probably refine/rework how it's done.
> 
> Currently it works like this, with CLI at least.
> 
>    freemarker-cli
>        -t access-report.ftl
>        somewhere/foo-access-log.csv
> 
> Then in access-report.ftl you have to do something like this:
> 
>    <#assign doc = Documents.get(0)>
>    ... process doc here
> 
> (The more idiomatic Documents[0] won't work. Actually, that lead to a funny
> chain of coincidences: It returned the string "D", then CSVTool.parse(...)
> happily parsed that to a table with the single column "D", and 0 rows, and
> as there were 0 rows, the template didn't run into an error because
> row.myExpectedColumn refers to a missing column either, so the process
> finished with success. (: Pretty unlucky for sure. The root was
> unintentionally breaking a FreeMarker idiom though; eventually we will have
> to work on those too, but, different topic.)
> 
> However, actually multiple input documents can be passed in:
> 
>    freemarker-cli
>        -t access-report.ftl
>        somewhere/foo-access-log.csv
>        somewhere/bar-access-log.csv
> 
> Above template will still work, though then you ignored all but the first
> document. So if you expect any number of input documents, you probably will
> have to do this:
> 
>    <#list Documents.list as doc>
>          ... process doc here
>    </#list>
> 
> (The more idiomatic <#list Documents as doc> won't work; but again, those
> we will work out in a different thread.)
> 
> 
> So, what would be better, in my opinion. I start out from what I think are
> the common uses cases, in decreasing order of frequency. Goal is to make
> those less error prone for the users, and simpler to express.
> 
> USE CASE 1
> 
> You have exactly 1 input documents, which is therefore simply "the"
> document in the mind of the user. This is probably the typical use case,
> but at least the use case users typically start out from when starting the
> work.
> 
>    freemarker-cli
>        -t access-report.ftl
>        somewhere/foo-access-log.csv
> 
> Then `Documents.get(0)` is not very fitting. Most importantly it's error
> prone, because if the user passed in more than 1 documents (can even happen
> totally accidentally, like if the user was lazy and used a wildcard that
> the shell exploded), the template will silently ignore the rest of the
> documents, and the singe document processed will be practically picked
> randomly. The user might won't notice that and submits a bad report or such.
> 
> I think that in this use case the document should be simply referred as
> `Document` in the template. When you have multiple documents there,
> referring to `Document` should be an error, saying that the template was
> made to process a single document only.
> 
> 
> USE CASE 2
> 
> You have multiple input documents, but each has different role (different
> schema, maybe different file type). Like, you pass in users.csv and
> groups.csv. Each has difference schema, and so you want to access them
> differently, but in the same template.
> 
>    freemarker-cli
>        [...]
>        --named-document users somewhere/foo-users.csv
>        --named-document groups somewhere/foo-groups.csv
> 
> Then in the template you could refer to them as: `NamedDocuments.users`,
> and `NamedDocuments.groups`.
> 
> Use Case 1, and 2 can be unified into a coherent concept, where `Document`
> is just a shorthand for `NamedDocuments.main`. It's called "main" because
> that's "the" document the template is about, but then you have to added
> some helper documents, with symbolic names representing their role.
> 
>    freemarker-cli
>        -t access-report.ftl
>        --document-name=main somewhere/foo-access-log.csv
>        --document-name=users somewhere/foo-users.csv
>        --document-name=groups somewhere/foo-groups.csv
> 
> Here, `Document` still works in the template, and it refers to
> `somewhere/foo-access-log.csv`. (While omitting --document-name=main above
> would be cleaner, I couldn't figure out how to do that with Picocli.
> Anyway, for now the point is the concept, which is not specific to CLI.)
> 
> 
> USE CASE 3
> 
> Here you have several of the same kind of documents. That has a more
> generic sub-use-case, when you have explicitly named documents (like
> "users" above), and for some you expect multiple input files.
> 
>    freemarker-cli
>        -t access-report.ftl
>        --document-name=main somewhere/foo-access-log.csv
> somewhere/bar-access-log.csv
>        --document-name=users somewhere/foo-users.csv
> somewhere/bar-users.csv
>        --document-name=groups somewhere/global-groups.csv
> 
> The template must to be written with this use case in mind, as now it has
> #list some of the documents. (I think in practice you hardly ever want to
> get a document by hard coded index. Either you don't know how many
> documents you have, so you can't use hard coded indexes, or you do, and
> each index has a specific meaning, but then you should name the documents
> instead, as using indexes is error prone, and hard to read.)
> Accessing that list of documents in the template, maybe could be done like
> this:
> - For the "main" documents: `DocumentList`
> - For explicitly named documents, like "users": `NamedDocumentLists.users`
> 
> 
> SUMMING UP
> 
> To unify all 3 use cases into a coherent concept:
> - `NamedDocumentLists.<name>` is the most generic form, and while you can
> achieve everything with it, using it requires your template to handle the
> most generic case too. So, I think it would be rarely used.
> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's
> used if you only have one kind of documents (single format and schema), but
> potentially multiple of them.
> - `NamedDocuments.<name>` expresses that you expect exactly 1 document of
> the given name.
> - `Document` is just a shorthand for `NamedDocuments.main`. This is for the
> most natural/frequent use case.
> 
> That's 4 possible ways of accessing your documents, which is a trade-off
> for the sake of these:
> - Catching CLI (or Maven, etc.) input where the template output likely will
> be wrong. That's only possible if the user can communicate its intent in
> the template.
> - Users don't need to deal with concepts that are irrelevant in their
> concrete use case. Just start with the trivial, `Document`, and later if
> the need arises, generalize to named documents, document lists, or both.
> 
> 
> What do guys think?

Re: freemarker-generator: Improving the input documents concept

Posted by Daniel Dekany <dd...@apache.org>.

Correction... this is not what I meant:

    freemarker-cli
        [...]
        --named-document users somewhere/foo-users.csv
        --named-document groups somewhere/foo-groups.csv

It should have been this:

    freemarker-cli
        [...]
        --document-name=users somewhere/foo-users.csv
        --document-name=groups somewhere/foo-groups.csv


On Sun, Feb 23, 2020 at 4:37 PM Daniel Dekany <dd...@apache.org> wrote:

> Input documents is a fundamental concept in freemarker-generator, so we
> should think about that more, and probably refine/rework how it's done.
>
> Currently it works like this, with CLI at least.
>
>     freemarker-cli
>         -t access-report.ftl
>         somewhere/foo-access-log.csv
>
> Then in access-report.ftl you have to do something like this:
>
>     <#assign doc = Documents.get(0)>
>     ... process doc here
>
> (The more idiomatic Documents[0] won't work. Actually, that lead to a
> funny chain of coincidences: It returned the string "D", then
> CSVTool.parse(...) happily parsed that to a table with the single column
> "D", and 0 rows, and as there were 0 rows, the template didn't run into an
> error because row.myExpectedColumn refers to a missing column either, so
> the process finished with success. (: Pretty unlucky for sure. The root was
> unintentionally breaking a FreeMarker idiom though; eventually we will have
> to work on those too, but, different topic.)
>
> However, actually multiple input documents can be passed in:
>
>     freemarker-cli
>         -t access-report.ftl
>         somewhere/foo-access-log.csv
>         somewhere/bar-access-log.csv
>
> Above template will still work, though then you ignored all but the first
> document. So if you expect any number of input documents, you probably will
> have to do this:
>
>     <#list Documents.list as doc>
>           ... process doc here
>     </#list>
>
> (The more idiomatic <#list Documents as doc> won't work; but again, those
> we will work out in a different thread.)
>
>
> So, what would be better, in my opinion. I start out from what I think are
> the common uses cases, in decreasing order of frequency. Goal is to make
> those less error prone for the users, and simpler to express.
>
> USE CASE 1
>
> You have exactly 1 input documents, which is therefore simply "the"
> document in the mind of the user. This is probably the typical use case,
> but at least the use case users typically start out from when starting the
> work.
>
>     freemarker-cli
>         -t access-report.ftl
>         somewhere/foo-access-log.csv
>
> Then `Documents.get(0)` is not very fitting. Most importantly it's error
> prone, because if the user passed in more than 1 documents (can even happen
> totally accidentally, like if the user was lazy and used a wildcard that
> the shell exploded), the template will silently ignore the rest of the
> documents, and the singe document processed will be practically picked
> randomly. The user might won't notice that and submits a bad report or such.
>
> I think that in this use case the document should be simply referred as
> `Document` in the template. When you have multiple documents there,
> referring to `Document` should be an error, saying that the template was
> made to process a single document only.
>
>
> USE CASE 2
>
> You have multiple input documents, but each has different role (different
> schema, maybe different file type). Like, you pass in users.csv and
> groups.csv. Each has difference schema, and so you want to access them
> differently, but in the same template.
>
>     freemarker-cli
>         [...]
>         --named-document users somewhere/foo-users.csv
>         --named-document groups somewhere/foo-groups.csv
>
> Then in the template you could refer to them as: `NamedDocuments.users`,
> and `NamedDocuments.groups`.
>
> Use Case 1, and 2 can be unified into a coherent concept, where `Document`
> is just a shorthand for `NamedDocuments.main`. It's called "main" because
> that's "the" document the template is about, but then you have to added
> some helper documents, with symbolic names representing their role.
>
>     freemarker-cli
>         -t access-report.ftl
>         --document-name=main somewhere/foo-access-log.csv
>         --document-name=users somewhere/foo-users.csv
>         --document-name=groups somewhere/foo-groups.csv
>
> Here, `Document` still works in the template, and it refers to
> `somewhere/foo-access-log.csv`. (While omitting --document-name=main above
> would be cleaner, I couldn't figure out how to do that with Picocli.
> Anyway, for now the point is the concept, which is not specific to CLI.)
>
>
> USE CASE 3
>
> Here you have several of the same kind of documents. That has a more
> generic sub-use-case, when you have explicitly named documents (like
> "users" above), and for some you expect multiple input files.
>
>     freemarker-cli
>         -t access-report.ftl
>         --document-name=main somewhere/foo-access-log.csv
> somewhere/bar-access-log.csv
>         --document-name=users somewhere/foo-users.csv
> somewhere/bar-users.csv
>         --document-name=groups somewhere/global-groups.csv
>
> The template must to be written with this use case in mind, as now it has
> #list some of the documents. (I think in practice you hardly ever want to
> get a document by hard coded index. Either you don't know how many
> documents you have, so you can't use hard coded indexes, or you do, and
> each index has a specific meaning, but then you should name the documents
> instead, as using indexes is error prone, and hard to read.)
> Accessing that list of documents in the template, maybe could be done like
> this:
> - For the "main" documents: `DocumentList`
> - For explicitly named documents, like "users": `NamedDocumentLists.users`
>
>
> SUMMING UP
>
> To unify all 3 use cases into a coherent concept:
> - `NamedDocumentLists.<name>` is the most generic form, and while you can
> achieve everything with it, using it requires your template to handle the
> most generic case too. So, I think it would be rarely used.
> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`. It's
> used if you only have one kind of documents (single format and schema), but
> potentially multiple of them.
> - `NamedDocuments.<name>` expresses that you expect exactly 1 document of
> the given name.
> - `Document` is just a shorthand for `NamedDocuments.main`. This is for
> the most natural/frequent use case.
>
> That's 4 possible ways of accessing your documents, which is a trade-off
> for the sake of these:
> - Catching CLI (or Maven, etc.) input where the template output likely
> will be wrong. That's only possible if the user can communicate its intent
> in the template.
> - Users don't need to deal with concepts that are irrelevant in their
> concrete use case. Just start with the trivial, `Document`, and later if
> the need arises, generalize to named documents, document lists, or both.
>
>
> What do guys think?
>
>