Posted to dev@sling.apache.org by Simone Tripodi <si...@apache.org> on 2018/11/12 19:46:08 UTC
[DISCUSS] Feature Files I/O long-term proposal: improving the
processing pipeline
Hi all mates,
over the last couple of months we have done a huge amount of work on
Feature file processing, and the iterations to refine the pipeline
introduced some "overhead" operations we can improve. What we
currently do is:
* the pre-processor starts by reading the whole file into memory,
storing it in a String reference;
* the JSON file is parsed into a javax.json DOM to check whether the
`id` property is missing, adding it if necessary, and is then
serialized to a string again;
* JSON Schema validation takes the string as input and creates a
Jackson DOM to validate it against the defined schema;
* if schema validation passes, the Substitution takes the JSON string
as input to interpolate variables, which creates a new JSON string
representation;
* the JS Min takes the JSON string representation and converts it to
a new JSON string representation where useless content is removed;
* at that point, the JSON Feature reader takes the final string and
creates a javax.json DOM once again to map it to a Feature instance.
My proposal is to improve our pipeline a little in order to speed up
the JSON processing, as follows:
* the JS Min starts by reading the whole file into memory, storing it
in a String reference;
* the Substitution takes the JSON string as input to interpolate
variables, which creates a new JSON string representation;
* a Jackson DOM is created in order to check whether the `id`
property is missing, adding it if necessary;
* the Jackson DOM is validated against the defined schema;
* the Jackson DOM is mapped to a Feature instance.
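For illustration only, the proposed ordering might be sketched like
this (the method names are hypothetical, and plain string checks stand
in for the real Jackson DOM and schema-validation steps):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the proposed ordering only; method names are hypothetical
// and naive string checks stand in for the Jackson DOM / schema steps.
public class PipelineSketch {

    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Substitution step: interpolate ${name} placeholders in the string.
    static String substitute(String json, Map<String, String> vars) {
        Matcher m = VAR.matcher(json);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // unknown placeholders are left as-is
            String value = vars.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // `id` step (stand-in): with a real Jackson ObjectNode this would be
    // node.has("id") / node.put("id", ...); a naive check suffices here.
    static String ensureId(String json, String calculatedId) {
        if (json.contains("\"id\"")) {
            return json;
        }
        return json.replaceFirst("\\{", "{\"id\":\"" + calculatedId + "\",");
    }

    public static void main(String[] args) {
        String raw = "{\"framework\":\"${fw.version}\"}";
        String substituted = substitute(raw, Map.of("fw.version", "6.0.1"));
        System.out.println(ensureId(substituted, "g:a:1.0"));
    }
}
```

The point of the sketch is only the ordering: substitution happens on
the raw string once, and everything after that works on a single
parsed tree instead of re-serializing between steps.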
WDYT?
Many thanks in advance!
~Simo
http://people.apache.org/~simonetripodi/
http://twitter.com/simonetripodi
Re: [DISCUSS] Feature Files I/O long-term proposal: improving the
processing pipeline
Posted by Carsten Ziegeler <cz...@apache.org>.
We have to distinguish between the JSON representation of a feature as
defined in our "specification" and the additional support of our maven
plugin.
A feature must have an ID and we don't allow interpolation of
placeholders within the feature. Therefore the JSON code we have in
the feature-io module to read a feature works on this basis, and we
should not change that.
However, for maven based projects, the maven plugin allows leaving out
the id; it then gets calculated based on the project coordinates and
the file name. The maven plugin also allows interpolating placeholders
in the feature file.
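For illustration, that id calculation could look roughly like this --
the exact id format and type string used here are assumptions, not
taken from the plugin's sources:

```java
// Hypothetical sketch of deriving a feature id from the Maven project
// coordinates plus the feature file name. The "slingosgifeature" type
// and the colon-separated layout are assumptions for illustration.
public class FeatureIdSketch {

    static String calculateId(String groupId, String artifactId,
                              String version, String fileName) {
        // the file name (minus its ".json" extension) becomes the classifier
        String classifier = fileName.replaceFirst("\\.json$", "");
        return groupId + ":" + artifactId + ":slingosgifeature:"
                + classifier + ":" + version;
    }

    public static void main(String[] args) {
        System.out.println(calculateId("org.apache.sling", "my-app",
                "1.0.0", "feature-base.json"));
    }
}
```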
Now in a maven project, the above two things need to happen before
validation, otherwise the validation might fail.
If we can make these steps more efficient, great - but we must not break
the separation of the functionality.
In addition, do we have any numbers showing how "slow" this currently
is and how much we could improve it? Feature files are rather small
and all processing happens in memory. Moreover, we're talking about a
build-time tool here, so if it spends a few extra milliseconds this
doesn't really matter.
Again, if we can improve, let's do it - but I think it's not our most
urgent problem to fix wrt the feature model.
Regards
Carsten
On 13.11.2018 02:53, Justin Edelson wrote:
--
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org
Re: [DISCUSS] Feature Files I/O long-term proposal: improving the
processing pipeline
Posted by Justin Edelson <ju...@justinedelson.com>.
Hi Simo,
Take this with several grains of salt as I don't know the internals of the
feature processing, but just looking at your email from a generic "how do I
process a JSON file" it still seems inefficient.
Ideally, IMO, the substitution would be done as a filter applied to the
stream of parser events. That way the entire String is not held in memory
-- only the parsed DOM. I suspect it is also "safer" in the sense that you
can more tightly control the context in which interpolation occurs (for
example, interpolation should be allowed in string values, but not keys);
the flip side is that it is also more restrictive, i.e. supporting
interpolation of non-string values would be non-trivial (then again,
doing this would make the original document invalid JSON, so I'm not
sure this is a real use case). I would suggest taking a look at
Jackson's JsonParserDelegate.
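A rough sketch of that idea, assuming a ${name} placeholder syntax and
a simple lookup map (both assumptions, not the project's actual
substitution rules):

```java
import java.io.IOException;
import java.util.Map;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.core.util.JsonParserDelegate;

// Sketch of the parser-event filter idea: placeholders are replaced
// only when the current token is a string value, so object keys pass
// through untouched. The ${name} syntax and lookup map are assumptions.
public class InterpolatingParser extends JsonParserDelegate {

    private final Map<String, String> vars;

    public InterpolatingParser(JsonParser delegate, Map<String, String> vars) {
        super(delegate);
        this.vars = vars;
    }

    @Override
    public String getText() throws IOException {
        String text = super.getText();
        // only interpolate string values, never field names
        if (currentToken() != JsonToken.VALUE_STRING || text == null) {
            return text;
        }
        for (Map.Entry<String, String> e : vars.entrySet()) {
            text = text.replace("${" + e.getKey() + "}", e.getValue());
        }
        return text;
    }

    public static void main(String[] args) throws IOException {
        JsonParser p = new InterpolatingParser(
                new JsonFactory().createParser("{\"version\":\"${fw}\"}"),
                Map.of("fw", "6.0"));
        while (p.nextToken() != null) {
            if (p.currentToken() == JsonToken.VALUE_STRING) {
                System.out.println(p.getText());
            }
        }
    }
}
```

Any consumer reading through this parser would see interpolated
values; note that some databind code paths call getValueAsString()
rather than getText(), so that method might need the same override.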
Regards,
Justin