Posted to dev@sling.apache.org by Simone Tripodi <si...@apache.org> on 2018/11/12 19:46:08 UTC

[DISCUSS] Feature Files I/O long-term proposal: improving the processing pipeline

Hi all,

over the last couple of months we have done a huge amount of work on
Feature file processing, and the iterations to refine the pipeline
introduced some "overhead" operations we can now improve. What we
currently do is:

 * the pre-processor starts by reading the whole file into memory,
storing it in a String reference;
 * the JSON file is parsed into a javax.json DOM to check whether the
`id` property is missing, adding it if necessary, and the DOM is then
serialized back to a string;
 * JSON Schema validation takes that string as input and creates a
Jackson DOM to validate it against the defined schema;
 * if schema validation passes, the Substitution step takes the JSON
string as input to interpolate variables, which creates a new JSON
string representation;
 * the JS Min step takes that JSON string representation and converts
it to a new JSON string representation with useless content removed;
 * at that point, the JSON Feature reader takes the final string and
creates a javax.json DOM once again in order to map it to a Feature
instance.
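To make the round-trips concrete, here is a minimal sketch of the current
flow. It uses Python's stdlib `json` rather than the actual Java code
(javax.json/Jackson), and the function name, the `variables` parameter, and
the step details are illustrative only; each `json.loads`/`json.dumps` pair
stands for one parse/serialize round-trip:

```python
import json
import re

def current_pipeline(raw: str, feature_id: str, variables: dict) -> dict:
    # 1. the pre-processor has already read the whole file into `raw`
    # 2. parse into a DOM, add `id` if missing, then serialize back to a string
    dom = json.loads(raw)
    if "id" not in dom:
        dom["id"] = feature_id
    raw = json.dumps(dom)
    # 3. schema validation parses the string again (the actual check is omitted)
    json.loads(raw)
    # 4. substitution works on the string and yields yet another string
    raw = re.sub(r"\$\{([^}]+)\}",
                 lambda m: str(variables.get(m.group(1), m.group(0))), raw)
    # 5. minification would yield one more string here (omitted)
    # 6. the Feature reader parses the final string once more
    return json.loads(raw)
```

Counting the steps, a single small file is parsed three times and serialized
twice before a Feature instance exists.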

My proposal is to rework the pipeline slightly in order to speed up
the JSON processing, like this:

 * the JS Min step starts by reading the whole file into memory,
storing it in a String reference;
 * the Substitution step takes the JSON string as input to interpolate
variables, which creates a new JSON string representation;
 * a Jackson DOM is created in order to check whether the `id` property
is missing, adding it if necessary;
 * the Jackson DOM is validated against the defined schema;
 * the Jackson DOM is mapped to a Feature instance.
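Under the proposal, the same sketch collapses to a single parse. Again this
is illustrative Python (stdlib `json`) rather than the real Java code;
`variables` is passed in explicitly here because substitution now runs
before any DOM exists:

```python
import json
import re

def proposed_pipeline(raw: str, feature_id: str, variables: dict) -> dict:
    # 1. minification would run on the raw string here (omitted)
    # 2. substitution still operates on the string, before any parsing
    raw = re.sub(r"\$\{([^}]+)\}",
                 lambda m: str(variables.get(m.group(1), m.group(0))), raw)
    # 3. parse once; this single DOM is reused by every later step
    dom = json.loads(raw)
    dom.setdefault("id", feature_id)
    # 4. schema validation would run against this same DOM (omitted)
    # 5. the same DOM is finally mapped onto the Feature instance
    return dom
```

One parse and zero re-serializations, compared to three parses and two
serializations in the current flow.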

WDYT?

Many thanks in advance!
~Simo

http://people.apache.org/~simonetripodi/
http://twitter.com/simonetripodi

Re: [DISCUSS] Feature Files I/O long-term proposal: improving the processing pipeline

Posted by Carsten Ziegeler <cz...@apache.org>.
We have to distinguish between the JSON representation of a feature as 
defined in our "specification" and the additional support in our Maven 
plugin.
A feature must have an id, and we don't allow interpolation of 
placeholders within the feature. The JSON code that reads a feature in 
the feature-io module works on this basis, and we should not change that.

However, for Maven-based projects, the Maven plugin allows leaving out 
the id; it then gets calculated from the project coordinates and the 
file name. The Maven plugin also allows interpolating placeholders 
in the feature file.
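The exact scheme the plugin uses is not spelled out here, so the following
is only a hypothetical sketch of the idea: the file name (minus its
extension) acts as a classifier alongside the project's Maven coordinates.
The function name and id layout are assumptions, not the plugin's actual
rule:

```python
def derive_feature_id(group_id: str, artifact_id: str,
                      version: str, file_name: str) -> str:
    # strip the .json extension; the remainder serves as the classifier
    classifier = file_name.rsplit(".", 1)[0]
    return f"{group_id}:{artifact_id}:{classifier}:{version}"
```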

Now, in a Maven project, these two things need to happen before 
validation; otherwise the validation might fail.

If we can make these steps more efficient, great - but we must not break 
the separation of the functionality.

In addition, do we have any numbers showing how "slow" this currently 
is and how much we could improve it? Feature files are rather small and 
all processing happens in memory. On top of that, we're talking about a 
build-time tool here, so a few extra milliseconds don't really matter.

Again, if we can improve things, let's do it - but I don't think this is 
our most urgent problem to fix with respect to the feature model.

Regards
Carsten

On 13.11.2018 at 02:53, Justin Edelson wrote:
> [...]

-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

Re: [DISCUSS] Feature Files I/O long-term proposal: improving the processing pipeline

Posted by Justin Edelson <ju...@justinedelson.com>.
Hi Simo,
Take this with several grains of salt as I don't know the internals of the
feature processing, but just looking at your email from a generic "how do I
process a JSON file" it still seems inefficient.

Ideally, IMO, the substitution would be done as a filter applied to the
stream of parser events. That way the entire String is not held in memory
-- only the parsed DOM. I suspect it is also "safer" in the sense that you
can more tightly control the context in which interpolation occurs (for
example, interpolation should be allowed in string values, but not keys);
the flip side is that it also is more restrictive, i.e. supporting
interpolation of non-String values would be non-trivial (then again, doing
this would make the original document invalid JSON so I'm not sure this is
a real use case). I would suggest taking a look at Jackson's
JsonParserDelegate.
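In Java, Jackson's JsonParserDelegate would wrap the parser's event stream.
As a rough, runnable approximation of the same idea (interpolation applied
during parsing, in string values but never in keys), here is a Python sketch
using the stdlib `json` module's `object_hook`; the names are illustrative:

```python
import json
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

def substituting_load(raw: str, variables: dict) -> dict:
    # interpolate placeholders in string values (and list items) only
    def interpolate(value):
        if isinstance(value, str):
            return PLACEHOLDER.sub(
                lambda m: str(variables.get(m.group(1), m.group(0))), value)
        if isinstance(value, list):
            return [interpolate(v) for v in value]
        return value

    def hook(obj):
        # keys pass through untouched; only the values are rewritten
        return {key: interpolate(value) for key, value in obj.items()}

    return json.loads(raw, object_hook=hook)
```

Unknown placeholders are left verbatim, and because interpolation happens as
each object is materialized, no second full-document string is ever built.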

Regards,
Justin

On Mon, Nov 12, 2018 at 2:46 PM Simone Tripodi <si...@apache.org>
wrote:

> [...]
>