Posted to dev@calcite.apache.org by Julian Hyde <jh...@gmail.com> on 2021/12/29 01:27:26 UTC

[DISCUSS] SBOM (Software Bill of Materials)

In the wake of the log4j CVEs [1], people are asking how to improve the
security of open source projects, and one idea is to provide an SBOM
(Software Bill of Materials) [2] along with each release.

I had not heard of SBOM until a couple of days ago. Is anyone on this list
familiar with SBOMs and their use? Should Calcite be providing an SBOM? Are
people aware of SBOM initiatives in other projects? What, in your opinion,
is the priority of this issue?

Julian

[1]
https://thehackernews.com/2021/12/second-log4j-vulnerability-cve-2021.html

[2] https://en.wikipedia.org/wiki/Software_bill_of_materials

Re: [DISCUSS] SBOM (Software Bill of Materials)

Posted by Julian Hyde <jh...@gmail.com>.
Not really; space-filling curves are rather technical. What I’d do is what Stamatis has already done: log a PR (https://github.com/aioaneid/uzaygezen/pull/2).


> On Dec 30, 2021, at 4:40 PM, Scott Reynolds <sd...@gmail.com> wrote:
> 
> When I was dealing with Log4j, I discovered that uzaygezen-core is pretty
> old and pulls in log4j 1.x:
> https://mvnrepository.com/artifact/com.google.uzaygezen/uzaygezen-core/0.2
> 
> But it doesn't actually use log4j (
> https://github.com/aioaneid/uzaygezen/search?q=log4j). For our project, I
> excluded log4j 1.x from this dependency (
> https://github.com/twilio/calcite-kudu/pull/48/files#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8R142-R156).
> 
> 
> Is this an example of a dependency we could replace with our own
> implementation?
> 
> On Thu, Dec 30, 2021 at 3:55 PM Julian Hyde <jh...@gmail.com> wrote:
> 
>> Regarding dependencies. Here are the runtime dependencies from
>> core/build.gradle.kts (ignoring test and annotation libraries):
>> 
>> * api("com.esri.geometry:esri-geometry-api")
>> * api("com.fasterxml.jackson.core:jackson-annotations")
>> * api("com.google.guava:guava")
>> * api("org.apache.calcite.avatica:avatica-core")
>> * api("org.slf4j:slf4j-api")
>> * implementation("com.fasterxml.jackson.core:jackson-core")
>> * implementation("com.fasterxml.jackson.core:jackson-databind")
>> * implementation("com.fasterxml.jackson.dataformat:jackson-dataformat-yaml")
>> * implementation("com.google.uzaygezen:uzaygezen-core")
>> * implementation("com.jayway.jsonpath:json-path")
>> * implementation("com.yahoo.datasketches:sketches-core")
>> * implementation("commons-codec:commons-codec")
>> * implementation("net.hydromatic:aggdesigner-algorithm")
>> * implementation("org.apache.commons:commons-dbcp2")
>> * implementation("org.apache.commons:commons-lang3")
>> * implementation("commons-io:commons-io")
>> * implementation("org.codehaus.janino:commons-compiler")
>> * implementation("org.codehaus.janino:janino")
>> 
>> A few libraries are used only for a narrow range of functionality:
>> * esri-geometry and uzaygezen-core are used by geospatial functions;
>> * sketches-core is used by the HLL aggregate functions;
>> * json-path is used by some JSON functions;
>> * jackson-core, jackson-databind, jackson-dataformat-yaml are used to
>> load models, and to serialize RelNodes to and from JSON;
>> * commons-lang3, commons-codec, commons-io are probably only used in one
>> or two places each;
>> * aggdesigner-algorithm is used for recommending materialized views.
>> 
>> So, the easiest way to reduce dependencies would be to make certain
>> classes of SQL functions optional (i.e. move them out of core).
>> 
>> Julian
>> 
>> 
>> 
>>> On Dec 29, 2021, at 1:30 PM, Jacques Nadeau <ja...@apache.org> wrote:
>>> 
>>> WRT SBOM (Julian): My general experience is that most large orgs use
>>> scanners now (either open or closed) and they will scan whether you have
>>> a bill of materials or not. I wouldn't worry about adding something
>>> additional.
>>> 
>>> WRT too many dependencies (Gunnar): I completely agree with the general
>>> feeling of too many (and with Guava, Jackson less so). I think the core
>>> challenge (no pun intended) is that calcite-core is really a lot of
>>> different components. For example, I have frequently wished that parser,
>>> planner and enumerable were separate modules. And if they were, I'd guess
>>> that each would have a narrower dependency range. I've also wished many
>>> times that runtime compilation was an optional addon as opposed to
>>> required/coupled in the core...
>>> 
>>> When I've thought about how to dissect in the past, I think the big
>>> challenge would be tests, where things are sometimes mixed together.
>>> Breaking change possibilities could be at least somewhat mitigated by
>>> moving classes but not packages.
>>> 
>>> On Wed, Dec 29, 2021 at 1:51 AM Gunnar Morling
>>> <gu...@googlemail.com.invalid> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> In a way, Calcite's build configuration as well as the published POM
>>>> could be considered as such an SBOM? In particular when looking at the
>>>> latter through services like mvnrepository [1], you get quite a good view
>>>> of the dependency versions, licenses, any potential CVEs, etc. I think
>>>> this should satisfy most user needs around this? Or are you referring to
>>>> the notion of Maven BOM POMs specifically [2], i.e. the notion of
>>>> publishing a POM with all the Calcite component versions which people can
>>>> then use with Maven's import scope (there should be something comparable
>>>> for Gradle)? If so, that could be useful for users working with multiple
>>>> Calcite components, though I think the usability improvement provided by
>>>> such a BOM POM wouldn't be huge.
>>>> 
>>>> I wanted to bring up a related matter though. Coming to Calcite as a user
>>>> just recently (loving the possibilities it provides!), I was surprised by
>>>> the large number of dependencies of the project. It looks like 1.29
>>>> improves that a little bit (no more kotlin-stdlib, no more transitive
>>>> dependency on log4j 1.x), but the transitive hull of all dependencies of
>>>> calcite-core still is quite big. I lack insight about what the different
>>>> dependencies are used for; but as an application developer, Guava for
>>>> instance is a dependency which I'd prefer to not get pushed onto the
>>>> classpath transitively. Jackson is another heavy one; depending on how
>>>> it's used, perhaps this could be pushed into some separate module which
>>>> users could optionally pull in? That'd help to avoid having it around when
>>>> users work with other JSON libs themselves and don't require JSON support
>>>> in Calcite.
>>>> 
>>>> From a supply chain perspective, the fewer transitive dependencies a
>>>> library like Calcite introduces to my project, the better IMHO. Less
>>>> potential for version conflicts with my own (or other transitive)
>>>> dependencies, and also less potential for introducing CVEs to the
>>>> dependency graph, as e.g. in the case of the Guava version currently used
>>>> by Calcite; I suppose it does not impact the usage in Calcite, but these
>>>> things tend to be tricky to reason about, and typical CVE reporting
>>>> tooling will now create a warning for a project using Calcite, no matter
>>>> whether that specific issue actually is a problem or not.
>>>> 
>>>> Best,
>>>> 
>>>> --Gunnar
>>>> 
>>>> [1]
>>>> https://mvnrepository.com/artifact/org.apache.calcite/calcite-core/1.29.0
>>>> [2]
>>>> https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#bill-of-materials-bom-poms


Re: [DISCUSS] SBOM (Software Bill of Materials)

Posted by Scott Reynolds <sd...@gmail.com>.
When I was dealing with Log4j, I discovered that uzaygezen-core is pretty
old and pulls in log4j 1.x:
https://mvnrepository.com/artifact/com.google.uzaygezen/uzaygezen-core/0.2

But it doesn't actually use log4j (
https://github.com/aioaneid/uzaygezen/search?q=log4j). For our project, I
excluded log4j 1.x from this dependency (
https://github.com/twilio/calcite-kudu/pull/48/files#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8R142-R156).


Is this an example of a dependency we could replace with our own
implementation?


Re: [DISCUSS] SBOM (Software Bill of Materials)

Posted by Julian Hyde <jh...@gmail.com>.
We want to generate classes from Java code without spawning a process or writing to the file system, and we want it to work in a JRE (with no full JDK present). Only Janino does this, AFAIK.
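
For context, a minimal sketch of the JRE constraint, not taken from Calcite: the JDK's built-in compiler is reached through javax.tools.ToolProvider, and getSystemJavaCompiler() returns null when the running platform provides no compiler, as on a JRE-only install. The class and method names below (CompilerCheck, describe) are made up for illustration.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper, not part of Calcite: reports whether the JDK's
// built-in compiler is available in the running Java runtime.
final class CompilerCheck {
  static String describe(JavaCompiler compiler) {
    // ToolProvider.getSystemJavaCompiler() returns null when the platform
    // provides no compiler, e.g. on a runtime without the compiler module.
    return compiler == null
        ? "no system compiler; javax.tools cannot compile here"
        : "system compiler available";
  }

  public static void main(String[] args) {
    System.out.println(describe(ToolProvider.getSystemJavaCompiler()));
  }
}
```

Code that has to keep working where the first message is printed cannot rely on javax.tools and needs an embedded compiler instead, which is the role Janino plays here.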

> On Jan 4, 2022, at 1:01 PM, Gunnar Morling <gu...@googlemail.com.INVALID> wrote:
> 
> On Tue, Jan 4, 2022 at 21:41, Julian Hyde <jhyde.apache@gmail.com> wrote:
> 
>> No, I don’t think it matters in this case. But consistent use of ROOT is
>> useful, because someone in future might be tracking down a bug, and if they
>> see ENGLISH it’s one more hypothesis they’d have to discount.
>> 
> 
> That makes sense. Just let me know if you see the need for other changes to
> that PR. I may look into some of the other dependencies you mentioned as
> being rarely used, as I find the time.
> 
> Any thoughts on this one:
> 
>> Re Janino, is there any reason for not using the compiler implementation
>> coming with the JDK?
> 
> Thanks,
> 
> --Gunnar
> 
> 
>>> On Jan 4, 2022, at 12:31 PM, Gunnar Morling <gu...@googlemail.com.INVALID> wrote:
>>> 
>>> On Tue, Jan 4, 2022 at 20:51, Julian Hyde <jhyde.apache@gmail.com> wrote:
>>> 
>>>> If a method needs a locale, always pass Locale.ROOT.
>>>> 
>>> 
>>> Ok, I've changed it accordingly. Do you think it actually matters for the
>>> case at hand?
>>> 
>>>> On Jan 4, 2022, at 9:13 AM, Gunnar Morling <gu...@googlemail.com.INVALID> wrote:
>>>>> 
>>>>> On Tue, Jan 4, 2022 at 09:39, Julian Hyde <jhyde.apache@gmail.com> wrote:
>>>>> 
>>>>>> Please log a jira case for the commons-lang3 change.
>>>>> 
>>>>> 
>>>>> Logged https://issues.apache.org/jira/browse/CALCITE-4975.
>>>>> 
>>>>> 
>>>>>> It looks good. One or two places I’d create a function rather than
>>>>>> having a blob of code inline.
>>>>>> 
>>>>> 
>>>>> Sure; just let me know where exactly.
>>>>> 
>>>>>> Your use of default locale in the CSV adapter looks wrong. Calcite is a
>>>>>> server, so never uses default locale or time zone. In fact we use
>>>>>> forbiddenApis to check, so we should add a few methods to its
>>>>>> configuration.
>>>>> 
>>>>> 
>>>>> Yeah, I had been pondering about this; I don't think it matters, the
>>>>> locale should not make any difference for these specific formats, as they
>>>>> don't contain any locale-specific patterns (unlike, say, "MMM"). I've
>>>>> changed it to Locale.ENGLISH now, just in case. In fact, I wanted to use
>>>>> the ofPattern() method without the Locale parameter, but this failed the
>>>>> forbiddenApis check as well :)
>>>>> 
>>>>>> Julian
>>>>> 
>>>>> 
>>>>> Best,
>>>>> 
>>>>> --Gunnar
>>>>> 
>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Jan 3, 2022, at 12:30 PM, Gunnar Morling <gu...@googlemail.com.invalid> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Thanks a lot for this, I think trimming down the dependencies of
>>>>>>> Calcite will be of great help for its adoption.
>>>>>>> 
>>>>>>>> So, the easiest way to reduce dependencies would be to make certain
>>>>>>>> classes of SQL functions optional (i.e. move them out of core).
>>>>>>> 
>>>>>>> That sounds like a good idea.
>>>>>>> 
>>>>>>>> commons-lang3, commons-codec, commons-io are probably only used in
>>>>>>>> one or two places each;
>>>>>>> 
>>>>>>> To make some progress there, I've created PR
>>>>>>> https://github.com/apache/calcite/pull/2672 which removes the
>>>>>>> dependency on commons-lang3 from the entire code base. Any feedback on
>>>>>>> that PR would be appreciated (I still need to log an issue, but wanted
>>>>>>> to share quickly what I had). I can try and take a look at the other
>>>>>>> ones, if there's interest in this.
>>>>>>> 
>>>>>>> Re Janino, is there any reason for not using the compiler
>>>>>>> implementation coming with the JDK? Alternatively, one could also
>>>>>>> consider generating byte code directly using ASM, which wouldn't be
>>>>>>> beneficial dependency-wise, but it may improve the performance of this
>>>>>>> generation step (I still lack insight why this is done in the first
>>>>>>> place).
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> --Gunnar
>>>>>>> 


Re: [DISCUSS] SBOM (Software Bill of Materials)

Posted by Gunnar Morling <gu...@googlemail.com.INVALID>.
On Tue, Jan 4, 2022 at 21:41, Julian Hyde <jhyde.apache@gmail.com> wrote:

> No, I don’t think it matters in this case. But consistent use of ROOT is
> useful, because someone in future might be tracking down a bug, and if they
> see ENGLISH it’s one more hypothesis they’d have to discount.
>

That makes sense. Just let me know if you see the need for other changes to
that PR. I may look into some of the other dependencies you mentioned as
being rarely used, as I find the time.
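
To make the Locale.ROOT point concrete, here is a small sketch (the class name and the sample date are illustrative, not code from the PR). A purely numeric pattern formats identically under Locale.ROOT and Locale.ENGLISH, whereas a pattern containing "MMM" takes its month text from the locale, which is why passing ROOT everywhere leaves one less thing to rule out.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Illustrative only: compares formatter output across locales.
final class LocaleRootExample {
  // Formats a fixed sample date (4 January 2022) with an explicit locale,
  // never the JVM default.
  static String format(String pattern, Locale locale) {
    return DateTimeFormatter.ofPattern(pattern, locale)
        .format(LocalDate.of(2022, 1, 4));
  }

  public static void main(String[] args) {
    // Numeric-only pattern: the locale does not change the result.
    System.out.println(format("yyyy-MM-dd", Locale.ROOT));    // 2022-01-04
    System.out.println(format("yyyy-MM-dd", Locale.ENGLISH)); // 2022-01-04

    // Textual month: the output depends on the locale's month names.
    System.out.println(format("dd MMM yyyy", Locale.ENGLISH));
    System.out.println(format("dd MMM yyyy", Locale.GERMAN));
  }
}
```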

Any thoughts on this one:

> Re Janino, is there any reason for not using the compiler implementation
> coming with the JDK?

Thanks,

--Gunnar


> >>>>>> On Fri, Dec 31, 2021 at 00:56, Julian Hyde <
> >>>>>> jhyde.apache@gmail.com> wrote:
> >>>>>>
> >>>>>> Regarding dependencies. Here are the runtime dependencies from
> >>>>>> core/build.gradle.kts (ignoring test and annotation libraries):
> >>>>>>
> >>>>>> * api("com.esri.geometry:esri-geometry-api")
> >>>>>> * api("com.fasterxml.jackson.core:jackson-annotations")
> >>>>>> * api("com.google.guava:guava")
> >>>>>> * api("org.apache.calcite.avatica:avatica-core")
> >>>>>> * api("org.slf4j:slf4j-api")
> >>>>>> * implementation("com.fasterxml.jackson.core:jackson-core")
> >>>>>> * implementation("com.fasterxml.jackson.core:jackson-databind")
> >>>>>> * implementation("com.fasterxml.jackson.dataformat:jackson-dataformat-yaml")
> >>>>>> * implementation("com.google.uzaygezen:uzaygezen-core")
> >>>>>> * implementation("com.jayway.jsonpath:json-path")
> >>>>>> * implementation("com.yahoo.datasketches:sketches-core")
> >>>>>> * implementation("commons-codec:commons-codec")
> >>>>>> * implementation("net.hydromatic:aggdesigner-algorithm")
> >>>>>> * implementation("org.apache.commons:commons-dbcp2")
> >>>>>> * implementation("org.apache.commons:commons-lang3")
> >>>>>> * implementation("commons-io:commons-io")
> >>>>>> * implementation("org.codehaus.janino:commons-compiler")
> >>>>>> * implementation("org.codehaus.janino:janino")
> >>>>>>
> >>>>>> A few libraries are used only for a narrow range of functionality:
> >>>>>> * esri-geometry and uzaygezen-core are used by geospatial functions;
> >>>>>> * sketches-core is used by the HLL aggregate functions;
> >>>>>> * json-path is used by some JSON functions;
> >>>>>> * jackson-core, jackson-databind, jackson-dataformat-yaml are used
> to
> >>>>>> load models, and to serialize RelNodes to and from JSON;
> >>>>>> * commons-lang3, commons-codec, commons-io are probably only used in
> >> one
> >>>>>> or two places each;
> >>>>>> * aggdesigner-algorithm is used for recommending materialized views.
> >>>>>>
> >>>>>> So, the easiest way to reduce dependencies would be to make certain
> >>>>>> classes of SQL functions optional (i.e. move them out of core).
> >>>>>>
> >>>>>> Julian
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>> On Dec 29, 2021, at 1:30 PM, Jacques Nadeau <ja...@apache.org>
> >>>> wrote:
> >>>>>>>
> >>>>>>> WRT SBOM (Julian): My general experience is that most large orgs
> use
> >>>>>>> scanners now (either open or closed) and they will scan whether you
> >>>> have
> >>>>>> a
> >>>>>>> bill of materials or not. I wouldn't worry about adding something
> >>>>>>> additional.
> >>>>>>>
> >>>>>>> WRT too many dependencies (Gunnar): I completely agree with the
> >> general
> >>>>>>> feeling of too many (and with Guava, jackson less so). I think the
> >> core
> >>>>>>> challenge (no pun intended) is that calcite-core is really a lot of
> >>>>>>> different components. For example, I have frequently wished that
> >>>> parser,
> >>>>>>> planner and enumerable were separate modules. And if they were, I'd
> >>>> guess
> >>>>>>> that each would have a narrower dependency range. I've also wished
> >> many
> >>>>>>> times that runtime compilation was an optional addon as opposed to
> >>>>>>> required/coupled in the core...
> >>>>>>>
> >>>>>>> When I've thought about how to dissect in the past, I think the big
> >>>>>>> challenge would be tests, where things are sometimes mixed
> together.
> >>>>>>> Breaking change possibilities could be at least somewhat mitigated
> by
> >>>>>>> moving classes but not packages.
> >>>>>>>
> >>>>>>> On Wed, Dec 29, 2021 at 1:51 AM Gunnar Morling
> >>>>>>> <gu...@googlemail.com.invalid> wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> In a way, Calcite's build configuration as well as the published
> POM
> >>>>>> could
> >>>>>>>> be considered as such an SBOM? In particular when looking at the
> >>>> latter
> >>>>>>>> through services like mvnrepository [1], you get quite a good view
> >> on
> >>>>>> the
> >>>>>>>> dependency versions, licenses, any potential CVEs, etc. I think
> this
> >>>>>> should
> >>>>>>>> satisfy most user needs around this? Or are you referring to the
> >>>> notion
> >>>>>> of
> >>>>>>>> Maven BOM POMs specifically [2], i.e. the notion of publishing a
> POM
> >>>>>> with
> >>>>>>>> all the Calcite component versions which people can then use with
> >>>>>> Maven's
> >>>>>>>> import scope (there should be something comparable for Gradle)? If
> >> so,
> >>>>>> that
> >>>>>>>> could be useful for users working with multiple Calcite
> components,
> >>>>>> though
> >>>>>>>> I think the usability improvement provided by such BOM POM
> wouldn't
> >> be
> >>>>>>>> huge.
> >>>>>>>>
> >>>>>>>> I wanted to bring up a related matter though. Coming to Calcite
> as a
> >>>>>> user
> >>>>>>>> just recently (loving the possibilities it provides!), I was
> >> surprised
> >>>>>> by
> >>>>>>>> the large number of dependencies of the project. It looks like
> 1.29
> >>>>>>>> improves that a little bit (no more kotlin-stdlib, no more
> >> transitive
> >>>>>>>> dependency to log4j 1.x), but the transitive hull of all
> >> dependencies
> >>>> of
> >>>>>>>> calcite-core still is quite big. I lack insight about what the
> >>>> different
> >>>>>>>> dependencies are used for; but as an application developer, Guava
> >> for
> >>>>>>>> instance is a dependency which I'd prefer to not get pushed onto
> the
> >>>>>>>> classpath transitively. Jackson is another heavy one; depending on
> >> how
> >>>>>> it's
> >>>>>>>> used, perhaps this could be pushed into some separate module which
> >>>> users
> >>>>>>>> could optionally  pull in? That'd help to avoid having it around
> >> when
> >>>>>> users
> >>>>>>>> work with other JSON libs themselves and don't require JSON
> support
> >> in
> >>>>>>>> Calcite.
> >>>>>>>>
> >>>>>>>> From a supply chain perspective, the less transitive dependencies
> a
> >>>>>> library
> >>>>>>>> like Calcite introduces to my project, the better IMHO. Less
> >> potential
> >>>>>> for
> >>>>>>>> version conflicts with my own (or other transitive) dependencies,
> >> and
> >>>>>> also
> >>>>>>>> less potential for introducing CVEs to the dependency graph, as
> e.g.
> >>>> in
> >>>>>> the
> >>>>>>>> case of the Guava version currently used by Calcite; I suppose it
> >> does
> >>>>>> not
> >>>>>>>> impact the usage in Calcite, but these things tend to be tricky to
> >>>>>> reason
> >>>>>>>> about, and typical CVE reporting tooling will now create a warning
> >>>> for a
> >>>>>>>> project using Calcite, no matter whether that specific issue
> >> actually
> >>>>>> is a
> >>>>>>>> problem or not.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>>
> >>>>>>>> --Gunnar
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>>
> >>>>>>
> >>>>
> >>
> https://mvnrepository.com/artifact/org.apache.calcite/calcite-core/1.29.0
> >>>>>>>> [2]
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>
> https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#bill-of-materials-bom-poms
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Dec 29, 2021 at 02:27, Julian Hyde <
> >>>>>>>> jhyde.apache@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> In the wake of the log4j CVEs [1], people are asking how to
> improve
> >>>> the
> >>>>>>>>> security of open source projects, and one idea is to provide a
> SBOM
> >>>>>>>>> (Software Bill of Materials) [2] along with each release.
> >>>>>>>>>
> >>>>>>>>> I had not heard of SBOM until a couple of days ago. Is anyone on
> >> this
> >>>>>>>> list
> >>>>>>>>> familiar with SBOMs and their use? Should Calcite be providing an
> >>>> SBOM?
> >>>>>>>> Are
> >>>>>>>>> people aware of SBOM initiatives in other projects? What, in your
> >>>>>>>> opinion,
> >>>>>>>>> is the priority of this issue?
> >>>>>>>>>
> >>>>>>>>> Julian
> >>>>>>>>>
> >>>>>>>>> [1]
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>
> https://thehackernews.com/2021/12/second-log4j-vulnerability-cve-2021.html
> >>>>>>>>>
> >>>>>>>>> [2] https://en.wikipedia.org/wiki/Software_bill_of_materials
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>
> >>
>
>

Re: [DISCUSS] SBOM (Software Bill of Materials)

Posted by Julian Hyde <jh...@gmail.com>.
No, I don’t think it matters in this case. But using Locale.ROOT consistently is useful: someone tracking down a bug in the future who sees Locale.ENGLISH has one more hypothesis to rule out.
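
To make the point about these specific formats concrete, here is a minimal sketch. It is not code from the PR: the class name, the helper method and the two patterns are assumptions chosen for illustration. It formats one timestamp under several locales. A purely numeric pattern should produce the same text for each of them, while a pattern containing "MMM" prints a month name and so depends on the locale.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical demo class; not part of Calcite. */
public class NumericPatternLocaleDemo {
  /** Formats a timestamp using the given pattern and an explicit locale. */
  public static String format(String pattern, Locale locale, LocalDateTime ts) {
    return DateTimeFormatter.ofPattern(pattern, locale).format(ts);
  }

  public static void main(String[] args) {
    LocalDateTime ts = LocalDateTime.of(2022, 3, 4, 12, 31, 0);

    // Purely numeric pattern: the locale does not change the text.
    for (Locale locale : new Locale[] {Locale.ROOT, Locale.ENGLISH, Locale.GERMAN}) {
      System.out.println(locale.toLanguageTag() + " -> "
          + format("yyyy-MM-dd HH:mm:ss", locale, ts));
    }

    // "MMM" prints an abbreviated month name, which is locale-specific.
    System.out.println(format("dd MMM yyyy", Locale.ENGLISH, ts));
    System.out.println(format("dd MMM yyyy", Locale.GERMAN, ts));
  }
}
```

Even when the output is identical, passing Locale.ROOT everywhere means a reader never has to check whether a given pattern is one of the locale-sensitive ones.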

> On Jan 4, 2022, at 12:31 PM, Gunnar Morling <gu...@googlemail.com.INVALID> wrote:
> 
> On Tue, Jan 4, 2022 at 20:51, Julian Hyde <
> jhyde.apache@gmail.com> wrote:
> 
>> If a method needs a locale, always pass Locale.ROOT.
>> 
> 
> Ok, I've changed it accordingly. Do you think it actually matters for the
> case at hand?
> 
>> On Jan 4, 2022, at 9:13 AM, Gunnar Morling
>> <gu...@googlemail.com.INVALID> wrote:
>>> 
>>> On Tue, Jan 4, 2022 at 09:39, Julian Hyde <
>>> jhyde.apache@gmail.com> wrote:
>>> 
>>>> Please log a jira case for the commons-lang3 change.
>>> 
>>> 
>>> Logged https://issues.apache.org/jira/browse/CALCITE-4975.
>>> 
>>> 
>>>> It looks good. One or two places I’d create a function rather than
>> having
>>>> a blob of code inline.
>>>> 
>>> 
>>> Sure; just let me know where exactly.
>>> 
>>> Your use of default locale in the CSV adapter looks wrong. Calcite is a
>>>> server, so never uses default locale or time zone. In fact we use
>>>> forbiddenApis to check, so we should add a few methods to its
>> configuration.
>>> 
>>> 
>>> Yeah, I had been pondering about this; I don't think it matters, the
>> locale
>>> should not make any difference for these specific formats, as they don't
>>> contain any locale-specific patterns (unlike, say, "MMM"). I've changed
>> it
>>> to Locale.ENGLISH now, just in case. In fact, I wanted to use the
>>> ofPattern() method without the Locale parameter, but this failed the
>>> forbiddenApis check as well :)
>>> 
>>> Julian
>>> 
>>> 
>>> Best,
>>> 
>>> --Gunnar
>>> 
>>> 
>>>> 
>>>> 
>>>>> On Jan 3, 2022, at 12:30 PM, Gunnar Morling
>>>> <gu...@googlemail.com.invalid> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> Thanks a lot for this, I think trimming down the dependencies of
>> Calcite
>>>>> will be of great help for its adoption.
>>>>> 
>>>>>> So, the easiest way to reduce dependencies would be to make certain
>>>>> classes of SQL functions optional (i.e. move them out of core).
>>>>> 
>>>>> That sounds like a good idea.
>>>>> 
>>>>>> commons-lang3, commons-codec, commons-io are probably only used in one
>>>> or
>>>>> two places each;
>>>>> 
>>>>> To make some progress there, I've created PR
>>>>> https://github.com/apache/calcite/pull/2672 which removes the
>>>> dependency to
>>>>> commons-lang3 from the entire code base. Any feedback on that PR would
>>>>> be appreciated (I still need to log an issue, but wanted to share
>> quickly
>>>>> what I had). I can try and take a look at the other ones, if there's
>>>>> interest in this.
>>>>> 
>>>>> Re Janino, is there any reason for not using the compiler
>> implementation
>>>>> coming with the JDK? Alternatively, one could also consider to generate
>>>>> byte code directly using ASM, which wouldn't be beneficial
>>>> dependency-wise,
>>>>> but it may improve the performance of this generation step (I still
>> lack
>>>>> insight why this is done in the first place).
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> --Gunnar
>>>>> 
>>>>>> On Fri, Dec 31, 2021 at 00:56, Julian Hyde <
>>>>>> jhyde.apache@gmail.com> wrote:
>>>>>> 
>>>>>> Regarding dependencies. Here are the runtime dependencies from
>>>>>> core/build.gradle.kts (ignoring test and annotation libraries):
>>>>>> 
>>>>>> * api("com.esri.geometry:esri-geometry-api")
>>>>>> * api("com.fasterxml.jackson.core:jackson-annotations")
>>>>>> * api("com.google.guava:guava")
>>>>>> * api("org.apache.calcite.avatica:avatica-core")
>>>>>> * api("org.slf4j:slf4j-api")
>>>>>> * implementation("com.fasterxml.jackson.core:jackson-core")
>>>>>> * implementation("com.fasterxml.jackson.core:jackson-databind")
>>>>>> * implementation("com.fasterxml.jackson.dataformat:jackson-dataformat-yaml")
>>>>>> * implementation("com.google.uzaygezen:uzaygezen-core")
>>>>>> * implementation("com.jayway.jsonpath:json-path")
>>>>>> * implementation("com.yahoo.datasketches:sketches-core")
>>>>>> * implementation("commons-codec:commons-codec")
>>>>>> * implementation("net.hydromatic:aggdesigner-algorithm")
>>>>>> * implementation("org.apache.commons:commons-dbcp2")
>>>>>> * implementation("org.apache.commons:commons-lang3")
>>>>>> * implementation("commons-io:commons-io")
>>>>>> * implementation("org.codehaus.janino:commons-compiler")
>>>>>> * implementation("org.codehaus.janino:janino")
>>>>>> 
>>>>>> A few libraries are used only for a narrow range of functionality:
>>>>>> * esri-geometry and uzaygezen-core are used by geospatial functions;
>>>>>> * sketches-core is used by the HLL aggregate functions;
>>>>>> * json-path is used by some JSON functions;
>>>>>> * jackson-core, jackson-databind, jackson-dataformat-yaml are used to
>>>>>> load models, and to serialize RelNodes to and from JSON;
>>>>>> * commons-lang3, commons-codec, commons-io are probably only used in
>> one
>>>>>> or two places each;
>>>>>> * aggdesigner-algorithm is used for recommending materialized views.
>>>>>> 
>>>>>> So, the easiest way to reduce dependencies would be to make certain
>>>>>> classes of SQL functions optional (i.e. move them out of core).
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>>> On Dec 29, 2021, at 1:30 PM, Jacques Nadeau <ja...@apache.org>
>>>> wrote:
>>>>>>> 
>>>>>>> WRT SBOM (Julian): My general experience is that most large orgs use
>>>>>>> scanners now (either open or closed) and they will scan whether you
>>>> have
>>>>>> a
>>>>>>> bill of materials or not. I wouldn't worry about adding something
>>>>>>> additional.
>>>>>>> 
>>>>>>> WRT too many dependencies (Gunnar): I completely agree with the
>> general
>>>>>>> feeling of too many (and with Guava, jackson less so). I think the
>> core
>>>>>>> challenge (no pun intended) is that calcite-core is really a lot of
>>>>>>> different components. For example, I have frequently wished that
>>>> parser,
>>>>>>> planner and enumerable were separate modules. And if they were, I'd
>>>> guess
>>>>>>> that each would have a narrower dependency range. I've also wished
>> many
>>>>>>> times that runtime compilation was an optional addon as opposed to
>>>>>>> required/coupled in the core...
>>>>>>> 
>>>>>>> When I've thought about how to dissect in the past, I think the big
>>>>>>> challenge would be tests, where things are sometimes mixed together.
>>>>>>> Breaking change possibilities could be at least somewhat mitigated by
>>>>>>> moving classes but not packages.
>>>>>>> 
>>>>>>> On Wed, Dec 29, 2021 at 1:51 AM Gunnar Morling
>>>>>>> <gu...@googlemail.com.invalid> wrote:
>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> In a way, Calcite's build configuration as well as the published POM
>>>>>> could
>>>>>>>> be considered as such an SBOM? In particular when looking at the
>>>> latter
>>>>>>>> through services like mvnrepository [1], you get quite a good view
>> on
>>>>>> the
>>>>>>>> dependency versions, licenses, any potential CVEs, etc. I think this
>>>>>> should
>>>>>>>> satisfy most user needs around this? Or are you referring to the
>>>> notion
>>>>>> of
>>>>>>>> Maven BOM POMs specifically [2], i.e. the notion of publishing a POM
>>>>>> with
>>>>>>>> all the Calcite component versions which people can then use with
>>>>>> Maven's
>>>>>>>> import scope (there should be something comparable for Gradle)? If
>> so,
>>>>>> that
>>>>>>>> could be useful for users working with multiple Calcite components,
>>>>>> though
>>>>>>>> I think the usability improvement provided by such BOM POM wouldn't
>> be
>>>>>>>> huge.
>>>>>>>> 
>>>>>>>> I wanted to bring up a related matter though. Coming to Calcite as a
>>>>>> user
>>>>>>>> just recently (loving the possibilities it provides!), I was
>> surprised
>>>>>> by
>>>>>>>> the large number of dependencies of the project. It looks like 1.29
>>>>>>>> improves that a little bit (no more kotlin-stdlib, no more
>> transitive
>>>>>>>> dependency to log4j 1.x), but the transitive hull of all
>> dependencies
>>>> of
>>>>>>>> calcite-core still is quite big. I lack insight about what the
>>>> different
>>>>>>>> dependencies are used for; but as an application developer, Guava
>> for
>>>>>>>> instance is a dependency which I'd prefer to not get pushed onto the
>>>>>>>> classpath transitively. Jackson is another heavy one; depending on
>> how
>>>>>> it's
>>>>>>>> used, perhaps this could be pushed into some separate module which
>>>> users
>>>>>>>> could optionally  pull in? That'd help to avoid having it around
>> when
>>>>>> users
>>>>>>>> work with other JSON libs themselves and don't require JSON support
>> in
>>>>>>>> Calcite.
>>>>>>>> 
>>>>>>>> From a supply chain perspective, the less transitive dependencies a
>>>>>> library
>>>>>>>> like Calcite introduces to my project, the better IMHO. Less
>> potential
>>>>>> for
>>>>>>>> version conflicts with my own (or other transitive) dependencies,
>> and
>>>>>> also
>>>>>>>> less potential for introducing CVEs to the dependency graph, as e.g.
>>>> in
>>>>>> the
>>>>>>>> case of the Guava version currently used by Calcite; I suppose it
>> does
>>>>>> not
>>>>>>>> impact the usage in Calcite, but these things tend to be tricky to
>>>>>> reason
>>>>>>>> about, and typical CVE reporting tooling will now create a warning
>>>> for a
>>>>>>>> project using Calcite, no matter whether that specific issue
>> actually
>>>>>> is a
>>>>>>>> problem or not.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> 
>>>>>>>> --Gunnar
>>>>>>>> 
>>>>>>>> [1]
>>>>>>>> 
>>>>>> 
>>>> 
>> https://mvnrepository.com/artifact/org.apache.calcite/calcite-core/1.29.0
>>>>>>>> [2]
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#bill-of-materials-bom-poms
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Dec 29, 2021 at 02:27, Julian Hyde <
>>>>>>>> jhyde.apache@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> In the wake of the log4j CVEs [1], people are asking how to improve
>>>> the
>>>>>>>>> security of open source projects, and one idea is to provide a SBOM
>>>>>>>>> (Software Bill of Materials) [2] along with each release.
>>>>>>>>> 
>>>>>>>>> I had not heard of SBOM until a couple of days ago. Is anyone on
>> this
>>>>>>>> list
>>>>>>>>> familiar with SBOMs and their use? Should Calcite be providing an
>>>> SBOM?
>>>>>>>> Are
>>>>>>>>> people aware of SBOM initiatives in other projects? What, in your
>>>>>>>> opinion,
>>>>>>>>> is the priority of this issue?
>>>>>>>>> 
>>>>>>>>> Julian
>>>>>>>>> 
>>>>>>>>> [1]
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> https://thehackernews.com/2021/12/second-log4j-vulnerability-cve-2021.html
>>>>>>>>> 
>>>>>>>>> [2] https://en.wikipedia.org/wiki/Software_bill_of_materials
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>> 
>> 


Re: [DISCUSS] SBOM (Software Bill of Materials)

Posted by Gunnar Morling <gu...@googlemail.com.INVALID>.
On Tue, Jan 4, 2022 at 20:51, Julian Hyde <
jhyde.apache@gmail.com> wrote:

> If a method needs a locale, always pass Locale.ROOT.
>

Ok, I've changed it accordingly. Do you think it actually matters for the
case at hand?

> On Jan 4, 2022, at 9:13 AM, Gunnar Morling
> <gu...@googlemail.com.INVALID> wrote:
> >
> > On Tue, Jan 4, 2022 at 09:39, Julian Hyde <
> > jhyde.apache@gmail.com> wrote:
> >
> >> Please log a jira case for the commons-lang3 change.
> >
> >
> > Logged https://issues.apache.org/jira/browse/CALCITE-4975.
> >
> >
> >> It looks good. One or two places I’d create a function rather than
> having
> >> a blob of code inline.
> >>
> >
> > Sure; just let me know where exactly.
> >
> > Your use of default locale in the CSV adapter looks wrong. Calcite is a
> >> server, so never uses default locale or time zone. In fact we use
> >> forbiddenApis to check, so we should add a few methods to its
> configuration.
> >
> >
> > Yeah, I had been pondering about this; I don't think it matters, the
> locale
> > should not make any difference for these specific formats, as they don't
> > contain any locale-specific patterns (unlike, say, "MMM"). I've changed
> it
> > to Locale.ENGLISH now, just in case. In fact, I wanted to use the
> > ofPattern() method without the Locale parameter, but this failed the
> > forbiddenApis check as well :)
> >
> > Julian
> >
> >
> > Best,
> >
> > --Gunnar
> >
> >
> >>
> >>
> >>> On Jan 3, 2022, at 12:30 PM, Gunnar Morling
> >> <gu...@googlemail.com.invalid> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Thanks a lot for this, I think trimming down the dependencies of
> Calcite
> >>> will be of great help for its adoption.
> >>>
> >>>> So, the easiest way to reduce dependencies would be to make certain
> >>> classes of SQL functions optional (i.e. move them out of core).
> >>>
> >>> That sounds like a good idea.
> >>>
> >>>> commons-lang3, commons-codec, commons-io are probably only used in one
> >> or
> >>> two places each;
> >>>
> >>> To make some progress there, I've created PR
> >>> https://github.com/apache/calcite/pull/2672 which removes the
> >> dependency to
> >>> commons-lang3 from the entire code base. Any feedback on that PR would
> >>> be appreciated (I still need to log an issue, but wanted to share
> quickly
> >>> what I had). I can try and take a look at the other ones, if there's
> >>> interest in this.
> >>>
> >>> Re Janino, is there any reason for not using the compiler
> implementation
> >>> coming with the JDK? Alternatively, one could also consider to generate
> >>> byte code directly using ASM, which wouldn't be beneficial
> >> dependency-wise,
> >>> but it may improve the performance of this generation step (I still
> lack
> >>> insight why this is done in the first place).
> >>>
> >>> Thanks,
> >>>
> >>> --Gunnar
> >>>
> >>>> On Fri, Dec 31, 2021 at 00:56, Julian Hyde <
> >>>> jhyde.apache@gmail.com> wrote:
> >>>>
> >>>> Regarding dependencies. Here are the runtime dependencies from
> >>>> core/build.gradle.kts (ignoring test and annotation libraries):
> >>>>
> >>>> * api("com.esri.geometry:esri-geometry-api")
> >>>> * api("com.fasterxml.jackson.core:jackson-annotations")
> >>>> * api("com.google.guava:guava")
> >>>> * api("org.apache.calcite.avatica:avatica-core")
> >>>> * api("org.slf4j:slf4j-api")
> >>>> * implementation("com.fasterxml.jackson.core:jackson-core")
> >>>> * implementation("com.fasterxml.jackson.core:jackson-databind")
> >>>> * implementation("com.fasterxml.jackson.dataformat:jackson-dataformat-yaml")
> >>>> * implementation("com.google.uzaygezen:uzaygezen-core")
> >>>> * implementation("com.jayway.jsonpath:json-path")
> >>>> * implementation("com.yahoo.datasketches:sketches-core")
> >>>> * implementation("commons-codec:commons-codec")
> >>>> * implementation("net.hydromatic:aggdesigner-algorithm")
> >>>> * implementation("org.apache.commons:commons-dbcp2")
> >>>> * implementation("org.apache.commons:commons-lang3")
> >>>> * implementation("commons-io:commons-io")
> >>>> * implementation("org.codehaus.janino:commons-compiler")
> >>>> * implementation("org.codehaus.janino:janino")
> >>>>
> >>>> A few libraries are used only for a narrow range of functionality:
> >>>> * esri-geometry and uzaygezen-core are used by geospatial functions;
> >>>> * sketches-core is used by the HLL aggregate functions;
> >>>> * json-path is used by some JSON functions;
> >>>> * jackson-core, jackson-databind, jackson-dataformat-yaml are used to
> >>>> load models, and to serialize RelNodes to and from JSON;
> >>>> * commons-lang3, commons-codec, commons-io are probably only used in
> one
> >>>> or two places each;
> >>>> * aggdesigner-algorithm is used for recommending materialized views.
> >>>>
> >>>> So, the easiest way to reduce dependencies would be to make certain
> >>>> classes of SQL functions optional (i.e. move them out of core).
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>>>
> >>>>>> On Dec 29, 2021, at 1:30 PM, Jacques Nadeau <ja...@apache.org>
> >> wrote:
> >>>>>
> >>>>> WRT SBOM (Julian): My general experience is that most large orgs use
> >>>>> scanners now (either open or closed) and they will scan whether you
> >> have
> >>>> a
> >>>>> bill of materials or not. I wouldn't worry about adding something
> >>>>> additional.
> >>>>>
> >>>>> WRT too many dependencies (Gunnar): I completely agree with the
> general
> >>>>> feeling of too many (and with Guava, jackson less so). I think the
> core
> >>>>> challenge (no pun intended) is that calcite-core is really a lot of
> >>>>> different components. For example, I have frequently wished that
> >> parser,
> >>>>> planner and enumerable were separate modules. And if they were, I'd
> >> guess
> >>>>> that each would have a narrower dependency range. I've also wished
> many
> >>>>> times that runtime compilation was an optional addon as opposed to
> >>>>> required/coupled in the core...
> >>>>>
> >>>>> When I've thought about how to dissect in the past, I think the big
> >>>>> challenge would be tests, where things are sometimes mixed together.
> >>>>> Breaking change possibilities could be at least somewhat mitigated by
> >>>>> moving classes but not packages.
> >>>>>
> >>>>> On Wed, Dec 29, 2021 at 1:51 AM Gunnar Morling
> >>>>> <gu...@googlemail.com.invalid> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> In a way, Calcite's build configuration as well as the published POM
> >>>> could
> >>>>>> be considered as such an SBOM? In particular when looking at the
> >> latter
> >>>>>> through services like mvnrepository [1], you get quite a good view
> on
> >>>> the
> >>>>>> dependency versions, licenses, any potential CVEs, etc. I think this
> >>>> should
> >>>>>> satisfy most user needs around this? Or are you referring to the
> >> notion
> >>>> of
> >>>>>> Maven BOM POMs specifically [2], i.e. the notion of publishing a POM
> >>>> with
> >>>>>> all the Calcite component versions which people can then use with
> >>>> Maven's
> >>>>>> import scope (there should be something comparable for Gradle)? If
> so,
> >>>> that
> >>>>>> could be useful for users working with multiple Calcite components,
> >>>> though
> >>>>>> I think the usability improvement provided by such BOM POM wouldn't
> be
> >>>>>> huge.
> >>>>>>
> >>>>>> I wanted to bring up a related matter though. Coming to Calcite as a
> >>>> user
> >>>>>> just recently (loving the possibilities it provides!), I was
> surprised
> >>>> by
> >>>>>> the large number of dependencies of the project. It looks like 1.29
> >>>>>> improves that a little bit (no more kotlin-stdlib, no more
> transitive
> >>>>>> dependency to log4j 1.x), but the transitive hull of all
> dependencies
> >> of
> >>>>>> calcite-core still is quite big. I lack insight about what the
> >> different
> >>>>>> dependencies are used for; but as an application developer, Guava
> for
> >>>>>> instance is a dependency which I'd prefer to not get pushed onto the
> >>>>>> classpath transitively. Jackson is another heavy one; depending on
> how
> >>>> it's
> >>>>>> used, perhaps this could be pushed into some separate module which
> >> users
> >>>>>> could optionally  pull in? That'd help to avoid having it around
> when
> >>>> users
> >>>>>> work with other JSON libs themselves and don't require JSON support
> in
> >>>>>> Calcite.
> >>>>>>
> >>>>>> From a supply chain perspective, the less transitive dependencies a
> >>>> library
> >>>>>> like Calcite introduces to my project, the better IMHO. Less
> potential
> >>>> for
> >>>>>> version conflicts with my own (or other transitive) dependencies,
> and
> >>>> also
> >>>>>> less potential for introducing CVEs to the dependency graph, as e.g.
> >> in
> >>>> the
> >>>>>> case of the Guava version currently used by Calcite; I suppose it
> does
> >>>> not
> >>>>>> impact the usage in Calcite, but these things tend to be tricky to
> >>>> reason
> >>>>>> about, and typical CVE reporting tooling will now create a warning
> >> for a
> >>>>>> project using Calcite, no matter whether that specific issue
> actually
> >>>> is a
> >>>>>> problem or not.
> >>>>>>
> >>>>>> Best,
> >>>>>>
> >>>>>> --Gunnar
> >>>>>>
> >>>>>> [1]
> >>>>>>
> >>>>
> >>
> https://mvnrepository.com/artifact/org.apache.calcite/calcite-core/1.29.0
> >>>>>> [2]
> >>>>>>
> >>>>>>
> >>>>
> >>
> https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#bill-of-materials-bom-poms
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Dec 29, 2021 at 02:27, Julian Hyde <
> >>>>>> jhyde.apache@gmail.com> wrote:
> >>>>>>
> >>>>>>> In the wake of the log4j CVEs [1], people are asking how to improve
> >> the
> >>>>>>> security of open source projects, and one idea is to provide a SBOM
> >>>>>>> (Software Bill of Materials) [2] along with each release.
> >>>>>>>
> >>>>>>> I had not heard of SBOM until a couple of days ago. Is anyone on
> this
> >>>>>> list
> >>>>>>> familiar with SBOMs and their use? Should Calcite be providing an
> >> SBOM?
> >>>>>> Are
> >>>>>>> people aware of SBOM initiatives in other projects? What, in your
> >>>>>> opinion,
> >>>>>>> is the priority of this issue?
> >>>>>>>
> >>>>>>> Julian
> >>>>>>>
> >>>>>>> [1]
> >>>>>>>
> >>>>>>
> >>>>
> >>
> https://thehackernews.com/2021/12/second-log4j-vulnerability-cve-2021.html
> >>>>>>>
> >>>>>>> [2] https://en.wikipedia.org/wiki/Software_bill_of_materials
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
>
>

Re: [DISCUSS] SBOM (Software Bill of Materials)

Posted by Julian Hyde <jh...@gmail.com>.
If a method needs a locale, always pass Locale.ROOT.
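
A minimal sketch of that rule, for illustration only. The class and method names are hypothetical and nothing here is taken from the Calcite code base. It uses the standard JDK overloads that accept a Locale, so the result does not depend on the JVM's default locale.

```java
import java.util.Locale;

/** Hypothetical demo class; not part of Calcite. */
public class LocaleRootDemo {
  /** Upper-cases a keyword without consulting the default locale. */
  public static String toKeyword(String s) {
    // With a Turkish default locale, the no-argument toUpperCase() would
    // turn 'i' into a dotted capital I; Locale.ROOT avoids that.
    return s.toUpperCase(Locale.ROOT);
  }

  /** Formats a number with '.' as the decimal separator on every machine. */
  public static String formatRatio(double value) {
    return String.format(Locale.ROOT, "%.2f", value);
  }

  public static void main(String[] args) {
    System.out.println(toKeyword("quit"));
    System.out.println(formatRatio(1.5));
  }
}
```

The same idea applies to DateTimeFormatter.ofPattern(String, Locale) and to other JDK methods that have a Locale overload.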

> On Jan 4, 2022, at 9:13 AM, Gunnar Morling <gu...@googlemail.com.INVALID> wrote:
> 
> On Tue, Jan 4, 2022 at 09:39, Julian Hyde <
> jhyde.apache@gmail.com> wrote:
> 
>> Please log a jira case for the commons-lang3 change.
> 
> 
> Logged https://issues.apache.org/jira/browse/CALCITE-4975.
> 
> 
>> It looks good. One or two places I’d create a function rather than having
>> a blob of code inline.
>> 
> 
> Sure; just let me know where exactly.
> 
> Your use of default locale in the CSV adapter looks wrong. Calcite is a
>> server, so never uses default locale or time zone. In fact we use
>> forbiddenApis to check, so we should add a few methods to its configuration.
> 
> 
> Yeah, I had been pondering about this; I don't think it matters, the locale
> should not make any difference for these specific formats, as they don't
> contain any locale-specific patterns (unlike, say, "MMM"). I've changed it
> to Locale.ENGLISH now, just in case. In fact, I wanted to use the
> ofPattern() method without the Locale parameter, but this failed the
> forbiddenApis check as well :)
> 
> Julian
> 
> 
> Best,
> 
> --Gunnar
> 
> 
>> 
>> 
>>> On Jan 3, 2022, at 12:30 PM, Gunnar Morling
>> <gu...@googlemail.com.invalid> wrote:
>>> 
>>> Hi,
>>> 
>>> Thanks a lot for this, I think trimming down the dependencies of Calcite
>>> will be of great help for its adoption.
>>> 
>>>> So, the easiest way to reduce dependencies would be to make certain
>>> classes of SQL functions optional (i.e. move them out of core).
>>> 
>>> That sounds like a good idea.
>>> 
>>>> commons-lang3, commons-codec, commons-io are probably only used in one
>> or
>>> two places each;
>>> 
>>> To make some progress there, I've created PR
>>> https://github.com/apache/calcite/pull/2672 which removes the
>> dependency to
>>> commons-lang3 from the entire code base. Any feedback on that PR would
>>> be appreciated (I still need to log an issue, but wanted to share quickly
>>> what I had). I can try and take a look at the other ones, if there's
>>> interest in this.
>>> 
>>> Re Janino, is there any reason for not using the compiler implementation
>>> coming with the JDK? Alternatively, one could also consider to generate
>>> byte code directly using ASM, which wouldn't be beneficial
>> dependency-wise,
>>> but it may improve the performance of this generation step (I still lack
>>> insight why this is done in the first place).
>>> 
>>> Thanks,
>>> 
>>> --Gunnar
>>> 
>>>> On Fri, Dec 31, 2021 at 00:56, Julian Hyde <
>>>> jhyde.apache@gmail.com> wrote:
>>>> 
>>>> Regarding dependencies. Here are the runtime dependencies from
>>>> core/build.gradle.kts (ignoring test and annotation libraries):
>>>> 
>>>> * api("com.esri.geometry:esri-geometry-api")
>>>> * api("com.fasterxml.jackson.core:jackson-annotations")
>>>> * api("com.google.guava:guava")
>>>> * api("org.apache.calcite.avatica:avatica-core")
>>>> * api("org.slf4j:slf4j-api")
>>>> * implementation("com.fasterxml.jackson.core:jackson-core")
>>>> * implementation("com.fasterxml.jackson.core:jackson-databind")
>>>> * implementation("com.fasterxml.jackson.dataformat:jackson-dataformat-yaml")
>>>> * implementation("com.google.uzaygezen:uzaygezen-core")
>>>> * implementation("com.jayway.jsonpath:json-path")
>>>> * implementation("com.yahoo.datasketches:sketches-core")
>>>> * implementation("commons-codec:commons-codec")
>>>> * implementation("net.hydromatic:aggdesigner-algorithm")
>>>> * implementation("org.apache.commons:commons-dbcp2")
>>>> * implementation("org.apache.commons:commons-lang3")
>>>> * implementation("commons-io:commons-io")
>>>> * implementation("org.codehaus.janino:commons-compiler")
>>>> * implementation("org.codehaus.janino:janino")
>>>> 
>>>> A few libraries are used only for a narrow range of functionality:
>>>> * esri-geometry and uzaygezen-core are used by geospatial functions;
>>>> * sketches-core is used by the HLL aggregate functions;
>>>> * json-path is used by some JSON functions;
>>>> * jackson-core, jackson-databind, jackson-dataformat-yaml are used to
>>>> load models, and to serialize RelNodes to and from JSON;
>>>> * commons-lang3, commons-codec, commons-io are probably only used in one
>>>> or two places each;
>>>> * aggdesigner-algorithm is used for recommending materialized views.
>>>> 
>>>> So, the easiest way to reduce dependencies would be to make certain
>>>> classes of SQL functions optional (i.e. move them out of core).
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>> 
>>>>>> On Dec 29, 2021, at 1:30 PM, Jacques Nadeau <ja...@apache.org>
>> wrote:
>>>>> 
>>>>> WRT SBOM (Julian): My general experience is that most large orgs use
>>>>> scanners now (either open or closed) and they will scan whether you
>> have
>>>> a
>>>>> bill of materials or not. I wouldn't worry about adding something
>>>>> additional.
>>>>> 
>>>>> WRT too many dependencies (Gunnar): I completely agree with the general
>>>>> feeling of too many (and with Guava, jackson less so). I think the core
>>>>> challenge (no pun intended) is that calcite-core is really a lot of
>>>>> different components. For example, I have frequently wished that
>> parser,
>>>>> planner and enumerable were separate modules. And if they were, I'd
>> guess
>>>>> that each would have a narrower dependency range. I've also wished many
>>>>> times that runtime compilation was an optional addon as opposed to
>>>>> required/coupled in the core...
>>>>> 
>>>>> When I've thought about how to dissect in the past, I think the big
>>>>> challenge would be tests, where things are sometimes mixed together.
>>>>> Breaking change possibilities could be at least somewhat mitigated by
>>>>> moving classes but not packages.
>>>>> 
>>>>> On Wed, Dec 29, 2021 at 1:51 AM Gunnar Morling
>>>>> <gu...@googlemail.com.invalid> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> In a way, Calcite's build configuration as well as the published POM
>>>> could
>>>>>> be considered as such an SBOM? In particular when looking at the
>> latter
>>>>>> through services like mvnrepository [1], you get quite a good view on
>>>> the
>>>>>> dependency versions, licenses, any potential CVEs, etc. I think this
>>>> should
>>>>>> satisfy most user needs around this? Or are you referring to the
>> notion
>>>> of
>>>>>> Maven BOM POMs specifically [2], i.e. the notion of publishing a POM
>>>> with
>>>>>> all the Calcite component versions which people can then use with
>>>> Maven's
>>>>>> import scope (there should be something comparable for Gradle)? If so,
>>>> that
>>>>>> could be useful for users working with multiple Calcite components,
>>>> though
>>>>>> I think the usability improvement provided by such BOM POM wouldn't be
>>>>>> huge.
>>>>>> 
>>>>>> I wanted to bring up a related matter though. Coming to Calcite as a
>>>> user
>>>>>> just recently (loving the possibilities it provides!), I was surprised
>>>> by
>>>>>> the large number of dependencies of the project. It looks like 1.29
>>>>>> improves that a little bit (no more kotlin-stdlib, no more transitive
>>>>>> dependency to log4j 1.x), but the transitive hull of all dependencies
>> of
>>>>>> calcite-core still is quite big. I lack insight about what the
>> different
>>>>>> dependencies are used for; but as an application developer, Guava for
>>>>>> instance is a dependency which I'd prefer to not get pushed onto the
>>>>>> classpath transitively. Jackson is another heavy one; depending on how
>>>> it's
>>>>>> used, perhaps this could be pushed into some separate module which
>> users
>>>>>> could optionally  pull in? That'd help to avoid having it around when
>>>> users
>>>>>> work with other JSON libs themselves and don't require JSON support in
>>>>>> Calcite.
>>>>>> 
>>>>>> From a supply chain perspective, the less transitive dependencies a
>>>> library
>>>>>> like Calcite introduces to my project, the better IMHO. Less potential
>>>> for
>>>>>> version conflicts with my own (or other transitive) dependencies, and
>>>> also
>>>>>> less potential for introducing CVEs to the dependency graph, as e.g.
>> in
>>>> the
>>>>>> case of the Guava version currently used by Calcite; I suppose it does
>>>> not
>>>>>> impact the usage in Calcite, but these things tend to be tricky to
>>>> reason
>>>>>> about, and typical CVE reporting tooling will now create a warning
>> for a
>>>>>> project using Calcite, no matter whether that specific issue actually
>>>> is a
>>>>>> problem or not.
>>>>>> 
>>>>>> Best,
>>>>>> 
>>>>>> --Gunnar
>>>>>> 
>>>>>> [1]
>>>>>> 
>>>> 
>> https://mvnrepository.com/artifact/org.apache.calcite/calcite-core/1.29.0
>>>>>> [2]
>>>>>> 
>>>>>> 
>>>> 
>> https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#bill-of-materials-bom-poms
>>>>>> 
>>>>>> 
>>>>>> On Wed, Dec 29, 2021 at 02:27, Julian Hyde <
>>>>>> jhyde.apache@gmail.com> wrote:
>>>>>> 
>>>>>>> In the wake of the log4j CVEs [1], people are asking how to improve
>> the
>>>>>>> security of open source projects, and one idea is to provide a SBOM
>>>>>>> (Software Bill of Materials) [2] along with each release.
>>>>>>> 
>>>>>>> I had not heard of SBOM until a couple of days ago. Is anyone on this
>>>>>> list
>>>>>>> familiar with SBOMs and their use? Should Calcite be providing an
>> SBOM?
>>>>>> Are
>>>>>>> people aware of SBOM initiatives in other projects? What, in your
>>>>>> opinion,
>>>>>>> is the priority of this issue?
>>>>>>> 
>>>>>>> Julian
>>>>>>> 
>>>>>>> [1]
>>>>>>> 
>>>>>> 
>>>> 
>> https://thehackernews.com/2021/12/second-log4j-vulnerability-cve-2021.html
>>>>>>> 
>>>>>>> [2] https://en.wikipedia.org/wiki/Software_bill_of_materials
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 


Re: [DISCUSS] SBOM (Software Bill of Materials)

Posted by Gunnar Morling <gu...@googlemail.com.INVALID>.
On Tue, Jan 4, 2022 at 09:39, Julian Hyde <
jhyde.apache@gmail.com> wrote:

> Please log a jira case for the commons-lang3 change.


Logged https://issues.apache.org/jira/browse/CALCITE-4975.


> It looks good. One or two places I’d create a function rather than having
> a blob of code inline.
>

Sure; just let me know where exactly.

> Your use of default locale in the CSV adapter looks wrong. Calcite is a
> server, so never uses default locale or time zone. In fact we use
> forbiddenApis to check, so we should add a few methods to its configuration.


Yeah, I had been pondering this; I don't think it matters: the locale
should not make any difference for these specific formats, as they don't
contain any locale-specific patterns (unlike, say, "MMM"). I've changed it
to Locale.ENGLISH now, just in case. In fact, I wanted to use the
ofPattern() method without the Locale parameter, but this failed the
forbiddenApis check as well :)
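
A minimal sketch of the point about locale-specific patterns. The pattern strings and the class name LocalePatternSketch are assumptions chosen for illustration; they are not taken from the CSV adapter. A purely numeric pattern formats identically under different locales, while a textual field such as "MMM" does not:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class LocalePatternSketch {
  /** Formats a fixed date (2022-01-04) with the given pattern and an explicit locale. */
  static String format(String pattern, Locale locale) {
    LocalDate date = LocalDate.of(2022, 1, 4);
    return DateTimeFormatter.ofPattern(pattern, locale).format(date);
  }

  public static void main(String[] args) {
    // Numeric-only pattern: same text for both locales.
    System.out.println(format("yyyy-MM-dd", Locale.ENGLISH));
    System.out.println(format("yyyy-MM-dd", Locale.FRENCH));
    // "MMM" is month text, so the result depends on the locale.
    System.out.println(format("dd MMM yyyy", Locale.ENGLISH));
    System.out.println(format("dd MMM yyyy", Locale.FRENCH));
  }
}
```

Passing the locale explicitly, as in ofPattern(pattern, locale), keeps the result independent of the JVM's default locale either way.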

> Julian


Best,

--Gunnar


>
>
> > On Jan 3, 2022, at 12:30 PM, Gunnar Morling
> <gu...@googlemail.com.invalid> wrote:
> >
> > Hi,
> >
> > Thanks a lot for this, I think trimming down the dependencies of Calcite
> > will be of great help for its adoption.
> >
> >> So, the easiest way to reduce dependencies would be to make certain
> > classes of SQL functions optional (i.e. move them out of core).
> >
> > That sounds like a good idea.
> >
> >> commons-lang3, commons-codec, commons-io are probably only used in one
> or
> > two places each;
> >
> > To make some progress there, I've created PR
> > https://github.com/apache/calcite/pull/2672 which removes the
> dependency to
> > commons-lang3 from the entire code base. Any feedback on that PR would
> > be appreciated (I still need to log an issue, but wanted to share quickly
> > what I had). I can try and take a look at the other ones, if there's
> > interest in this.
> >
> > Re Janino, is there any reason for not using the compiler implementation
> > coming with the JDK? Alternatively, one could also consider to generate
> > byte code directly using ASM, which wouldn't be beneficial
> dependency-wise,
> > but it may improve the performance of this generation step (I still lack
> > insight why this is done in the first place).
> >
> > Thanks,
> >
> > --Gunnar
> >
> >> On Fri, Dec 31, 2021 at 00:56, Julian Hyde <
> >> jhyde.apache@gmail.com> wrote:
> >>
> >> Regarding dependencies. Here are the runtime dependencies from
> >> core/build.gradle.kts (ignoring test and annotation libraries):
> >>
> >> * api("com.esri.geometry:esri-geometry-api")
> >> * api("com.fasterxml.jackson.core:jackson-annotations")
> >> * api("com.google.guava:guava")
> >> * api("org.apache.calcite.avatica:avatica-core")
> >> * api("org.slf4j:slf4j-api")
> >> * implementation("com.fasterxml.jackson.core:jackson-core")
> >> * implementation("com.fasterxml.jackson.core:jackson-databind")
> >> * implementation("com.fasterxml.jackson.dataformat:jackson-dataformat-yaml")
> >> * implementation("com.google.uzaygezen:uzaygezen-core")
> >> * implementation("com.jayway.jsonpath:json-path")
> >> * implementation("com.yahoo.datasketches:sketches-core")
> >> * implementation("commons-codec:commons-codec")
> >> * implementation("net.hydromatic:aggdesigner-algorithm")
> >> * implementation("org.apache.commons:commons-dbcp2")
> >> * implementation("org.apache.commons:commons-lang3")
> >> * implementation("commons-io:commons-io")
> >> * implementation("org.codehaus.janino:commons-compiler")
> >> * implementation("org.codehaus.janino:janino")
> >>
> >> A few libraries are used only for a narrow range of functionality:
> >> * esri-geometry and uzaygezen-core are used by geospatial functions;
> >> * sketches-core is used by the HLL aggregate functions;
> >> * json-path is used by some JSON functions;
> >> * jackson-core, jackson-databind, jackson-dataformat-yaml are used to
> >> load models, and to serialize RelNodes to and from JSON;
> >> * commons-lang3, commons-codec, commons-io are probably only used in one
> >> or two places each;
> >> * aggdesigner-algorithm is used for recommending materialized views.
> >>
> >> So, the easiest way to reduce dependencies would be to make certain
> >> classes of SQL functions optional (i.e. move them out of core).
> >>
> >> Julian
> >>
> >>
> >>
> >>>> On Dec 29, 2021, at 1:30 PM, Jacques Nadeau <ja...@apache.org>
> wrote:
> >>>
> >>> WRT SBOM (Julian): My general experience is that most large orgs use
> >>> scanners now (either open or closed) and they will scan whether you
> have
> >> a
> >>> bill of materials or not. I wouldn't worry about adding something
> >>> additional.
> >>>
> >>> WRT too many dependencies (Gunnar): I completely agree with the general
> >>> feeling of too many (and with Guava, jackson less so). I think the core
> >>> challenge (no pun intended) is that calcite-core is really a lot of
> >>> different components. For example, I have frequently wished that
> parser,
> >>> planner and enumerable were separate modules. And if they were, I'd
> guess
> >>> that each would have a narrower dependency range. I've also wished many
> >>> times that runtime compilation was an optional addon as opposed to
> >>> required/coupled in the core...
> >>>
> >>> When I've thought about how to dissect in the past, I think the big
> >>> challenge would be tests, where things are sometimes mixed together.
> >>> Breaking change possibilities could be at least somewhat mitigated by
> >>> moving classes but not packages.
> >>>
> >>> On Wed, Dec 29, 2021 at 1:51 AM Gunnar Morling
> >>> <gu...@googlemail.com.invalid> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> In a way, Calcite's build configuration as well as the published POM
> >> could
> >>>> be considered as such an SBOM? In particular when looking at the
> latter
> >>>> through services like mvnrepository [1], you get quite a good view on
> >> the
> >>>> dependency versions, licenses, any potential CVEs, etc. I think this
> >> should
> >>>> satisfy most user needs around this? Or are you referring to the
> notion
> >> of
> >>>> Maven BOM POMs specifically [2], i.e. the notion of publishing a POM
> >> with
> >>>> all the Calcite component versions which people can then use with
> >> Maven's
> >>>> import scope (there should be something comparable for Gradle)? If so,
> >> that
> >>>> could be useful for users working with multiple Calcite components,
> >> though
> >>>> I think the usability improvement provided by such BOM POM wouldn't be
> >>>> huge.
> >>>>
> >>>> I wanted to bring up a related matter though. Coming to Calcite as a
> >> user
> >>>> just recently (loving the possibilities it provides!), I was surprised
> >> by
> >>>> the large number of dependencies of the project. It looks like 1.29
> >>>> improves that a little bit (no more kotlin-stdlib, no more transitive
> >>>> dependency to log4j 1.x), but the transitive hull of all dependencies
> of
> >>>> calcite-core still is quite big. I lack insight about what the
> different
> >>>> dependencies are used for; but as an application developer, Guava for
> >>>> instance is a dependency which I'd prefer to not get pushed onto the
> >>>> classpath transitively. Jackson is another heavy one; depending on how
> >> it's
> >>>> used, perhaps this could be pushed into some separate module which
> users
> >>>> could optionally  pull in? That'd help to avoid having it around when
> >> users
> >>>> work with other JSON libs themselves and don't require JSON support in
> >>>> Calcite.
> >>>>
> >>>> From a supply chain perspective, the less transitive dependencies a
> >> library
> >>>> like Calcite introduces to my project, the better IMHO. Less potential
> >> for
> >>>> version conflicts with my own (or other transitive) dependencies, and
> >> also
> >>>> less potential for introducing CVEs to the dependency graph, as e.g.
> in
> >> the
> >>>> case of the Guava version currently used by Calcite; I suppose it does
> >> not
> >>>> impact the usage in Calcite, but these things tend to be tricky to
> >> reason
> >>>> about, and typical CVE reporting tooling will now create a warning
> for a
> >>>> project using Calcite, no matter whether that specific issue actually
> >> is a
> >>>> problem or not.
> >>>>
> >>>> Best,
> >>>>
> >>>> --Gunnar
> >>>>
> >>>> [1]
> >>>>
> >>
> https://mvnrepository.com/artifact/org.apache.calcite/calcite-core/1.29.0
> >>>> [2]
> >>>>
> >>>>
> >>
> https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#bill-of-materials-bom-poms
> >>>>
> >>>>
> >>>> On Wed, Dec 29, 2021 at 02:27, Julian Hyde <
> >>>> jhyde.apache@gmail.com> wrote:
> >>>>
> >>>>> In the wake of the log4j CVEs [1], people are asking how to improve
> the
> >>>>> security of open source projects, and one idea is to provide a SBOM
> >>>>> (Software Bill of Materials) [2] along with each release.
> >>>>>
> >>>>> I had not heard of SBOM until a couple of days ago. Is anyone on this
> >>>> list
> >>>>> familiar with SBOMs and their use? Should Calcite be providing an
> SBOM?
> >>>> Are
> >>>>> people aware of SBOM initiatives in other projects? What, in your
> >>>> opinion,
> >>>>> is the priority of this issue?
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>
> >>
> https://thehackernews.com/2021/12/second-log4j-vulnerability-cve-2021.html
> >>>>>
> >>>>> [2] https://en.wikipedia.org/wiki/Software_bill_of_materials
> >>>>>
> >>>>
> >>
> >>
>

Re: [DISCUSS] SBOM (Software Bill of Materials)

Posted by Julian Hyde <jh...@gmail.com>.
Please log a jira case for the commons-lang3 change. It looks good. One or two places I’d create a function rather than having a blob of code inline.

Your use of default locale in the CSV adapter looks wrong. Calcite is a server, so never uses default locale or time zone. In fact we use forbiddenApis to check, so we should add a few methods to its configuration. 
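
To illustrate why a server should not rely on the default locale, here is a minimal sketch. The class name DefaultLocaleSketch and the sample strings are assumptions for illustration only. It passes the locale explicitly to show what the no-argument variants would produce on JVMs with different default locales:

```java
import java.util.Locale;

public class DefaultLocaleSketch {
  /** Upper-cases with an explicit locale instead of the JVM default. */
  static String upper(String s, Locale locale) {
    return s.toUpperCase(locale);
  }

  /** Formats a number with grouping separators for an explicit locale. */
  static String grouped(long n, Locale locale) {
    return String.format(locale, "%,d", n);
  }

  public static void main(String[] args) {
    Locale turkish = Locale.forLanguageTag("tr");
    // On a JVM whose default locale is Turkish, "title".toUpperCase() behaves
    // like the first call: 'i' maps to a dotted capital I (U+0130).
    System.out.println(upper("title", turkish));
    System.out.println(upper("title", Locale.ROOT)); // TITLE
    // Grouping separators also vary by locale.
    System.out.println(grouped(1234567L, Locale.GERMANY));
    System.out.println(grouped(1234567L, Locale.ROOT));
  }
}
```

The same string therefore compares or parses differently depending on where the server happens to run, which is the kind of bug a forbiddenApis check on default-locale methods is meant to prevent.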

Julian 

> On Jan 3, 2022, at 12:30 PM, Gunnar Morling <gu...@googlemail.com.invalid> wrote:
> 
> Hi,
> 
> Thanks a lot for this, I think trimming down the dependencies of Calcite
> will be of great help for its adoption.
> 
>> So, the easiest way to reduce dependencies would be to make certain
> classes of SQL functions optional (i.e. move them out of core).
> 
> That sounds like a good idea.
> 
>> commons-lang3, commons-codec, commons-io are probably only used in one or
> two places each;
> 
> To make some progress there, I've created PR
> https://github.com/apache/calcite/pull/2672 which removes the dependency to
> commons-lang3 from the entire code base. Any feedback on that PR would
> be appreciated (I still need to log an issue, but wanted to share quickly
> what I had). I can try and take a look at the other ones, if there's
> interest in this.
> 
> Re Janino, is there any reason for not using the compiler implementation
> coming with the JDK? Alternatively, one could also consider to generate
> byte code directly using ASM, which wouldn't be beneficial dependency-wise,
> but it may improve the performance of this generation step (I still lack
> insight why this is done in the first place).
> 
> Thanks,
> 
> --Gunnar
> 
>> On Fri, Dec 31, 2021 at 00:56, Julian Hyde <
>> jhyde.apache@gmail.com> wrote:
>> 
>> Regarding dependencies. Here are the runtime dependencies from
>> core/build.gradle.kts (ignoring test and annotation libraries):
>> 
>> * api("com.esri.geometry:esri-geometry-api")
>> * api("com.fasterxml.jackson.core:jackson-annotations")
>> * api("com.google.guava:guava")
>> * api("org.apache.calcite.avatica:avatica-core")
>> * api("org.slf4j:slf4j-api")
>> * implementation("com.fasterxml.jackson.core:jackson-core")
>> * implementation("com.fasterxml.jackson.core:jackson-databind")
>> * implementation("com.fasterxml.jackson.dataformat:jackson-dataformat-yaml")
>> * implementation("com.google.uzaygezen:uzaygezen-core")
>> * implementation("com.jayway.jsonpath:json-path")
>> * implementation("com.yahoo.datasketches:sketches-core")
>> * implementation("commons-codec:commons-codec")
>> * implementation("net.hydromatic:aggdesigner-algorithm")
>> * implementation("org.apache.commons:commons-dbcp2")
>> * implementation("org.apache.commons:commons-lang3")
>> * implementation("commons-io:commons-io")
>> * implementation("org.codehaus.janino:commons-compiler")
>> * implementation("org.codehaus.janino:janino")
>> 
>> A few libraries are used only for a narrow range of functionality:
>> * esri-geometry and uzaygezen-core are used by geospatial functions;
>> * sketches-core is used by the HLL aggregate functions;
>> * json-path is used by some JSON functions;
>> * jackson-core, jackson-databind, jackson-dataformat-yaml are used to
>> load models, and to serialize RelNodes to and from JSON;
>> * commons-lang3, commons-codec, commons-io are probably only used in one
>> or two places each;
>> * aggdesigner-algorithm is used for recommending materialized views.
>> 
>> So, the easiest way to reduce dependencies would be to make certain
>> classes of SQL functions optional (i.e. move them out of core).
>> 
>> Julian
>> 
>> 
>> 
>>>> On Dec 29, 2021, at 1:30 PM, Jacques Nadeau <ja...@apache.org> wrote:
>>> 
>>> WRT SBOM (Julian): My general experience is that most large orgs use
>>> scanners now (either open or closed) and they will scan whether you have
>> a
>>> bill of materials or not. I wouldn't worry about adding something
>>> additional.
>>> 
>>> WRT too many dependencies (Gunnar): I completely agree with the general
>>> feeling of too many (and with Guava, jackson less so). I think the core
>>> challenge (no pun intended) is that calcite-core is really a lot of
>>> different components. For example, I have frequently wished that parser,
>>> planner and enumerable were separate modules. And if they were, I'd guess
>>> that each would have a narrower dependency range. I've also wished many
>>> times that runtime compilation was an optional addon as opposed to
>>> required/coupled in the core...
>>> 
>>> When I've thought about how to dissect in the past, I think the big
>>> challenge would be tests, where things are sometimes mixed together.
>>> Breaking change possibilities could be at least somewhat mitigated by
>>> moving classes but not packages.
>>> 
>>> On Wed, Dec 29, 2021 at 1:51 AM Gunnar Morling
>>> <gu...@googlemail.com.invalid> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> In a way, Calcite's build configuration as well as the published POM
>> could
>>>> be considered as such an SBOM? In particular when looking at the latter
>>>> through services like mvnrepository [1], you get quite a good view on
>> the
>>>> dependency versions, licenses, any potential CVEs, etc. I think this
>> should
>>>> satisfy most user needs around this? Or are you referring to the notion
>> of
>>>> Maven BOM POMs specifically [2], i.e. the notion of publishing a POM
>> with
>>>> all the Calcite component versions which people can then use with
>> Maven's
>>>> import scope (there should be something comparable for Gradle)? If so,
>> that
>>>> could be useful for users working with multiple Calcite components,
>> though
>>>> I think the usability improvement provided by such BOM POM wouldn't be
>>>> huge.
>>>> 
>>>> I wanted to bring up a related matter though. Coming to Calcite as a
>> user
>>>> just recently (loving the possibilities it provides!), I was surprised
>> by
>>>> the large number of dependencies of the project. It looks like 1.29
>>>> improves that a little bit (no more kotlin-stdlib, no more transitive
>>>> dependency to log4j 1.x), but the transitive hull of all dependencies of
>>>> calcite-core still is quite big. I lack insight about what the different
>>>> dependencies are used for; but as an application developer, Guava for
>>>> instance is a dependency which I'd prefer to not get pushed onto the
>>>> classpath transitively. Jackson is another heavy one; depending on how
>> it's
>>>> used, perhaps this could be pushed into some separate module which users
>>>> could optionally  pull in? That'd help to avoid having it around when
>> users
>>>> work with other JSON libs themselves and don't require JSON support in
>>>> Calcite.
>>>> 
>>>> From a supply chain perspective, the less transitive dependencies a
>> library
>>>> like Calcite introduces to my project, the better IMHO. Less potential
>> for
>>>> version conflicts with my own (or other transitive) dependencies, and
>> also
>>>> less potential for introducing CVEs to the dependency graph, as e.g. in
>> the
>>>> case of the Guava version currently used by Calcite; I suppose it does
>> not
>>>> impact the usage in Calcite, but these things tend to be tricky to
>> reason
>>>> about, and typical CVE reporting tooling will now create a warning for a
>>>> project using Calcite, no matter whether that specific issue actually
>> is a
>>>> problem or not.
>>>> 
>>>> Best,
>>>> 
>>>> --Gunnar
>>>> 
>>>> [1]
>>>> 
>> https://mvnrepository.com/artifact/org.apache.calcite/calcite-core/1.29.0
>>>> [2]
>>>> 
>>>> 
>> https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#bill-of-materials-bom-poms
>>>> 
>>>> 
>>>> On Wed, Dec 29, 2021 at 02:27, Julian Hyde <
>>>> jhyde.apache@gmail.com> wrote:
>>>> 
>>>>> In the wake of the log4j CVEs [1], people are asking how to improve the
>>>>> security of open source projects, and one idea is to provide a SBOM
>>>>> (Software Bill of Materials) [2] along with each release.
>>>>> 
>>>>> I had not heard of SBOM until a couple of days ago. Is anyone on this
>>>> list
>>>>> familiar with SBOMs and their use? Should Calcite be providing an SBOM?
>>>> Are
>>>>> people aware of SBOM initiatives in other projects? What, in your
>>>> opinion,
>>>>> is the priority of this issue?
>>>>> 
>>>>> Julian
>>>>> 
>>>>> [1]
>>>>> 
>>>> 
>> https://thehackernews.com/2021/12/second-log4j-vulnerability-cve-2021.html
>>>>> 
>>>>> [2] https://en.wikipedia.org/wiki/Software_bill_of_materials
>>>>> 
>>>> 
>> 
>> 

Re: [DISCUSS] SBOM (Software Bill of Materials)

Posted by Gunnar Morling <gu...@googlemail.com.INVALID>.
Hi,

Thanks a lot for this, I think trimming down the dependencies of Calcite
will be of great help for its adoption.

> So, the easiest way to reduce dependencies would be to make certain
> classes of SQL functions optional (i.e. move them out of core).

That sounds like a good idea.

> commons-lang3, commons-codec, commons-io are probably only used in one or
> two places each;

To make some progress there, I've created PR
https://github.com/apache/calcite/pull/2672 which removes the dependency on
commons-lang3 from the entire code base. Any feedback on that PR would
be appreciated (I still need to log an issue, but wanted to share quickly
what I had). I can try and take a look at the other ones, if there's
interest in this.
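
As an illustration of the kind of change this involves (a hypothetical example, not code taken from the PR), a call to a commons-lang3 helper such as StringUtils.isBlank can usually be replaced by a few lines of plain JDK code. The class name BlankCheck is an assumption:

```java
public class BlankCheck {
  /**
   * Returns true if the sequence is null, empty, or whitespace only,
   * using Character.isWhitespace as the whitespace definition.
   */
  static boolean isBlank(CharSequence cs) {
    if (cs == null) {
      return true;
    }
    for (int i = 0; i < cs.length(); i++) {
      if (!Character.isWhitespace(cs.charAt(i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isBlank(null));   // true
    System.out.println(isBlank("  \t")); // true
    System.out.println(isBlank(" a "));  // false
  }
}
```

When a library is used in only one or two places, a small local helper like this is often enough to drop the dependency.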

Re Janino, is there any reason for not using the compiler implementation
coming with the JDK? Alternatively, one could also consider generating
bytecode directly using ASM, which wouldn't be beneficial dependency-wise,
but might improve the performance of this generation step (I still lack
insight into why this is done in the first place).
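
For reference, "the compiler implementation coming with the JDK" is reachable through the javax.tools API. The following is a minimal sketch under stated assumptions: it is not how Calcite generates code, the names JdkCompilerSketch, compileAndInvoke and Generated are hypothetical, and it writes class files to a temporary directory rather than compiling in memory. It also needs a full JDK at runtime, since ToolProvider.getSystemJavaCompiler() returns null when no compiler is available:

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class JdkCompilerSketch {
  /** Compiles one source file into a temp directory, then calls a public static no-arg method. */
  static Object compileAndInvoke(String className, String source, String methodName)
      throws Exception {
    JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
    if (compiler == null) {
      throw new IllegalStateException("No system Java compiler available");
    }
    Path dir = Files.createTempDirectory("jdk-compiler-sketch");
    Path file = dir.resolve(className + ".java");
    Files.write(file, source.getBytes(StandardCharsets.UTF_8));

    // Null streams mean System.in, System.out and System.err are used.
    int exitCode = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
    if (exitCode != 0) {
      throw new IllegalStateException("Compilation failed with exit code " + exitCode);
    }

    // Load the freshly compiled class from the output directory and invoke the method.
    try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
      Class<?> clazz = loader.loadClass(className);
      Method method = clazz.getMethod(methodName);
      return method.invoke(null);
    }
  }

  public static void main(String[] args) throws Exception {
    String source =
        "public class Generated { public static int answer() { return 6 * 7; } }";
    System.out.println(compileAndInvoke("Generated", source, "answer")); // 42
  }
}
```

The requirement for a JDK rather than a JRE at runtime is one practical difference from an embedded compiler library such as Janino.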

Thanks,

--Gunnar

On Fri, Dec 31, 2021 at 00:56, Julian Hyde <
jhyde.apache@gmail.com> wrote:

> Regarding dependencies. Here are the runtime dependencies from
> core/build.gradle.kts (ignoring test and annotation libraries):
>
>  * api("com.esri.geometry:esri-geometry-api")
>  * api("com.fasterxml.jackson.core:jackson-annotations")
>  * api("com.google.guava:guava")
>  * api("org.apache.calcite.avatica:avatica-core")
>  * api("org.slf4j:slf4j-api")
>  * implementation("com.fasterxml.jackson.core:jackson-core")
>  * implementation("com.fasterxml.jackson.core:jackson-databind")
>  * implementation("com.fasterxml.jackson.dataformat:jackson-dataformat-yaml")
>  * implementation("com.google.uzaygezen:uzaygezen-core")
>  * implementation("com.jayway.jsonpath:json-path")
>  * implementation("com.yahoo.datasketches:sketches-core")
>  * implementation("commons-codec:commons-codec")
>  * implementation("net.hydromatic:aggdesigner-algorithm")
>  * implementation("org.apache.commons:commons-dbcp2")
>  * implementation("org.apache.commons:commons-lang3")
>  * implementation("commons-io:commons-io")
>  * implementation("org.codehaus.janino:commons-compiler")
>  * implementation("org.codehaus.janino:janino")
>
> A few libraries are used only for a narrow range of functionality:
>  * esri-geometry and uzaygezen-core are used by geospatial functions;
>  * sketches-core is used by the HLL aggregate functions;
>  * json-path is used by some JSON functions;
>  * jackson-core, jackson-databind, jackson-dataformat-yaml are used to
> load models, and to serialize RelNodes to and from JSON;
>  * commons-lang3, commons-codec, commons-io are probably only used in one
> or two places each;
>  * aggdesigner-algorithm is used for recommending materialized views.
>
> So, the easiest way to reduce dependencies would be to make certain
> classes of SQL functions optional (i.e. move them out of core).
>
> Julian
>
>
>
> > On Dec 29, 2021, at 1:30 PM, Jacques Nadeau <ja...@apache.org> wrote:
> >
> > WRT SBOM (Julian): My general experience is that most large orgs use
> > scanners now (either open or closed) and they will scan whether you have
> a
> > bill of materials or not. I wouldn't worry about adding something
> > additional.
> >
> > WRT too many dependencies (Gunnar): I completely agree with the general
> > feeling of too many (and with Guava, jackson less so). I think the core
> > challenge (no pun intended) is that calcite-core is really a lot of
> > different components. For example, I have frequently wished that parser,
> > planner and enumerable were separate modules. And if they were, I'd guess
> > that each would have a narrower dependency range. I've also wished many
> > times that runtime compilation was an optional addon as opposed to
> > required/coupled in the core...
> >
> > When I've thought about how to dissect in the past, I think the big
> > challenge would be tests, where things are sometimes mixed together.
> > Breaking change possibilities could be at least somewhat mitigated by
> > moving classes but not packages.
> >
> > On Wed, Dec 29, 2021 at 1:51 AM Gunnar Morling
> > <gu...@googlemail.com.invalid> wrote:
> >
> >> Hi,
> >>
> >> In a way, Calcite's build configuration as well as the published POM
> could
> >> be considered as such an SBOM? In particular when looking at the latter
> >> through services like mvnrepository [1], you get quite a good view on
> the
> >> dependency versions, licenses, any potential CVEs, etc. I think this
> should
> >> satisfy most user needs around this? Or are you referring to the notion
> of
> >> Maven BOM POMs specifically [2], i.e. the notion of publishing a POM
> with
> >> all the Calcite component versions which people can then use with
> Maven's
> >> import scope (there should be something comparable for Gradle)? If so,
> that
> >> could be useful for users working with multiple Calcite components,
> though
> >> I think the usability improvement provided by such BOM POM wouldn't be
> >> huge.
> >>
> >> I wanted to bring up a related matter though. Coming to Calcite as a
> user
> >> just recently (loving the possibilities it provides!), I was surprised
> by
> >> the large number of dependencies of the project. It looks like 1.29
> >> improves that a little bit (no more kotlin-stdlib, no more transitive
> >> dependency to log4j 1.x), but the transitive hull of all dependencies of
> >> calcite-core still is quite big. I lack insight about what the different
> >> dependencies are used for; but as an application developer, Guava for
> >> instance is a dependency which I'd prefer to not get pushed onto the
> >> classpath transitively. Jackson is another heavy one; depending on how
> it's
> >> used, perhaps this could be pushed into some separate module which users
> >> could optionally  pull in? That'd help to avoid having it around when
> users
> >> work with other JSON libs themselves and don't require JSON support in
> >> Calcite.
> >>
> >> From a supply chain perspective, the fewer transitive dependencies a library
> >> like Calcite introduces to my project, the better IMHO. Less potential for
> >> version conflicts with my own (or other transitive) dependencies, and also
> >> less potential for introducing CVEs to the dependency graph, as e.g. in the
> >> case of the Guava version currently used by Calcite; I suppose it does not
> >> impact the usage in Calcite, but these things tend to be tricky to reason
> >> about, and typical CVE reporting tooling will now create a warning for a
> >> project using Calcite, no matter whether that specific issue actually is a
> >> problem or not.
> >>
> >> Best,
> >>
> >> --Gunnar
> >>
> >> [1]
> >> https://mvnrepository.com/artifact/org.apache.calcite/calcite-core/1.29.0
> >> [2]
> >> https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#bill-of-materials-bom-poms
> >>
> >>
> >> On Wed, Dec 29, 2021 at 02:27, Julian Hyde <jhyde.apache@gmail.com> wrote:
> >>
> >>> In the wake of the log4j CVEs [1], people are asking how to improve the
> >>> security of open source projects, and one idea is to provide a SBOM
> >>> (Software Bill of Materials) [2] along with each release.
> >>>
> >>> I had not heard of SBOM until a couple of days ago. Is anyone on this list
> >>> familiar with SBOMs and their use? Should Calcite be providing an SBOM? Are
> >>> people aware of SBOM initiatives in other projects? What, in your opinion,
> >>> is the priority of this issue?
> >>>
> >>> Julian
> >>>
> >>> [1]
> >>> https://thehackernews.com/2021/12/second-log4j-vulnerability-cve-2021.html
> >>>
> >>> [2] https://en.wikipedia.org/wiki/Software_bill_of_materials
> >>>
> >>
>
>

Re: [DISCUSS] SBOM (Software Bill of Materials)

Posted by Julian Hyde <jh...@gmail.com>.
Regarding dependencies. Here are the runtime dependencies from core/build.gradle.kts (ignoring test and annotation libraries):

 * api("com.esri.geometry:esri-geometry-api")
 * api("com.fasterxml.jackson.core:jackson-annotations")
 * api("com.google.guava:guava")
 * api("org.apache.calcite.avatica:avatica-core")
 * api("org.slf4j:slf4j-api")
 * implementation("com.fasterxml.jackson.core:jackson-core")
 * implementation("com.fasterxml.jackson.core:jackson-databind")
 * implementation("com.fasterxml.jackson.dataformat:jackson-dataformat-yaml")
 * implementation("com.google.uzaygezen:uzaygezen-core")
 * implementation("com.jayway.jsonpath:json-path")
 * implementation("com.yahoo.datasketches:sketches-core")
 * implementation("commons-codec:commons-codec")
 * implementation("net.hydromatic:aggdesigner-algorithm")
 * implementation("org.apache.commons:commons-dbcp2")
 * implementation("org.apache.commons:commons-lang3")
 * implementation("commons-io:commons-io")
 * implementation("org.codehaus.janino:commons-compiler")
 * implementation("org.codehaus.janino:janino")

A few libraries are used only for a narrow range of functionality:
 * esri-geometry and uzaygezen-core are used by geospatial functions;
 * sketches-core is used by the HLL aggregate functions;
 * json-path is used by some JSON functions;
 * jackson-core, jackson-databind, jackson-dataformat-yaml are used to load models, and to serialize RelNodes to and from JSON;
 * commons-lang3, commons-codec, commons-io are probably only used in one or two places each;
 * aggdesigner-algorithm is used for recommending materialized views.

So, the easiest way to reduce dependencies would be to make certain classes of SQL functions optional (i.e. move them out of core).
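
Until such a split exists, a downstream build could experiment with dropping the narrow-purpose libraries itself. The fragment below is only a sketch of that idea for a hypothetical application's build.gradle.kts; the application and the two assumptions in the comments are illustrative, and it is untested against Calcite. If any code path still touches the excluded classes, it will fail at runtime (typically with NoClassDefFoundError), so this trades a smaller classpath for that risk.

```kotlin
// build.gradle.kts of a hypothetical application that depends on calcite-core.
plugins {
    java
}

repositories {
    mavenCentral()
}

dependencies {
    implementation("org.apache.calcite:calcite-core:1.29.0") {
        // Assumption: this application never calls the geospatial functions.
        exclude(group = "com.google.uzaygezen", module = "uzaygezen-core")
        // Assumption: this application never uses the HLL aggregate functions.
        exclude(group = "com.yahoo.datasketches", module = "sketches-core")
    }
}
```

Moving the functions out of core would make the same choice explicit and safe, because the optional module would carry its own dependencies.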

Julian



> On Dec 29, 2021, at 1:30 PM, Jacques Nadeau <ja...@apache.org> wrote:
> 
> WRT SBOM (Julian): My general experience is that most large orgs use
> scanners now (either open or closed) and they will scan whether you have a
> bill of materials or not. I wouldn't worry about adding something
> additional.
> 
> WRT too many dependencies (Gunnar): I completely agree with the general
> feeling of too many (and with Guava, jackson less so). I think the core
> challenge (no pun intended) is that calcite-core is really a lot of
> different components. For example, I have frequently wished that parser,
> planner and enumerable were separate modules. And if they were, I'd guess
> that each would have a narrower dependency range. I've also wished many
> times that runtime compilation was an optional addon as opposed to
> required/coupled in the core...
> 
> When I've thought about how to dissect in the past, I think the big
> challenge would be tests, where things are sometimes mixed together.
> Breaking change possibilities could be at least somewhat mitigated by
> moving classes but not packages.
> 
> On Wed, Dec 29, 2021 at 1:51 AM Gunnar Morling
> <gu...@googlemail.com.invalid> wrote:
> 
>> Hi,
>> 
>> In a way, Calcite's build configuration as well as the published POM could
>> be considered as such an SBOM? In particular when looking at the latter
>> through services like mvnrepository [1], you get quite a good view on the
>> dependency versions, licenses, any potential CVEs, etc. I think this should
>> satisfy most user needs around this? Or are you referring to the notion of
>> Maven BOM POMs specifically [2], i.e. the notion of publishing a POM with
>> all the Calcite component versions which people can then use with Maven's
>> import scope (there should be something comparable for Gradle)? If so, that
>> could be useful for users working with multiple Calcite components, though
>> I think the usability improvement provided by such BOM POM wouldn't be
>> huge.
>> 
>> I wanted to bring up a related matter though. Coming to Calcite as a user
>> just recently (loving the possibilities it provides!), I was surprised by
>> the large number of dependencies of the project. It looks like 1.29
>> improves that a little bit (no more kotlin-stdlib, no more transitive
>> dependency to log4j 1.x), but the transitive hull of all dependencies of
>> calcite-core still is quite big. I lack insight about what the different
>> dependencies are used for; but as an application developer, Guava for
>> instance is a dependency which I'd prefer to not get pushed onto the
>> classpath transitively. Jackson is another heavy one; depending on how it's
>> used, perhaps this could be pushed into some separate module which users
>> could optionally pull in? That'd help to avoid having it around when users
>> work with other JSON libs themselves and don't require JSON support in
>> Calcite.
>>
>> From a supply chain perspective, the fewer transitive dependencies a library
>> like Calcite introduces to my project, the better IMHO. Less potential for
>> version conflicts with my own (or other transitive) dependencies, and also
>> less potential for introducing CVEs to the dependency graph, as e.g. in the
>> case of the Guava version currently used by Calcite; I suppose it does not
>> impact the usage in Calcite, but these things tend to be tricky to reason
>> about, and typical CVE reporting tooling will now create a warning for a
>> project using Calcite, no matter whether that specific issue actually is a
>> problem or not.
>> 
>> Best,
>> 
>> --Gunnar
>> 
>> [1]
>> https://mvnrepository.com/artifact/org.apache.calcite/calcite-core/1.29.0
>> [2]
>> https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#bill-of-materials-bom-poms
>> 
>> 
>> On Wed, Dec 29, 2021 at 02:27, Julian Hyde <jhyde.apache@gmail.com> wrote:
>> 
>>> In the wake of the log4j CVEs [1], people are asking how to improve the
>>> security of open source projects, and one idea is to provide a SBOM
>>> (Software Bill of Materials) [2] along with each release.
>>> 
>>> I had not heard of SBOM until a couple of days ago. Is anyone on this list
>>> familiar with SBOMs and their use? Should Calcite be providing an SBOM? Are
>>> people aware of SBOM initiatives in other projects? What, in your opinion,
>>> is the priority of this issue?
>>> 
>>> Julian
>>> 
>>> [1]
>>> https://thehackernews.com/2021/12/second-log4j-vulnerability-cve-2021.html
>>> 
>>> [2] https://en.wikipedia.org/wiki/Software_bill_of_materials
>>> 
>> 


Re: [DISCUSS] SBOM (Software Bill of Materials)

Posted by Jacques Nadeau <ja...@apache.org>.
WRT SBOM (Julian): My general experience is that most large orgs use
scanners now (either open or closed) and they will scan whether you have a
bill of materials or not. I wouldn't worry about adding something
additional.

WRT too many dependencies (Gunnar): I completely agree with the general
feeling of too many (and with Guava, jackson less so). I think the core
challenge (no pun intended) is that calcite-core is really a lot of
different components. For example, I have frequently wished that parser,
planner and enumerable were separate modules. And if they were, I'd guess
that each would have a narrower dependency range. I've also wished many
times that runtime compilation was an optional addon as opposed to
required/coupled in the core...

When I've thought about how to dissect in the past, I think the big
challenge would be tests, where things are sometimes mixed together.
Breaking change possibilities could be at least somewhat mitigated by
moving classes but not packages.

On Wed, Dec 29, 2021 at 1:51 AM Gunnar Morling
<gu...@googlemail.com.invalid> wrote:

> Hi,
>
> In a way, Calcite's build configuration as well as the published POM could
> be considered as such an SBOM? In particular when looking at the latter
> through services like mvnrepository [1], you get quite a good view on the
> dependency versions, licenses, any potential CVEs, etc. I think this should
> satisfy most user needs around this? Or are you referring to the notion of
> Maven BOM POMs specifically [2], i.e. the notion of publishing a POM with
> all the Calcite component versions which people can then use with Maven's
> import scope (there should be something comparable for Gradle)? If so, that
> could be useful for users working with multiple Calcite components, though
> I think the usability improvement provided by such BOM POM wouldn't be
> huge.
>
> I wanted to bring up a related matter though. Coming to Calcite as a user
> just recently (loving the possibilities it provides!), I was surprised by
> the large number of dependencies of the project. It looks like 1.29
> improves that a little bit (no more kotlin-stdlib, no more transitive
> dependency to log4j 1.x), but the transitive hull of all dependencies of
> calcite-core still is quite big. I lack insight about what the different
> dependencies are used for; but as an application developer, Guava for
> instance is a dependency which I'd prefer to not get pushed onto the
> classpath transitively. Jackson is another heavy one; depending on how it's
> used, perhaps this could be pushed into some separate module which users
> could optionally pull in? That'd help to avoid having it around when users
> work with other JSON libs themselves and don't require JSON support in
> Calcite.
>
> From a supply chain perspective, the fewer transitive dependencies a library
> like Calcite introduces to my project, the better IMHO. Less potential for
> version conflicts with my own (or other transitive) dependencies, and also
> less potential for introducing CVEs to the dependency graph, as e.g. in the
> case of the Guava version currently used by Calcite; I suppose it does not
> impact the usage in Calcite, but these things tend to be tricky to reason
> about, and typical CVE reporting tooling will now create a warning for a
> project using Calcite, no matter whether that specific issue actually is a
> problem or not.
>
> Best,
>
> --Gunnar
>
> [1]
> https://mvnrepository.com/artifact/org.apache.calcite/calcite-core/1.29.0
> [2]
> https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#bill-of-materials-bom-poms
>
>
> On Wed, Dec 29, 2021 at 02:27, Julian Hyde <jhyde.apache@gmail.com> wrote:
>
> > In the wake of the log4j CVEs [1], people are asking how to improve the
> > security of open source projects, and one idea is to provide a SBOM
> > (Software Bill of Materials) [2] along with each release.
> >
> > I had not heard of SBOM until a couple of days ago. Is anyone on this list
> > familiar with SBOMs and their use? Should Calcite be providing an SBOM? Are
> > people aware of SBOM initiatives in other projects? What, in your opinion,
> > is the priority of this issue?
> >
> > Julian
> >
> > [1]
> > https://thehackernews.com/2021/12/second-log4j-vulnerability-cve-2021.html
> >
> > [2] https://en.wikipedia.org/wiki/Software_bill_of_materials
> >
>

Re: [DISCUSS] SBOM (Software Bill of Materials)

Posted by Gunnar Morling <gu...@googlemail.com.INVALID>.
Hi,

In a way, Calcite's build configuration as well as the published POM could
be considered as such an SBOM? In particular when looking at the latter
through services like mvnrepository [1], you get quite a good view on the
dependency versions, licenses, any potential CVEs, etc. I think this should
satisfy most user needs around this? Or are you referring to the notion of
Maven BOM POMs specifically [2], i.e. the notion of publishing a POM with
all the Calcite component versions which people can then use with Maven's
import scope (there should be something comparable for Gradle)? If so, that
could be useful for users working with multiple Calcite components, though
I think the usability improvement provided by such BOM POM wouldn't be huge.
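
For the Gradle side, the comparable mechanism is the platform() dependency, which imports the version constraints of a published BOM. A minimal sketch of what consuming such a BOM might look like follows; the coordinates org.apache.calcite:calcite-bom are an assumption made for illustration, not something this thread confirms is published.

```kotlin
// build.gradle.kts of a hypothetical application using several Calcite modules.
plugins {
    java
}

repositories {
    mavenCentral()
}

dependencies {
    // Hypothetical BOM coordinates; platform() imports its version constraints.
    implementation(platform("org.apache.calcite:calcite-bom:1.29.0"))

    // With the BOM imported, the individual modules need no explicit version.
    implementation("org.apache.calcite:calcite-core")
    implementation("org.apache.calcite:calcite-linq4j")
}
```

The gain is limited to keeping the Calcite module versions aligned in one place.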

I wanted to bring up a related matter though. Coming to Calcite as a user
just recently (loving the possibilities it provides!), I was surprised by
the large number of dependencies of the project. It looks like 1.29
improves that a little bit (no more kotlin-stdlib, no more transitive
dependency to log4j 1.x), but the transitive hull of all dependencies of
calcite-core still is quite big. I lack insight about what the different
dependencies are used for; but as an application developer, Guava for
instance is a dependency which I'd prefer to not get pushed onto the
classpath transitively. Jackson is another heavy one; depending on how it's
used, perhaps this could be pushed into some separate module which users
could optionally pull in? That'd help to avoid having it around when users
work with other JSON libs themselves and don't require JSON support in
Calcite.

From a supply chain perspective, the fewer transitive dependencies a library
like Calcite introduces to my project, the better IMHO. Less potential for
version conflicts with my own (or other transitive) dependencies, and also
less potential for introducing CVEs to the dependency graph, as e.g. in the
case of the Guava version currently used by Calcite; I suppose it does not
impact the usage in Calcite, but these things tend to be tricky to reason
about, and typical CVE reporting tooling will now create a warning for a
project using Calcite, no matter whether that specific issue actually is a
problem or not.
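
As an illustration of the workaround this pushes onto applications: with Gradle, a consuming project can add a dependency constraint so that a newer Guava wins version resolution. This is only a sketch; the Guava version shown is an example chosen for illustration, not a recommendation, and nothing here establishes that Calcite behaves correctly with it.

```kotlin
// build.gradle.kts of a hypothetical application that depends on calcite-core.
plugins {
    java
}

repositories {
    mavenCentral()
}

dependencies {
    implementation("org.apache.calcite:calcite-core:1.29.0")

    constraints {
        // Example version only. By default Gradle resolves to the highest
        // requested version, so this can raise the transitive Guava version
        // but will not lower it.
        implementation("com.google.guava:guava:31.0.1-jre") {
            because("hypothetical: a CVE scanner flags the Guava version pulled in transitively")
        }
    }
}
```

Every application then has to repeat and maintain this, which is the kind of overhead a smaller dependency set would avoid.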

Best,

--Gunnar

[1]
https://mvnrepository.com/artifact/org.apache.calcite/calcite-core/1.29.0
[2]
https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#bill-of-materials-bom-poms


On Wed, Dec 29, 2021 at 02:27, Julian Hyde <jhyde.apache@gmail.com> wrote:

> In the wake of the log4j CVEs [1], people are asking how to improve the
> security of open source projects, and one idea is to provide a SBOM
> (Software Bill of Materials) [2] along with each release.
>
> I had not heard of SBOM until a couple of days ago. Is anyone on this list
> familiar with SBOMs and their use? Should Calcite be providing an SBOM? Are
> people aware of SBOM initiatives in other projects? What, in your opinion,
> is the priority of this issue?
>
> Julian
>
> [1]
> https://thehackernews.com/2021/12/second-log4j-vulnerability-cve-2021.html
>
> [2] https://en.wikipedia.org/wiki/Software_bill_of_materials
>