Posted to dev@airflow.apache.org by Amogh Desai <am...@gmail.com> on 2024/04/01 04:32:46 UTC

Re: [DISCUSS] Proposal for adding Telemetry via Scarf

+1 looks like a good tool which could be super helpful.

* We should have some transparency into the data that is collected or sent
* We should have an option to opt out
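
A minimal sketch of what such an opt-out check could look like, assuming the `SCARF_ANALYTICS=false` environment variable mentioned elsewhere in this thread. The function name, the `config_enabled` flag (standing in for a possible Airflow-level setting) and the set of accepted "off" values are hypothetical, not an existing Airflow or Scarf API.

```python
import os

# Values treated as "telemetry disabled". This set is an assumption for
# illustration; the thread itself only mentions SCARF_ANALYTICS=false.
_FALSY = {"false", "0", "no", "off"}


def telemetry_enabled(environ=None, config_enabled=True):
    """Return True only if neither the env var nor the config opts out.

    `config_enabled` is a hypothetical stand-in for an Airflow config option.
    """
    environ = os.environ if environ is None else environ
    # An explicit opt-out through the environment always wins.
    if environ.get("SCARF_ANALYTICS", "true").strip().lower() in _FALSY:
        return False
    return bool(config_enabled)
```

Keeping the check in one place like this would also make it easy to document exactly how to opt out.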

Thanks & Regards,
Amogh Desai


On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <we...@gmail.com> wrote:

> +1 to this. It would be really useful. As long as we can opt out, I think
> we’re good.
>
> Best,
> Wei
>
> > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <ka...@gmail.com> wrote:
> >
> > Grammar Correction:
> >
> > We should assume that those who deploy and upgrade Airflow - actually
> read
> >> and take into account what is written in the release notes - especially
> if
> >> they have security guys breathing down their necks, similarly as we have to
> >> assume they follow CVE announcements about security issues fixed. If we
> >> are very straightforward and out-going about the change, inform very
> >> clearly how to opt-out, I don't see a big problem with opt-out.
> >
> >
> > I couldn't agree more. We shouldn't collect any data that hampers
> > security (and we should aim for exactly that), but beyond that, most
> > security-conscious folks don't upgrade blindly: we can rely on them to
> > read the release notes and announcements, and we can make the change
> > very clear there and in our installation guides.
> >
> > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <ka...@gmail.com> wrote:
> >
> >> Grammar correction:
> >>
> >>
> >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <ka...@gmail.com> wrote:
> >>
> >>> Have this at the end of the email too: but if folks don't read until
> the
> >>> end and quoting Maxime from the use-case blog[1]:
> >>>
> >>> "I think people often ask ‘how do I contribute to open source?’, ‘I've
> >>> got to get into the code’, or ‘ I’ve got to be an engineer.’ Actually,
> the
> >>> very simplest thing that you can do is just say, ‘my organization gets
> real
> >>> value from this piece of software.’ There are a bunch of ways to let
> the
> >>> people know about it – and now Scarf is there. If your organization is
> >>> getting a lot of value from a piece of open source software, make sure
> the
> >>> devs know about it."
> >>>
> >>> What kind of edge cases are you thinking about? I don't think it makes
> >>> sense to have "opt-in" at all. As the goal is to collect data for most
> >>> Airflow installations except for those that don't want to give data,
> then
> >>> "opt-out" is the only way to maximize it. As long as we don't collect
> any
> >>> PII data, this is in-compliance as well.
> >>>
> >>> Imagine someone learning Airflow, if they have to opt-in via a config,
> >>> they wouldn't even know or care about it, hence us losing most of the
> data.
> >>> I understand why some orgs & individuals may want to opt-out.
> >>>
> >>> Scarf Provides tracking pixels (essentially an HTML image tag) that you
> >>> can place in your website or product to track visitors to that URL. If
> >>> there were any concerns about Privacy, ASF wouldn't have approved it
> at all.
> >>>
> >>> A few key details to note about the pixel:
> >>>
> >>>
> >>>   - No PII is tracked… Scarf does not capture/retain IP information…
> >>>   this information is discarded by the platform upon
> processing/aggregating
> >>>   - Scarf pixels respect the Do Not Track (DNT) settings of browsers -
> >>>   these users will not be tracked whatsoever.
> >>>
> >>>
> >>> All the ASF projects I had listed (whether they use Scarf gateway or
> >>> Scarf pixel in product) are using opt-out.
> >>>
> >>> 1. Short opt-in period before opt-out. Test this feature with users who
> >>>> trust and if it works great - make it public. I think it's wise to
> handle
> >>>> edge cases and configure collected data more accurately.
> >>>
> >>>
> >>>
> >>> It would be a pixel in the webserver, should affect nothing at all even
> >>> in an air-gapped environment.
> >>>
> >>>> 2. It should not affect anything if access to the internet is
> restricted
> >>>> which is default for many companies.
> >>>
> >>>
> >>>
> >>> 100% agreed on the below:
> >>>
> >>>> I think we have a very good blueprint to follow including at least 5
> >>>> other
> >>>> ASF projects that also passed the review of the privacy@asf. And
> while I
> >>>> understand (and concur) the urge for opt-in by default coming from
> >>>> consumer
> >>>> market (where it makes perfect sense) Airflow is not a consumer
> >>>> software and is used in "corporate environment" which has a little
> >>>> different expectations and broad assumption that the company can make
> >>>> decisions on such telemetry on behalf of the employees using it.
> >>>
> >>>
> >>> Couldn't agree more. We shouldn't collect any data that hampers security
> >>> (and we should aim for exactly that), but beyond that, most
> >>> security-conscious folks don't upgrade blindly: we can rely on them to
> >>> read the release notes and announcements, and we can make the change very
> >>> clear there and in our installation guides.
> >>>
> >>> We should assume that those who deploy and upgrade Airflow - actually
> read
> >>>> and take into account what is written in the release notes -
> especially
> >>>> if
> >>>> they have security guys breathing down their necks, similarly as we have to
> >>>> assume they follow CVE announcements about security issues fixed. If
> we
> >>>> are very straightforward and out-going about the change, inform very
> >>>> clearly how to opt-out, I don't see a big problem with opt-out.
> >>>
> >>>
> >>>
> >>> To be clear, the data we gather here should help all the consumers
> >>> without violating any regulations. I will quote Maxime from the
> >>> use-case doc [1]:
> >>>
> >>> "*Another Form of Contributing*
> >>> “I think people often ask ‘how do I contribute to open source?’, ‘I've
> >>> got to get into the code’, or ‘ I’ve got to be an engineer.’ Actually,
> the
> >>> very simplest thing that you can do is just say, ‘my organization gets
> real
> >>> value from this piece of software.’ There are a bunch of ways to let
> the
> >>> people know about it – and now Scarf is there. If your organization is
> >>> getting a lot of value from a piece of open source software, make sure
> the
> >>> devs know about it.”"
> >>>
> >>>
> >>> [1] https://about.scarf.sh/post/scarf-case-study-apache-superset
> >>>
> >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <kx...@apache.org>
> wrote:
> >>>
> >>>> Hi Jarek!
> >>>>
> >>>> I understand the reasons for opt-out from a project view. I just
> suddenly
> >>>> imagined the situation when an upgrade happens and here comes the
> data to
> >>>> some third party service - that's a view from a user side of some big
> >>>> company.
> >>>>
> >>>> There could be good alternatives to handle this:
> >>>> 1. Short opt-in period before opt-out. Test this feature with users
> who
> >>>> trust and if it works great - make it public. I think it's wise to
> handle
> >>>> edge cases and configure collected data more accurately.
> >>>> 2. Explicitly somehow warn about this feature to make this feature not
> >>>> get
> >>>> unnoticed. Just to reduce possible frustration.
> >>>>
> >>>> Just some personal thoughts for discussion (:
> >>>>
> >>>> --
> >>>> ,,,^..^,,,
> >>>>
> >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <ja...@potiuk.com>
> wrote:
> >>>>
> >>>>> Hello everyone,
> >>>>>
> >>>>> it has to be:
> >>>>>
> >>>>> 1. Opt-in by default to not trigger security guys about new unplanned
> >>>>>> activity after regular upgrade.
> >>>>>>
> >>>>>
> >>>>> That's a very good point about security triggering Alexander, but I
> am
> >>>> not
> >>>>> so sure it means that we "have to" do opt-in. There are other ways of
> >>>>> communicating with the "deployment managers" who install and upgrade
> >>>>> airflow - i.e. release notes, blogs, social media of ours, slack
> >>>>> announcements etc. We have plenty of channels we can use to
> >>>> communicate the
> >>>>> change.
> >>>>>
> >>>>> I think we have a very good blueprint to follow including at least 5
> >>>> other
> >>>>> ASF projects that also passed the review of the privacy@asf. And
> >>>> while I
> >>>>> understand (and concur) the urge for opt-in by default coming from
> >>>> consumer
> >>>>> market (where it makes perfect sense) Airflow is not a consumer
> >>>>> software and is used in "corporate environment" which has a little
> >>>>> different expectations and broad assumption that the company can make
> >>>>> decisions on such telemetry on behalf of the employees using it.
> >>>>>
> >>>>> We should assume that those who deploy and upgrade Airflow - actually
> >>>> read
> >>>>> and take into account what is written in the release notes -
> >>>> especially if
> >>>>> they have security guys breathing down their necks, similarly as we have
> to
> >>>>> assume they follow CVE announcements about security issues fixed. If
> we
> >>>>> are very straightforward and out-going about the change, inform very
> >>>>> clearly how to opt-out, I don't see a big problem with opt-out.
> >>>>>
> >>>>> We should of course check with privacy@a.o (but I've spent a good
> deal
> >>>> of
> >>>>> time reading the Superset and other use cases and explanations in
> >>>> detail to
> >>>>> make a better informed decision) - and it looks like they also went
> >>>> opt-out
> >>>>> way and got cleared by privacy@a.o.  And if we cannot reach
> >>>> consensus, we
> >>>>> should - as usual - make a voting decision on it (because yes, it is
> an
> >>>>> important decision), but - after reading and understanding why others
> >>>> also
> >>>>> did it - for me personally, opt-out is a good path.
> >>>>>
> >>>>> Also because it will rather increase the amount of data to gather,
> and
> >>>> in
> >>>>> our case - counter intuitively - it will be even better for privacy
> and
> >>>>> corporate anonymity, because the more data we get, the more difficult
> >>>> it
> >>>>> will be to get any non-statistical/non-aggregated insight from it.
> >>>> Imagine
> >>>>> if only a few corporate users will enable it consciously - then we
> >>>> will be
> >>>>> able to draw much more conclusions if we find out who they are, than
> if
> >>>>> everyone has it enabled by default.
> >>>>>
> >>>>> That's my take on it - but again, it's up to us to vote, for me
> opt-in
> >>>> is
> >>>>> not "has to", and I am rather for opt-out.
> >>>>>
> >>>>> J.
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>>
> >>>>>>> I want to propose gathering telemetry for Airflow installations.
> >>>> As the
> >>>>>>> Airflow community, we have been relying heavily on the yearly
> >>>> Airflow
> >>>>>>> Survey and anecdotes to answer a few key questions about Airflow
> >>>> usage.
> >>>>>>> Questions like the following:
> >>>>>>>
> >>>>>>>
> >>>>>>>   - Which versions of Airflow are people installing/using now
> >>>> (i.e.
> >>>>>>>   whether people have primarily made the jump from version X to
> >>>>> version
> >>>>>> Y)
> >>>>>>>   - Which DB is used as the Metadata DB and which version e.g Pg
> >>>> 14?
> >>>>>>>   - What Python version is being used?
> >>>>>>>   - Which Executor is being used?
> >>>>>>>   - Approximately how many people out there in the world are
> >>>>> installing
> >>>>>>>   Airflow
> >>>>>>>
> >>>>>>>
> >>>>>>> There is a solution that should help answer these questions: Scarf
> >>>> [1].
> >>>>>> The
> >>>>>>> ASF already approves Scarf [2][3] and is already used by other ASF
> >>>>>>> projects: Superset [4], Dolphin Scheduler [5], Dubbo Kubernetes,
> >>>>> DevLake,
> >>>>>>> Skywalking as it follows GDPR and other regulations.
> >>>>>>>
> >>>>>>> Similar to Superset, we probably can use it as follows:
> >>>>>>>
> >>>>>>>
> >>>>>>>   1. Install the `scarf js` npm package and bundle it in the
> >>>>> Webserver.
> >>>>>>>   When the package is downloaded & Airflow webserver is opened,
> >>>>> metadata
> >>>>>>> is
> >>>>>>>   recorded to the Scarf dashboard.
> >>>>>>>   2. Utilize the Scarf Gateway [6], which we can use in front of
> >>>>> docker
> >>>>>>>   containers. While it’s possible people go around this gateway,
> >>>> we
> >>>>> can
> >>>>>>>   probably configure and encourage most traffic to go through
> >>>> these
> >>>>>>> gateways.
> >>>>>>>
> >>>>>>> While Scarf does not store any personally identifying information
> >>>> from
> >>>>>> SDK
> >>>>>>> telemetry data, it does send various bits of IP-derived
> >>>> information as
> >>>>>>> outlined here [7]. This data should be made as transparent as
> >>>> possible
> >>>>> by
> >>>>>>> granting dashboard access to the Airflow PMC and any other relevant
> >>>>> means
> >>>>>>> of sharing/surfacing it that we encounter (Town Hall, Slack,
> >>>> Newsletter
> >>>>>>> etc).
> >>>>>>>
> >>>>>>> The following case studies are worth reading:
> >>>>>>>
> >>>>>>>   1. https://about.scarf.sh/post/scarf-case-study-apache-superset
> >>>>> (From
> >>>>>>>   Maxime)
> >>>>>>>   2.
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> >>>>>>>
> >>>>>>> Similar to them, this could help in various ways that come with
> >>>> using
> >>>>>> data
> >>>>>>> for decision-making. With clear guidelines on "how to opt-out"
> >>>>>> [8][9][10] &
> >>>>>>> "what data is being collected" on the Airflow website, this can be
> >>>>>>> beneficial to the entire community as we would be making more
> >>>> informed
> >>>>>>> decisions.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Kaxil
> >>>>>>>
> >>>>>>>
> >>>>>>> [1] https://about.scarf.sh/
> >>>>>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
> >>>>>>> [3] https://privacy.apache.org/faq/committers.html
> >>>>>>> [4] https://github.com/apache/superset/issues/25639
> >>>>>>> [5]
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> >>>>>>> [6] https://about.scarf.sh/scarf-gateway
> >>>>>>> [7] https://about.scarf.sh/privacy-policy
> >>>>>>> [8]
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> >>>>>>> [9]
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> >>>>>>> [10]
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
>
>
>
>

Re: [DISCUSS] Proposal for adding Telemetry via Scarf

Posted by Jed Cunningham <je...@apache.org>.
+1, looking forward to having better data!

Re: [DISCUSS] Proposal for adding Telemetry via Scarf

Posted by Pankaj Koti <pa...@astronomer.io.INVALID>.
Once we decide to go ahead with this, I think it would be worth checking
whether we can also export metrics on which providers are used in user
deployments. I believe that would help us understand the adoption of each
provider and would inform our decisions when we discuss whether one is due
for suspension.
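
As a rough sketch of how the set of installed providers could be derived, assuming only the standard `apache-airflow-providers-` package name prefix: the helper names below are hypothetical, and nothing here sends data anywhere or reflects what Scarf actually records.

```python
from importlib.metadata import distributions

# Provider packages are published under this distribution name prefix.
PROVIDER_PREFIX = "apache-airflow-providers-"


def provider_names(dist_names):
    """Keep only provider distributions: normalised, de-duplicated, sorted."""
    # Skip missing names and normalise case and underscores before matching.
    normalised = {name.lower().replace("_", "-") for name in dist_names if name}
    return sorted(n for n in normalised if n.startswith(PROVIDER_PREFIX))


def installed_providers():
    """List provider packages installed in the current Python environment."""
    return provider_names(d.metadata["Name"] for d in distributions())
```

Only package names are derived here; whether and how they would be reported is a separate decision for the proposal.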

On Tue, 9 Apr 2024, 20:51 Kaxil Naik, <ka...@gmail.com> wrote:

> The webserver is packaged after compiling, so that won't be possible,
> Michal.
>
> On Tue, 9 Apr 2024 at 11:02, Michał Modras <mi...@google.com>
> wrote:
>
> > If it is packaged and installed by default, we add the dependency (and
> its
> > dependencies) to Airflow's already-not-small dependency tree. If we make
> it
> > installed and enabled by default, would there be an easy way to not just
> > switch it off (e.g. through the env variable), but also not package it at
> > all? That's why I was suggesting a provider, but actually any other
> > pluggable (and unpluggable) mechanism would work.
> >
> > On Tue, Apr 9, 2024 at 2:41 AM Hussein Awala <hu...@awala.fr> wrote:
> >
> >> > Other than that I don't mind it being e.g. optional provider.
> >>
> >> I don't think it is possible to implement it in a provider because it
> is a
> >> js package installed on the webserver; we could implement it as a plugin
> >> (Blueprint), but in this case, the user must make an effort to register
> >> it.
> >>
> >> It would be better to always install it and activate it by default,
> >> with the possibility of deactivating it via the environment variable
> >> `SCARF_ANALYTICS=false` (according to the documentation). If it is
> >> deactivated by default, many users will not activate it even if they
> >> don't mind reporting the metrics; if we enable it by default, only
> >> users who don't want to send metrics will disable it.
> >>
> >>
> >> On Fri, Apr 5, 2024 at 6:19 PM Michał Modras
> >> <mi...@google.com.invalid> wrote:
> >>
> >> > My 2 cents: it must be possible to opt-out, preferably it should be
> >> > possible to deploy Airflow instances without bundling the telemetry
> >> library
> >> > dependencies. Other than that I don't mind it being e.g. optional
> >> provider.
> >> >
> >> > śr., 3 kwi 2024, 22:42 użytkownik Hussein Awala <hu...@awala.fr>
> >> > napisał:
> >> >
> >> > > > I'd like to propose, that we start with collecting simple data
> with
> >> > > limited access: to all the PMC members. We can always expand it to
> >> > > Committers and then expand further to make it invite-only or setup
> >> > > exporting it to a DB like Postgres
> >> > > <https://github.com/scarf-sh/scarf-postgres-exporter> and have a
> >> > publicly
> >> > > viewable dashboard.
> >> > >
> >> > > Looks like a good plan; we can discuss the export format when we
> >> decide
> >> > to
> >> > > do it.
> >> > >
> >> > > On Wed, Apr 3, 2024 at 7:59 PM Kaxil Naik <ka...@gmail.com>
> >> wrote:
> >> > >
> >> > > > Yup, exactly.
> >> > > >
> >> > > > I believe this would definitely help us take early and informed
> >> > > decisions.
> >> > > >> E.g. Had we had this earlier, I believe it would have definitely
> >> > helped
> >> > > us
> >> > > >> more for our past discussions like whether we should continue
> >> > supporting
> >> > > >> MsSQL(
> >> > https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4
> >> > > ),
> >> > > >> similarly about the DaskExecutor (
> >> > > >> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1
> ),
> >> > etc.
> >> > > >>
> >> > > >
> >> > > >
> >> > > > Btw clarifying my own stance on the below; and let me know what
> you
> >> > > think @Hussein
> >> > > > Awala <hu...@awala.fr> : I'd like to propose, that we start
> with
> >> > > > collecting simple data with limited access: to all the PMC
> members.
> >> We
> >> > > can
> >> > > > always expand it to Committers and then expand further to make it
> >> > > > invite-only or setup exporting it to a DB like Postgres
> >> > > > <https://github.com/scarf-sh/scarf-postgres-exporter> and have a
> >> > > publicly
> >> > > > viewable dashboard. It would be similar to an iterative software
> >> > > > development approach, since this will be the first time for us, as
> >> > > Airflow
> >> > > > PMC, to add such telemetry. This is of course just my opinion
> >> though :)
> >> > > >
> >> > > > Regarding the data, like I had mentioned in the email and I am
> glad
> >> > > others
> >> > > >> including you are on the same page that the data will be shared
> >> with
> >> > all
> >> > > >> PMC members. The point about sharing it via website and
> newsletter
> >> was
> >> > > for
> >> > > >> the community — Airflow users. I don’t think anyone in the
> >> community
> >> > > (apart
> >> > > >> from the PMC members) would need raw data. And even if they need
> >> it,
> >> > I’d
> >> > > >> say they should put effort and contribute to the Airflow project
> >> and
> >> > > become
> >> > > >> PMC members.
> >> > > >> To be clear: this telemetry data should help us, as Airflow PMC,
> to
> >> > > steer
> >> > > >> some of the decision making based on this data similar to how
> only
> >> PMC
> >> > > has
> >> > > >> a binding vote on the releases. [1] and this is similar to how
> >> Apache
> >> > > >> Superset does it too.
> >> > > >> [1]
> >> > > >> https://www.apache.org/dev/pmc.html#what-is-a-pmc
> >> > > >
> >> > > >
> >> > > > On Wed, 3 Apr 2024 at 12:03, Pankaj Koti <
> pankaj.koti@astronomer.io
> >> > > .invalid>
> >> > > > wrote:
> >> > > >
> >> > > >> +1 to introduce this.
> >> > > >>
> >> > > >> I believe this would definitely help us take early and informed
> >> > > decisions.
> >> > > >> E.g. Had we had this earlier, I believe it would have definitely
> >> > helped
> >> > > us
> >> > > >> more for our past discussions like whether we should continue
> >> > supporting
> >> > > >> MsSQL(
> >> > https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4
> >> > > ),
> >> > > >> similarly about the DaskExecutor (
> >> > > >> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1
> ),
> >> > etc.
> >> > > >>
> >> > > >>
> >> > > >> Best regards,
> >> > > >>
> >> > > >> *Pankaj Koti*
> >> > > >> Senior Software Engineer (Airflow OSS Engineering team)
> >> > > >> Location: Pune, Maharashtra, India
> >> > > >> Timezone: Indian Standard Time (IST)
> >> > > >> Phone: +91 9730079985
> >> > > >>
> >> > > >>
> >> > > >> On Wed, Apr 3, 2024 at 2:44 PM Kaxil Naik <ka...@gmail.com>
> >> > wrote:
> >> > > >>
> >> > > >> > Yup, I had added a link to scarf docs in the original email
> that
> >> > > >> referenced
> >> > > >> > opting out and we should even add an Airflow config that puts
> all
> >> > > >> config in
> >> > > >> > a single place. Without it we can’t be compliant to all the
> >> policies
> >> > > >> even
> >> > > >> > if we collectively ignore or are unaware of the importance of
> it.
> >> > > >> >
> >> > > >> > Regarding the data, like I had mentioned in the email and I am
> >> glad
> >> > > >> others
> >> > > >> > including you are on the same page that the data will be shared
> >> with
> >> > > all
> >> > > >> > PMC members. The point about sharing it via website and
> >> newsletter
> >> > was
> >> > > >> for
> >> > > >> > the community — Airflow users. I don’t think anyone in the
> >> community
> >> > > >> (apart
> >> > > >> > from the PMC members) would need raw data. And even if they
> need
> >> it,
> >> > > I’d
> >> > > >> > say they should put effort and contribute to the Airflow
> project
> >> and
> >> > > >> become
> >> > > >> > PMC members.
> >> > > >> >
> >> > > >> > To be clear: this telemetry data should help us, as Airflow
> PMC,
> >> to
> >> > > >> steer
> >> > > >> > some of the decision making based on this data similar to how
> >> only
> >> > PMC
> >> > > >> has
> >> > > >> > a binding vote on the releases. [1] and this is similar to how
> >> > Apache
> >> > > >> > Superset does it too.
> >> > > >> >
> >> > > >> > [1]
> >> > > >> > https://www.apache.org/dev/pmc.html#what-is-a-pmc
> >> > > >> >
> >> > > >> >
> >> > > >> >
> >> > > >> > On Wed, 3 Apr 2024 at 00:05, Hussein Awala <hu...@awala.fr>
> >> > wrote:
> >> > > >> >
> >> > > >> > > I mentioned opting out just to confirm its importance, and
> >> after
> >> > > >> checking
> >> > > >> > > the Scarf documentation it appears to be supported natively
> by
> >> > > Scarf.
> >> > > >> For
> >> > > >> > > data accessibility, my point was more about raw data, not
> just
> >> > > >> aggregated
> >> > > >> > > information/insights shared via monthly newsletters, as we do
> >> for
> >> > > >> Airflow
> >> > > >> > > annual Survey for example:
> >> > > >> > > https://airflow.apache.org/survey vs
> >> > > >> > >
> >> > > >> > >
> >> > > >> >
> >> > > >>
> >> > >
> >> >
> >>
> https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics
> >> > > >> > > .
> >> > > >> > >
> >> > > >> > > On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <
> kaxilnaik@gmail.com
> >> >
> >> > > >> wrote:
> >> > > >> > >
> >> > > >> > > > Agreed to both your points Hussein but both the points are
> >> > already
> >> > > >> > > covered
> >> > > >> > > > in my original discussion post - both about opting out and
> >> > > providing
> >> > > >> > data
> >> > > >> > > > to all the PMC members and providing visibility via Monthly
> >> > > >> > newsletters.
> >> > > >> > > Is
> >> > > >> > > > there anything else you propose to discuss that isn’t
> >> covered?
> >> > > >> > > >
> >> > > >> > > >
> >> > > >> > > >
> >> > > >> > > > On Mon, 1 Apr 2024 at 13:21, Hussein Awala <
> hussein@awala.fr
> >> >
> >> > > >> wrote:
> >> > > >> > > >
> >> > > >> > > > > +1 for the idea in general, but there are two main points
> >> to
> >> > > >> discuss
> >> > > >> > > > before
> >> > > >> > > > > voting on this:
> >> > > >> > > > >
> >> > > >> > > > > 1. We should provide an option to disable Scarf:
> >> > > >> > > > > As Airflow is not a paid product, we cannot force
> >> companies to
> >> > > >> report
> >> > > >> > > > their
> >> > > >> > > > > use of this project. Otherwise, some may choose to create
> >> > their
> >> > > >> own
> >> > > >> > > fork
> >> > > >> > > > > just to disable Scarf.
> >> > > >> > > > >
> >> > > >> > > > > 2. Concerning the exclusivity of access to data:
> >> > > >> > > > > The data collected must either be completely proprietary
> >> for
> >> > use
> >> > > >> by
> >> > > >> > PMC
> >> > > >> > > > and
> >> > > >> > > > > ASF, or completely open. Since many companies offer
> Airflow
> >> > as a
> >> > > >> > > product,
> >> > > >> > > > > it is imperative not to give one company more privileges
> >> than
> >> > > >> > others. I
> >> > > >> > > > > raise this point for the principle of equality of
> >> opportunity.
> >> > > >> > > > >
> >> > > >> > > > > On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <
> >> > > >> sunank200@gmail.com
> >> > > >> > >
> >> > > >> > > > > wrote:
> >> > > >> > > > >
> >> > > >> > > > > > Big +1 for Scarf.
> >> > > >> > > > > >
> >> > > >> > > > > > Transparency is key, so it's important to be super
> clear
> >> > about
> >> > > >> > opting
> >> > > >> > > > > > out and what's tracked to avoid spooking anyone about
> IP
> >> > > stuff.
> >> > > >> > > > > >
> >> > > >> > > > > > Regards
> >> > > >> > > > > > Ankit Chaurasia
> >> > > >> > > > > >
> >> > > >> > > > > >
> >> > > >> > > > > >
> >> > > >> > > > > >
> >> > > >> > > > > > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <
> >> > > >> > > amoghdesai.oss@gmail.com>
> >> > > >> > > > > > wrote:
> >> > > >> > > > > > >
> >> > > >> > > > > > > +1 looks like a good tool which could be super
> helpful.
> >> > > >> > > > > > >
> >> > > >> > > > > > > * We should have some transparency into the data that
> >> is
> >> > > >> > collected
> >> > > >> > > or
> >> > > >> > > > > > sent
> >> > > >> > > > > > > * We should have an option to optionally opt-out
> >> > > >> > > > > > >
> >> > > >> > > > > > > Thanks & Regards,
> >> > > >> > > > > > > Amogh Desai
> >> > > >> > > > > > >
> >> > > >> > > > > > >
> >> > > >> > > > > > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <
> >> > > weilee.rx@gmail.com>
> >> > > >> > > wrote:
> >> > > >> > > > > > >
> >> > > >> > > > > > > > +1 to this. It would be really useful. As long as
> we
> >> can
> >> > > opt
> >> > > >> > > out, I
> >> > > >> > > > > > think
> >> > > >> > > > > > > > we’re good.
> >> > > >> > > > > > > >
> >> > > >> > > > > > > > Best,
> >> > > >> > > > > > > > Wei
> >> > > >> > > > > > > >
> >> > > >> > > > > > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <
> >> > > >> > kaxilnaik@gmail.com>
> >> > > >> > > > > > wrote:
> >> > > >> > > > > > > > >
> >> > > >> > > > > > > > > Grammar Correction:
> >> > > >> > > > > > > > >
> >> > > >> > > > > > > > > We should assume that those who deploy and
> upgrade
> >> > > >> Airflow -
> >> > > >> > > > > actually
> >> > > >> > > > > > > > read
> >> > > >> > > > > > > > >> and take into account what is written in the
> >> release
> >> > > >> notes -
> >> > > >> > > > > > especially
> >> > > >> > > > > > > > if
> >> > > >> > > > > > > > >> they have security guys breathing down their necks,
> >> > > similarly
> >> > > >> as
> >> > > >> > we
> >> > > >> > > > > have
> >> > > >> > > > > > to
> >> > > >> > > > > > > > >> assume they follow CVE announcements about
> >> security
> >> > > >> issues
> >> > > >> > > > fixed.
> >> > > >> > > > > > If we
> >> > > >> > > > > > > > >> are very straightforward and out-going about the
> >> > > change,
> >> > > >> > > inform
> >> > > >> > > > > very
> >> > > >> > > > > > > > >> clearly how to opt-out, I don't see a big
> problem
> >> > with
> >> > > >> > > opt-out.
> >> > > >> > > > > > > > >
> >> > > >> > > > > > > > >
> >> > > >> > > > > > > > > I couldn't agree more; even though we shouldn't
> >> > collect
> >> > > >> any
> >> > > >> > > data
> >> > > >> > > > > that
> >> > > >> > > > > > > > > hamper security (and we should aim to do the
> same),
> >> > most
> >> > > >> > > security
> >> > > >> > > > > > > > concerned
> >> > > >> > > > > > > > > folks don't just upgrade, and we can rely on them
> >> > > >> regarding
> >> > > >> > > > release
> >> > > >> > > > > > notes
> >> > > >> > > > > > > > > or announcements and we can make it very clear in
> >> our
> >> > > >> > > > announcements
> >> > > >> > > > > > too;
> >> > > >> > > > > > > > > and in our installation guides.
> >> > > >> > > > > > > > >
> >> > > >> > > > > > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <
> >> > > >> > kaxilnaik@gmail.com>
> >> > > >> > > > > > wrote:
> >> > > >> > > > > > > > >
> >> > > >> > > > > > > > >> Grammar correction:
> >> > > >> > > > > > > > >>
> >> > > >> > > > > > > > >>
> >> > > >> > > > > > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <
> >> > > >> > kaxilnaik@gmail.com
> >> > > >> > > >
> >> > > >> > > > > > wrote:
> >> > > >> > > > > > > > >>
> >> > > >> > > > > > > > >>> Have this at the end of the email too: but if
> >> folks
> >> > > >> don't
> >> > > >> > > read
> >> > > >> > > > > > until
> >> > > >> > > > > > > > the
> >> > > >> > > > > > > > >>> end and quoting Maxime from the use-case
> blog[1]:
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>> "I think people often ask ‘how do I contribute
> to
> >> > open
> >> > > >> > > > source?’,
> >> > > >> > > > > > ‘I've
> >> > > >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be
> an
> >> > > >> > engineer.’
> >> > > >> > > > > > Actually,
> >> > > >> > > > > > > > the
> >> > > >> > > > > > > > >>> very simplest thing that you can do is just
> say,
> >> ‘my
> >> > > >> > > > organization
> >> > > >> > > > > > gets
> >> > > >> > > > > > > > real
> >> > > >> > > > > > > > >>> value from this piece of software.’ There are a
> >> > bunch
> >> > > of
> >> > > >> > ways
> >> > > >> > > > to
> >> > > >> > > > > > let
> >> > > >> > > > > > > > the
> >> > > >> > > > > > > > >>> people know about it – and now Scarf is there.
> If
> >> > your
> >> > > >> > > > > > organization is
> >> > > >> > > > > > > > >>> getting a lot of value from a piece of open
> >> source
> >> > > >> > software,
> >> > > >> > > > make
> >> > > >> > > > > > sure
> >> > > >> > > > > > > > the
> >> > > >> > > > > > > > >>> devs know about it."
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>> What kind of edge cases are you thinking about? I don't think it
> >> > > >> > > > > > > > >>> makes sense to have "opt-in" at all. As the goal is to collect data
> >> > > >> > > > > > > > >>> for most Airflow installations except for those that don't want to
> >> > > >> > > > > > > > >>> give data, "opt-out" is the only way to maximize it. As long as we
> >> > > >> > > > > > > > >>> don't collect any PII, this is compliant as well.
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>> Imagine someone learning Airflow: if they have to opt in via a
> >> > > >> > > > > > > > >>> config, they wouldn't even know or care about it, and we would lose
> >> > > >> > > > > > > > >>> most of the data. I understand why some orgs & individuals may want
> >> > > >> > > > > > > > >>> to opt out.
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>> Scarf Provides tracking pixels (essentially an
> >> HTML
> >> > > >> image
> >> > > >> > > tag)
> >> > > >> > > > > > that you
> >> > > >> > > > > > > > >>> can place in your website or product to track
> >> > visitors
> >> > > >> to
> >> > > >> > > that
> >> > > >> > > > > > URL. If
> >> > > >> > > > > > > > >>> there were any concerns about Privacy, ASF
> >> wouldn't
> >> > > have
> >> > > >> > > > approved
> >> > > >> > > > > > it
> >> > > >> > > > > > > > at all.
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>> A few key details to note about the pixel:
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>>   - No PII is tracked… Scarf does not
> >> capture/retain
> >> > > IP
> >> > > >> > > > > > information…
> >> > > >> > > > > > > > >>>   this information is discarded by the platform
> >> upon
> >> > > >> > > > > > > > processing/aggregating
> >> > > >> > > > > > > > >>>   - Scarf pixels respect the Do Not Track (DNT)
> >> > > >> settings of
> >> > > >> > > > > > browsers -
> >> > > >> > > > > > > > >>>   these users will not be tracked whatsoever.
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>> All the ASF projects I had listed (whether they
> >> use
> >> > > >> Scarf
> >> > > >> > > > gateway
> >> > > >> > > > > > or
> >> > > >> > > > > > > > >>> Scarf pixel in product) are using opt-out.
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>> 1. Short opt-in period before opt-out. Test
> this
> >> > > feature
> >> > > >> > with
> >> > > >> > > > > > users who
> >> > > >> > > > > > > > >>>> trust and if it works great - make it public.
> I
> >> > think
> >> > > >> it's
> >> > > >> > > > wise
> >> > > >> > > > > to
> >> > > >> > > > > > > > handle
> >> > > >> > > > > > > > >>>> edge cases and configure collected data more
> >> > > >> accurately.
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>> It would be a pixel in the webserver, should
> >> affect
> >> > > >> nothing
> >> > > >> > > at
> >> > > >> > > > > all
> >> > > >> > > > > > even
> >> > > >> > > > > > > > >>> in an air-gapped environment.
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>>> 2. It should not affect anything if access to
> >> the
> >> > > >> internet
> >> > > >> > > is
> >> > > >> > > > > > > > restricted
> >> > > >> > > > > > > > >>>> which is default for many companies.
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>> 100% agreed on the below:
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>>> I think we have a very good blueprint to
> follow
> >> > > >> including
> >> > > >> > at
> >> > > >> > > > > > least 5
> >> > > >> > > > > > > > >>>> other
> >> > > >> > > > > > > > >>>> ASF projects that also passed the review of
> the
> >> > > >> > privacy@asf.
> >> > > >> > > > > And
> >> > > >> > > > > > > > while I
> >> > > >> > > > > > > > >>>> understand (and concur) the urge for opt-in by
> >> > > default
> >> > > >> > > coming
> >> > > >> > > > > from
> >> > > >> > > > > > > > >>>> consumer
> >> > > >> > > > > > > > >>>> market (where it makes perfect sense) Airflow
> is
> >> > not
> >> > > a
> >> > > >> > > > consumer
> >> > > >> > > > > > > > >>>> software and is used in "corporate
> environment"
> >> > which
> >> > > >> has
> >> > > >> > a
> >> > > >> > > > > little
> >> > > >> > > > > > > > >>>> different expectations and broad assumption
> that
> >> > the
> >> > > >> > company
> >> > > >> > > > can
> >> > > >> > > > > > make
> >> > > >> > > > > > > > >>>> decisions on such telemetry on behalf of the
> >> > > employees
> >> > > >> > using
> >> > > >> > > > it.
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>> I couldn't agree more; even though we shouldn't collect any data
> >> > > >> > > > > > > > >>> that hampers security (and we should aim to do the same), most
> >> > > >> > > > > > > > >>> security-concerned folks don't just upgrade, and we can rely on
> >> > > >> > > > > > > > >>> them regarding release notes or announcements, and we can make it
> >> > > >> > > > > > > > >>> very clear in our announcements too, and in our installation
> >> > > >> > > > > > > > >>> guides.
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>> We should assume that those who deploy and
> >> upgrade
> >> > > >> Airflow
> >> > > >> > -
> >> > > >> > > > > > actually
> >> > > >> > > > > > > > read
> >> > > >> > > > > > > > >>>> and take into account what is written in the
> >> > release
> >> > > >> > notes -
> >> > > >> > > > > > > > especially
> >> > > >> > > > > > > > >>>> if
> >> > > >> > > > > > > > >>>> they have security guys breathing their necks,
> >> > > >> similarly
> >> > > >> > as
> >> > > >> > > we
> >> > > >> > > > > > have to
> >> > > >> > > > > > > > >>>> assume they follow CVE announcements about
> >> security
> >> > > >> issues
> >> > > >> > > > > fixed.
> >> > > >> > > > > > If
> >> > > >> > > > > > > > we
> >> > > >> > > > > > > > >>>> are very straightforward and out-going about
> the
> >> > > >> change,
> >> > > >> > > > inform
> >> > > >> > > > > > very
> >> > > >> > > > > > > > >>>> clearly how to opt-out, I don't see a big
> >> problem
> >> > > with
> >> > > >> > > > opt-out.
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>> To be clear, the collection of data, or at least the data we should
> >> > > >> > > > > > > > >>> gather here, should help all the consumers without violating any
> >> > > >> > > > > > > > >>> regulations. I will quote Maxime from the use-case doc [1]:
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>> "*Another Form of Contributing*
> >> > > >> > > > > > > > >>> “I think people often ask ‘how do I contribute
> to
> >> > open
> >> > > >> > > > source?’,
> >> > > >> > > > > > ‘I've
> >> > > >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be
> an
> >> > > >> > engineer.’
> >> > > >> > > > > > Actually,
> >> > > >> > > > > > > > the
> >> > > >> > > > > > > > >>> very simplest thing that you can do is just
> say,
> >> ‘my
> >> > > >> > > > organization
> >> > > >> > > > > > gets
> >> > > >> > > > > > > > real
> >> > > >> > > > > > > > >>> value from this piece of software.’ There are a
> >> > bunch
> >> > > of
> >> > > >> > ways
> >> > > >> > > > to
> >> > > >> > > > > > let
> >> > > >> > > > > > > > the
> >> > > >> > > > > > > > >>> people know about it – and now Scarf is there.
> If
> >> > your
> >> > > >> > > > > > organization is
> >> > > >> > > > > > > > >>> getting a lot of value from a piece of open
> >> source
> >> > > >> > software,
> >> > > >> > > > make
> >> > > >> > > > > > sure
> >> > > >> > > > > > > > the
> >> > > >> > > > > > > > >>> devs know about it.”"
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>> [1] https://about.scarf.sh/post/scarf-case-study-apache-superset
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <kxepal@apache.org> wrote:
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > > >>>> Hi Jarek!
> >> > > >> > > > > > > > >>>>
> >> > > >> > > > > > > > >>>> I understand the reasons for opt-out from a
> >> project
> >> > > >> view.
> >> > > >> > I
> >> > > >> > > > just
> >> > > >> > > > > > > > suddenly
> >> > > >> > > > > > > > >>>> imagined the situation when an upgrade happens
> >> and
> >> > > here
> >> > > >> > > comes
> >> > > >> > > > > the
> >> > > >> > > > > > > > data to
> >> > > >> > > > > > > > >>>> some third party service - that's a view from
> a
> >> > user
> >> > > >> side
> >> > > >> > of
> >> > > >> > > > > some
> >> > > >> > > > > > big
> >> > > >> > > > > > > > >>>> company.
> >> > > >> > > > > > > > >>>>
> >> > > >> > > > > > > > >>>> There could be good alternatives to handle
> this:
> >> > > >> > > > > > > > >>>> 1. Short opt-in period before opt-out. Test
> this
> >> > > >> feature
> >> > > >> > > with
> >> > > >> > > > > > users
> >> > > >> > > > > > > > who
> >> > > >> > > > > > > > >>>> trust and if it works great - make it public.
> I
> >> > think
> >> > > >> it's
> >> > > >> > > > wise
> >> > > >> > > > > to
> >> > > >> > > > > > > > handle
> >> > > >> > > > > > > > >>>> edge cases and configure collected data more
> >> > > >> accurately.
> >> > > >> > > > > > > > >>>> 2. Explicitly somehow warn about this feature
> to
> >> > make
> >> > > >> this
> >> > > >> > > > > > feature not
> >> > > >> > > > > > > > >>>> get
> >> > > >> > > > > > > > >>>> unnoticed. Just to reduce possible
> frustration.
> >> > > >> > > > > > > > >>>>
> >> > > >> > > > > > > > >>>> Just a personal thoughts for discussion (:
> >> > > >> > > > > > > > >>>>
> >> > > >> > > > > > > > >>>> --
> >> > > >> > > > > > > > >>>> ,,,^..^,,,
> >> > > >> > > > > > > > >>>>
> >> > > >> > > > > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <
> >> > > >> > > > jarek@potiuk.com>
> >> > > >> > > > > > > > wrote:
> >> > > >> > > > > > > > >>>>
> >> > > >> > > > > > > > >>>>> Hello everyone,
> >> > > >> > > > > > > > >>>>>
> >> > > >> > > > > > > > >>>>> it has to be:
> >> > > >> > > > > > > > >>>>>
> >> > > >> > > > > > > > >>>>> 1. Opt-in by default to not trigger security
> >> guys
> >> > > >> about
> >> > > >> > new
> >> > > >> > > > > > unplanned
> >> > > >> > > > > > > > >>>>>> activity after regular upgrade.
> >> > > >> > > > > > > > >>>>>>
> >> > > >> > > > > > > > >>>>>
> >> > > >> > > > > > > > >>>>> That's a very good point about security
> >> triggering
> >> > > >> > > Alexander,
> >> > > >> > > > > > but I
> >> > > >> > > > > > > > am
> >> > > >> > > > > > > > >>>> not
> >> > > >> > > > > > > > >>>>> so sure it means that we "have to" do opt-in.
> >> > There
> >> > > >> are
> >> > > >> > > other
> >> > > >> > > > > > ways of
> >> > > >> > > > > > > > >>>>> communicating with the "deployment managers"
> >> who
> >> > > >> install
> >> > > >> > > and
> >> > > >> > > > > > upgrade
> >> > > >> > > > > > > > >>>>> airflow - i.e. release notes. blogs, social
> >> media
> >> > of
> >> > > >> > ours,
> >> > > >> > > > > slack
> >> > > >> > > > > > > > >>>>> announcements etc. We have plenty of channels
> >> we
> >> > can
> >> > > >> use
> >> > > >> > to
> >> > > >> > > > > > > > >>>> communicate the
> >> > > >> > > > > > > > >>>>> change.
> >> > > >> > > > > > > > >>>>>
> >> > > >> > > > > > > > >>>>> I think we have a very good blueprint to
> follow
> >> > > >> including
> >> > > >> > > at
> >> > > >> > > > > > least 5
> >> > > >> > > > > > > > >>>> other
> >> > > >> > > > > > > > >>>>> ASF projects that also passed the review of
> the
> >> > > >> > > privacy@asf.
> >> > > >> > > > > And
> >> > > >> > > > > > > > >>>> while I
> >> > > >> > > > > > > > >>>>> understand (and concur) the urge for opt-in
> by
> >> > > default
> >> > > >> > > coming
> >> > > >> > > > > > from
> >> > > >> > > > > > > > >>>> consumer
> >> > > >> > > > > > > > >>>>> market (where it makes perfect sense) Airflow
> >> is
> >> > > not a
> >> > > >> > > > consumer
> >> > > >> > > > > > > > >>>>> software and is used in "corporate
> environment"
> >> > > which
> >> > > >> > has a
> >> > > >> > > > > > little
> >> > > >> > > > > > > > >>>>> different expectations and broad assumption
> >> that
> >> > the
> >> > > >> > > company
> >> > > >> > > > > can
> >> > > >> > > > > > make
> >> > > >> > > > > > > > >>>>> decisions on such telemetry on behalf of the
> >> > > employees
> >> > > >> > > using
> >> > > >> > > > > it.
> >> > > >> > > > > > > > >>>>>
> >> > > >> > > > > > > > >>>>> We should assume that those who deploy and
> >> upgrade
> >> > > >> > Airflow
> >> > > >> > > -
> >> > > >> > > > > > actually
> >> > > >> > > > > > > > >>>> read
> >> > > >> > > > > > > > >>>>> and take into account what is written in the
> >> > release
> >> > > >> > notes
> >> > > >> > > -
> >> > > >> > > > > > > > >>>> especially if
> >> > > >> > > > > > > > >>>>> they have security guys breathing their
> necks,
> >> > > >> similarly
> >> > > >> > as
> >> > > >> > > > we
> >> > > >> > > > > > have
> >> > > >> > > > > > > > to
> >> > > >> > > > > > > > >>>>> assume they follow CVE announcements about
> >> > security
> >> > > >> > issues
> >> > > >> > > > > > fixed. If
> >> > > >> > > > > > > > we
> >> > > >> > > > > > > > >>>>> are very straightforward and out-going about
> >> the
> >> > > >> change,
> >> > > >> > > > inform
> >> > > >> > > > > > very
> >> > > >> > > > > > > > >>>>> clearly how to opt-out, I don't see a big
> >> problem
> >> > > with
> >> > > >> > > > opt-out.
> >> > > >> > > > > > > > >>>>>
> >> > > >> > > > > > > > >>>>> We should of course check with privacy@a.o (but I've spent a good
> >> > > >> > > > > > > > >>>>> deal of time reading the Superset and other use cases and
> >> > > >> > > > > > > > >>>>> explanations in detail to make a better informed decision) - and
> >> > > >> > > > > > > > >>>>> it looks like they also went the opt-out way and got cleared by
> >> > > >> > > > > > > > >>>>> privacy@a.o. And if we cannot reach consensus, we should - as
> >> > > >> > > > > > > > >>>>> usual - make a voting decision on it (because yes, it is an
> >> > > >> > > > > > > > >>>>> important decision), but - after reading and understanding why
> >> > > >> > > > > > > > >>>>> others also did it - for me personally, opt-out is a good path.
> >> > > >> > > > > > > > >>>>>
> >> > > >> > > > > > > > >>>>> Also because it will rather increase the
> >> amount of
> >> > > >> data
> >> > > >> > to
> >> > > >> > > > > > gather,
> >> > > >> > > > > > > > and
> >> > > >> > > > > > > > >>>> in
> >> > > >> > > > > > > > >>>>> our case - counter intuitively - it will be
> >> even
> >> > > >> better
> >> > > >> > for
> >> > > >> > > > > > privacy
> >> > > >> > > > > > > > and
> >> > > >> > > > > > > > >>>>> corporate anonymity, because the more data we
> >> get,
> >> > > the
> >> > > >> > more
> >> > > >> > > > > > difficult
> >> > > >> > > > > > > > >>>> it
> >> > > >> > > > > > > > >>>>> will be to get any
> >> non-statistical/non-aggregated
> >> > > >> insight
> >> > > >> > > > from
> >> > > >> > > > > > it.
> >> > > >> > > > > > > > >>>> Imagine
> >> > > >> > > > > > > > >>>>> if only a few corporate users will enable it
> >> > > >> consciously
> >> > > >> > -
> >> > > >> > > > then
> >> > > >> > > > > > we
> >> > > >> > > > > > > > >>>> will be
> >> > > >> > > > > > > > >>>>> able to draw much more conclusions if we find
> >> out
> >> > > who
> >> > > >> > they
> >> > > >> > > > are,
> >> > > >> > > > > > than
> >> > > >> > > > > > > > if
> >> > > >> > > > > > > > >>>>> everyone has it enabled by default.
> >> > > >> > > > > > > > >>>>>
> >> > > >> > > > > > > > >>>>> That's my take on it - but again, it's up to
> >> us to
> >> > > >> vote,
> >> > > >> > > for
> >> > > >> > > > me
> >> > > >> > > > > > > > opt-in
> >> > > >> > > > > > > > >>>> is
> >> > > >> > > > > > > > >>>>> not "has to", and I am rather for opt-out.
> >> > > >> > > > > > > > >>>>>
> >> > > >> > > > > > > > >>>>> J.
> >> > > >> > > > > > > > >>>>>
> >> > > >> > > > > > > > >>>>>> Hi all,
> >> > > >> > > > > > > > >>>>>>
> >> > > >> > > > > > > > >>>>>>
> >> > > >> > > > > > > > >>>>>>> I want to propose gathering telemetry for
> >> > Airflow
> >> > > >> > > > > > installations.
> >> > > >> > > > > > > > >>>> As the
> >> > > >> > > > > > > > >>>>>>> Airflow community, we have been relying
> >> heavily
> >> > on
> >> > > >> the
> >> > > >> > > > yearly
> >> > > >> > > > > > > > >>>> Airflow
> >> > > >> > > > > > > > >>>>>>> Survey and anecdotes to answer a few key
> >> > questions
> >> > > >> > about
> >> > > >> > > > > > Airflow
> >> > > >> > > > > > > > >>>> usage.
> >> > > >> > > > > > > > >>>>>>> Questions like the following:
> >> > > >> > > > > > > > >>>>>>>
> >> > > >> > > > > > > > >>>>>>>
> >> > > >> > > > > > > > >>>>>>>   - Which versions of Airflow are people
> >> > > >> > installing/using
> >> > > >> > > > now
> >> > > >> > > > > > > > >>>> (i.e.
> >> > > >> > > > > > > > >>>>>>>   whether people have primarily made the
> jump
> >> > from
> >> > > >> > > version
> >> > > >> > > > X
> >> > > >> > > > > to
> >> > > >> > > > > > > > >>>>> version
> >> > > >> > > > > > > > >>>>>> Y)
> >> > > >> > > > > > > > >>>>>>>   - Which DB is used as the Metadata DB and
> >> > which
> >> > > >> > version
> >> > > >> > > > e.g
> >> > > >> > > > > > Pg
> >> > > >> > > > > > > > >>>> 14?
> >> > > >> > > > > > > > >>>>>>>   - What Python version is being used?
> >> > > >> > > > > > > > >>>>>>>   - Which Executor is being used?
> >> > > >> > > > > > > > >>>>>>>   - Approximately how many people out there
> >> in
> >> > the
> >> > > >> > world
> >> > > >> > > > are
> >> > > >> > > > > > > > >>>>> installing
> >> > > >> > > > > > > > >>>>>>>   Airflow
> >> > > >> > > > > > > > >>>>>>>
> >> > > >> > > > > > > > >>>>>>>
> >> > > >> > > > > > > > >>>>>>> There is a solution that should help answer these questions:
> >> > > >> > > > > > > > >>>>>>> Scarf [1]. The ASF already approves Scarf [2][3], and it is
> >> > > >> > > > > > > > >>>>>>> already used by other ASF projects (Superset [4], Dolphin
> >> > > >> > > > > > > > >>>>>>> Scheduler [5], Dubbo Kubernetes, DevLake, Skywalking), as it
> >> > > >> > > > > > > > >>>>>>> follows GDPR and other regulations.
> >> > > >> > > > > > > > >>>>>>>
> >> > > >> > > > > > > > >>>>>>> Similar to Superset, we probably can use it
> >> as
> >> > > >> follows:
> >> > > >> > > > > > > > >>>>>>>
> >> > > >> > > > > > > > >>>>>>>
> >> > > >> > > > > > > > >>>>>>>   1. Install the `scarf js` npm package and
> >> > bundle
> >> > > >> it
> >> > > >> > in
> >> > > >> > > > the
> >> > > >> > > > > > > > >>>>> Webserver.
> >> > > >> > > > > > > > >>>>>>>   When the package is downloaded & Airflow
> >> > > >> webserver is
> >> > > >> > > > > opened,
> >> > > >> > > > > > > > >>>>> metadata
> >> > > >> > > > > > > > >>>>>>> is
> >> > > >> > > > > > > > >>>>>>>   recorded to the Scarf dashboard.
> >> > > >> > > > > > > > >>>>>>>   2. Utilize the Scarf Gateway [6], which
> we
> >> can
> >> > > >> use in
> >> > > >> > > > front
> >> > > >> > > > > > of
> >> > > >> > > > > > > > >>>>> docker
> >> > > >> > > > > > > > >>>>>>>   containers. While it’s possible people go
> >> > around
> >> > > >> this
> >> > > >> > > > > > gateway,
> >> > > >> > > > > > > > >>>> we
> >> > > >> > > > > > > > >>>>> can
> >> > > >> > > > > > > > >>>>>>>   probably configure and encourage most
> >> traffic
> >> > to
> >> > > >> go
> >> > > >> > > > through
> >> > > >> > > > > > > > >>>> these
> >> > > >> > > > > > > > >>>>>>> gateways.
> >> > > >> > > > > > > > >>>>>>>
> >> > > >> > > > > > > > >>>>>>> While Scarf does not store any personally
> >> > > >> identifying
> >> > > >> > > > > > information
> >> > > >> > > > > > > > >>>> from
> >> > > >> > > > > > > > >>>>>> SDK
> >> > > >> > > > > > > > >>>>>>> telemetry data, it does send various bits
> of
> >> > > >> IP-derived
> >> > > >> > > > > > > > >>>> information as
> >> > > >> > > > > > > > >>>>>>> outlined here [7]. This data should be made
> >> as
> >> > > >> > > transparent
> >> > > >> > > > as
> >> > > >> > > > > > > > >>>> possible
> >> > > >> > > > > > > > >>>>> by
> >> > > >> > > > > > > > >>>>>>> granting dashboard access to the Airflow
> PMC
> >> and
> >> > > any
> >> > > >> > > other
> >> > > >> > > > > > relevant
> >> > > >> > > > > > > > >>>>> means
> >> > > >> > > > > > > > >>>>>>> of sharing/surfacing it that we encounter
> >> (Town
> >> > > >> Hall,
> >> > > >> > > > Slack,
> >> > > >> > > > > > > > >>>> Newsletter
> >> > > >> > > > > > > > >>>>>>> etc).
> >> > > >> > > > > > > > >>>>>>>
> >> > > >> > > > > > > > >>>>>>> The following case studies are worth reading:
> >> > > >> > > > > > > > >>>>>>>
> >> > > >> > > > > > > > >>>>>>>   1. https://about.scarf.sh/post/scarf-case-study-apache-superset
> >> > > >> > > > > > > > >>>>>>>   (From Maxime)
> >> > > >> > > > > > > > >>>>>>>   2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> >> > > >> > > > > > > > >>>>>>>
> >> > > >> > > > > > > > >>>>>>> Similar to them, this could help in various
> >> ways
> >> > > >> that
> >> > > >> > > come
> >> > > >> > > > > with
> >> > > >> > > > > > > > >>>> using
> >> > > >> > > > > > > > >>>>>> data
> >> > > >> > > > > > > > >>>>>>> for decision-making. With clear guidelines
> on
> >> > "how
> >> > > >> to
> >> > > >> > > > > opt-out"
> >> > > >> > > > > > > > >>>>>> [8][9][10] &
> >> > > >> > > > > > > > >>>>>>> "what data is being collected" on the
> Airflow
> >> > > >> website,
> >> > > >> > > this
> >> > > >> > > > > > can be
> >> > > >> > > > > > > > >>>>>>> beneficial to the entire community as we
> >> would
> >> > be
> >> > > >> > making
> >> > > >> > > > more
> >> > > >> > > > > > > > >>>> informed
> >> > > >> > > > > > > > >>>>>>> decisions.
> >> > > >> > > > > > > > >>>>>>>
> >> > > >> > > > > > > > >>>>>>> Regards,
> >> > > >> > > > > > > > >>>>>>> Kaxil
> >> > > >> > > > > > > > >>>>>>>
> >> > > >> > > > > > > > >>>>>>>
> >> > > >> > > > > > > > >>>>>>> [1] https://about.scarf.sh/
> >> > > >> > > > > > > > >>>>>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
> >> > > >> > > > > > > > >>>>>>> [3] https://privacy.apache.org/faq/committers.html
> >> > > >> > > > > > > > >>>>>>> [4] https://github.com/apache/superset/issues/25639
> >> > > >> > > > > > > > >>>>>>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> >> > > >> > > > > > > > >>>>>>> [6] https://about.scarf.sh/scarf-gateway
> >> > > >> > > > > > > > >>>>>>> [7] https://about.scarf.sh/privacy-policy
> >> > > >> > > > > > > > >>>>>>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> >> > > >> > > > > > > > >>>>>>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> >> > > >> > > > > > > > >>>>>>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> >> > > >> > > > > > > > >>>>>>>
> >> > > >> > > > > > > > >>>>>>
> >> > > >> > > > > > > > >>>>>
> >> > > >> > > > > > > > >>>>
> >> > > >> > > > > > > > >>>
> >> > > >> > > > > > > >
> >> > > >> > > > > > > >
> >> > > >> > > > > > > >
> >> > > >> > > > > > > > ---------------------------------------------------------------------
> >> > > >> > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> >> > > >> > > > > > > > For additional commands, e-mail: dev-help@airflow.apache.org
> >> > > >> > > > > > > >
> >> > > >> > > > > > > >

Re: [DISCUSS] Proposal for adding Telemetry via Scarf

Posted by Kaxil Naik <ka...@gmail.com>.
The webserver is packaged after compiling, so that won't be possible, Michał.

On Tue, 9 Apr 2024 at 11:02, Michał Modras <mi...@google.com> wrote:

> If it is packaged and installed by default, we add the dependency (and its
> dependencies) to Airflow's already-not-small dependency tree. If we make it
> installed and enabled by default, would there be an easy way to not just
> switch it off (e.g. through the env variable), but also not package it at
> all? That's why I was suggesting a provider, but actually any other
> pluggable (and unpluggable) mechanism would work.
>
> On Tue, Apr 9, 2024 at 2:41 AM Hussein Awala <hu...@awala.fr> wrote:
>
>> > Other than that I don't mind it being e.g. optional provider.
>>
>> I don't think it is possible to implement it in a provider because it is a
>> js package installed on the webserver; we could implement it as a plugin
>> (Blueprint), but in this case, the user must make an effort to register
>> it.
>>
>> It would be better to always install it and activate it by default, with
>> the possibility of deactivating it via the environment variable
>> `SCARF_ANALYTICS=false` (according to the documentation). If it is
>> deactivated by default, many users will not activate it even if they don't
>> mind reporting the metrics; if we enable it by default, only users who
>> don't want to send metrics will disable it.
>>
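A minimal sketch of the opt-out check described above, assuming the switch is read from the `SCARF_ANALYTICS` environment variable mentioned in the thread. The helper name `telemetry_enabled` is hypothetical and only illustrates the "enabled unless explicitly disabled" behaviour; it is not Airflow's or Scarf's actual implementation.

```python
import os
from typing import Mapping, Optional


def telemetry_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: telemetry stays on unless explicitly opted out.

    Only the value "false" (case-insensitive, surrounding whitespace ignored)
    for SCARF_ANALYTICS counts as an opt-out in this sketch.
    """
    env = os.environ if env is None else env
    value = env.get("SCARF_ANALYTICS", "").strip().lower()
    return value != "false"
```

With this shape, an unset variable keeps telemetry enabled (opt-out semantics), while exporting `SCARF_ANALYTICS=false` before starting the process would turn it off.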
>>
>> On Fri, Apr 5, 2024 at 6:19 PM Michał Modras
>> <mi...@google.com.invalid> wrote:
>>
>> > My 2 cents: it must be possible to opt-out, preferably it should be
>> > possible to deploy Airflow instances without bundling the telemetry
>> > library dependencies. Other than that I don't mind it being e.g. optional
>> > provider.
>> >
>> > On Wed, 3 Apr 2024 at 22:42, Hussein Awala <hu...@awala.fr> wrote:
>> >
>> > > > I'd like to propose that we start with collecting simple data with
>> > > > limited access: to all the PMC members. We can always expand it to
>> > > > Committers and then expand further to make it invite-only or set up
>> > > > exporting it to a DB like Postgres
>> > > > <https://github.com/scarf-sh/scarf-postgres-exporter> and have a
>> > > > publicly viewable dashboard.
>> > >
>> > > Looks like a good plan; we can discuss the export format when we decide
>> > > to do it.
>> > >
>> > > On Wed, Apr 3, 2024 at 7:59 PM Kaxil Naik <ka...@gmail.com> wrote:
>> > >
>> > > > Yup, exactly.
>> > > >
>> > > >> I believe this would definitely help us make early and informed
>> > > >> decisions. E.g., had we had this earlier, I believe it would have
>> > > >> definitely helped us more in our past discussions, like whether we
>> > > >> should continue supporting MsSQL
>> > > >> (https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4),
>> > > >> and similarly about the DaskExecutor
>> > > >> (https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1),
>> > > >> etc.
>> > > >>
>> > > >
>> > > >
>> > > > Btw, clarifying my own stance on the below, and let me know what you
>> > > > think @Hussein Awala <hu...@awala.fr>: I'd like to propose that we
>> > > > start with collecting simple data with limited access: to all the PMC
>> > > > members. We can always expand it to Committers and then expand further
>> > > > to make it invite-only or set up exporting it to a DB like Postgres
>> > > > <https://github.com/scarf-sh/scarf-postgres-exporter> and have a
>> > > > publicly viewable dashboard. It would be similar to an iterative
>> > > > software development approach, since this will be the first time for
>> > > > us, as Airflow PMC, to add such telemetry. This is of course just my
>> > > > opinion though :)
>> > > >
>> > > >> Regarding the data: as I mentioned in the email, and I am glad others
>> > > >> including you are on the same page, the data will be shared with all
>> > > >> PMC members. The point about sharing it via the website and newsletter
>> > > >> was for the community, i.e. Airflow users. I don’t think anyone in the
>> > > >> community (apart from the PMC members) would need raw data. And even if
>> > > >> they need it, I’d say they should put in the effort, contribute to the
>> > > >> Airflow project, and become PMC members.
>> > > >> To be clear: this telemetry data should help us, as Airflow PMC, steer
>> > > >> some of the decision making, similar to how only the PMC has a binding
>> > > >> vote on releases [1]. This is similar to how Apache Superset does it
>> > > >> too.
>> > > >> [1] https://www.apache.org/dev/pmc.html#what-is-a-pmc
>> > > >
>> > > >
>> > > > On Wed, 3 Apr 2024 at 12:03, Pankaj Koti <pankaj.koti@astronomer.io.invalid> wrote:
>> > > >
>> > > >> +1 to introduce this.
>> > > >>
>> > > >> I believe this would definitely help us make early and informed
>> > > >> decisions. E.g., had we had this earlier, I believe it would have
>> > > >> definitely helped us more in our past discussions, like whether we
>> > > >> should continue supporting MsSQL
>> > > >> (https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4),
>> > > >> and similarly about the DaskExecutor
>> > > >> (https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1),
>> > > >> etc.
>> > > >>
>> > > >>
>> > > >> Best regards,
>> > > >>
>> > > >> *Pankaj Koti*
>> > > >> Senior Software Engineer (Airflow OSS Engineering team)
>> > > >> Location: Pune, Maharashtra, India
>> > > >> Timezone: Indian Standard Time (IST)
>> > > >> Phone: +91 9730079985 <+91%2097300%2079985>
>> > > >>
>> > > >>
>> > > >> On Wed, Apr 3, 2024 at 2:44 PM Kaxil Naik <ka...@gmail.com>
>> > wrote:
>> > > >>
>> > > >> > Yup, I had added a link to scarf docs in the original email that
>> > > >> referenced
>> > > >> > opting out and we should even add an Airflow config that puts all
>> > > >> config in
>> > > >> > a single place. Without it we can’t be compliant with all the
>> policies
>> > > >> even
>> > > >> > if we collectively ignore or are unaware of the importance of it.
>> > > >> >
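
To make the "single place" for opting out concrete, here is a minimal sketch of how several opt-out signals might be resolved into one decision. The setting name AIRFLOW__TELEMETRY__ENABLED is hypothetical and used only for illustration; the SCARF_ANALYTICS and DO_NOT_TRACK environment variables are assumed to follow the conventions described in the Scarf docs and should be checked against them.

```python
import os
from typing import Mapping

# Hypothetical Airflow-side switch; the real option name is not decided in this thread.
AIRFLOW_TELEMETRY_VAR = "AIRFLOW__TELEMETRY__ENABLED"
FALSY = {"0", "false", "no", "off"}


def telemetry_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return False if any opt-out signal is set; enabled by default (opt-out model)."""
    # 1. Project-level switch (hypothetical name).
    if env.get(AIRFLOW_TELEMETRY_VAR, "true").strip().lower() in FALSY:
        return False
    # 2. Scarf-style environment switch (assumed convention).
    if env.get("SCARF_ANALYTICS", "true").strip().lower() in FALSY:
        return False
    # 3. Generic Do Not Track convention.
    if env.get("DO_NOT_TRACK", "").strip().lower() in {"1", "true"}:
        return False
    return True
```

Checking every signal in one function means a deployment manager only needs to set one of them, and the release notes only need to document one code path.
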
>> > > >> > Regarding the data, like I had mentioned in the email and I am
>> glad
>> > > >> others
>> > > >> > including you are on the same page that the data will be shared
>> with
>> > > all
>> > > >> > PMC members. The point about sharing it via website and
>> newsletter
>> > was
>> > > >> for
>> > > >> > the community — Airflow users. I don’t think anyone in the
>> community
>> > > >> (apart
>> > > >> > from the PMC members) would need raw data. And even if they need
>> it,
>> > > I’d
>> > > >> > say they should put effort and contribute to the Airflow project
>> and
>> > > >> become
>> > > >> > PMC members.
>> > > >> >
>> > > >> > To be clear: this telemetry data should help us, as Airflow PMC,
>> to
>> > > >> steer
>> > > >> > some of the decision making based on this data similar to how
>> only
>> > PMC
>> > > >> has
>> > > >> > a binding vote on the releases. [1] and this is similar to how
>> > Apache
>> > > >> > Superset does it too.
>> > > >> >
>> > > >> > [1]
>> > > >> > https://www.apache.org/dev/pmc.html#what-is-a-pmc
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >> > On Wed, 3 Apr 2024 at 00:05, Hussein Awala <hu...@awala.fr>
>> > wrote:
>> > > >> >
>> > > >> > > I mentioned opting out just to confirm its importance, and
>> after
>> > > >> checking
>> > > >> > > the Scarf documentation it appears to be supported natively by
>> > > Scarf.
>> > > >> For
>> > > >> > > data accessibility, my point was more about raw data, not just
>> > > >> aggregated
>> > > >> > > information/insights shared via monthly newsletters, as we do
>> for
>> > > >> Airflow
>> > > >> > > annual Survey for example:
>> > > >> > > https://airflow.apache.org/survey vs
>> > > >> > > https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics
>> > > >> > > .
>> > > >> > >
>> > > >> > > On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <kaxilnaik@gmail.com
>> >
>> > > >> wrote:
>> > > >> > >
>> > > >> > > > Agreed to both your points Hussein but both the points are
>> > already
>> > > >> > > covered
>> > > >> > > > in my original discussion post - both about opting out and
>> > > providing
>> > > >> > data
>> > > >> > > > to all the PMC members and providing visibility via Monthly
>> > > >> > newsletters.
>> > > >> > > Is
>> > > >> > > > there anything else you propose to discuss that isn’t
>> covered?
>> > > >> > > >
>> > > >> > > >
>> > > >> > > >
>> > > >> > > > On Mon, 1 Apr 2024 at 13:21, Hussein Awala <hussein@awala.fr
>> >
>> > > >> wrote:
>> > > >> > > >
>> > > >> > > > > +1 for the idea in general, but there are two main points
>> to
>> > > >> discuss
>> > > >> > > > before
>> > > >> > > > > voting on this:
>> > > >> > > > >
>> > > >> > > > > 1. We should provide an option to disable Scarf:
>> > > >> > > > > As Airflow is not a paid product, we cannot force
>> companies to
>> > > >> report
>> > > >> > > > their
>> > > >> > > > > use of this project. Otherwise, some may choose to create
>> > their
>> > > >> own
>> > > >> > > fork
>> > > >> > > > > just to disable Scarf.
>> > > >> > > > >
>> > > >> > > > > 2. Concerning the exclusivity of access to data:
>> > > >> > > > > The data collected must either be completely proprietary
>> for
>> > use
>> > > >> by
>> > > >> > PMC
>> > > >> > > > and
>> > > >> > > > > ASF, or completely open. Since many companies offer Airflow
>> > as a
>> > > >> > > product,
>> > > >> > > > > it is imperative not to give one company more privileges
>> than
>> > > >> > others. I
>> > > >> > > > > raise this point for the principle of equality of
>> opportunity.
>> > > >> > > > >
>> > > >> > > > > On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <
>> > > >> sunank200@gmail.com
>> > > >> > >
>> > > >> > > > > wrote:
>> > > >> > > > >
>> > > >> > > > > > Big +1 for Scarf.
>> > > >> > > > > >
>> > > >> > > > > > Transparency is key, so it's important to be super clear
>> > about
>> > > >> > opting
>> > > >> > > > > > out and what's tracked to avoid spooking anyone about IP
>> > > stuff.
>> > > >> > > > > >
>> > > >> > > > > > Regards
>> > > >> > > > > > Ankit Chaurasia
>> > > >> > > > > >
>> > > >> > > > > >
>> > > >> > > > > >
>> > > >> > > > > >
>> > > >> > > > > > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <
>> > > >> > > amoghdesai.oss@gmail.com>
>> > > >> > > > > > wrote:
>> > > >> > > > > > >
>> > > >> > > > > > > +1 looks like a good tool which could be super helpful.
>> > > >> > > > > > >
>> > > >> > > > > > > * We should have some transparency into the data that
>> is
>> > > >> > collected
>> > > >> > > or
>> > > >> > > > > > sent
>> > > >> > > > > > > * We should have an option to opt out
>> > > >> > > > > > >
>> > > >> > > > > > > Thanks & Regards,
>> > > >> > > > > > > Amogh Desai
>> > > >> > > > > > >
>> > > >> > > > > > >
>> > > >> > > > > > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <
>> > > weilee.rx@gmail.com>
>> > > >> > > wrote:
>> > > >> > > > > > >
>> > > >> > > > > > > > +1 to this. It would be really useful. As long as we
>> can
>> > > opt
>> > > >> > > out, I
>> > > >> > > > > > think
>> > > >> > > > > > > > we’re good.
>> > > >> > > > > > > >
>> > > >> > > > > > > > Best,
>> > > >> > > > > > > > Wei
>> > > >> > > > > > > >
>> > > >> > > > > > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <
>> > > >> > kaxilnaik@gmail.com>
>> > > >> > > > > > wrote:
>> > > >> > > > > > > > >
>> > > >> > > > > > > > > Grammar Correction:
>> > > >> > > > > > > > >
>> > > >> > > > > > > > > We should assume that those who deploy and upgrade
>> > > >> Airflow -
>> > > >> > > > > actually
>> > > >> > > > > > > > read
>> > > >> > > > > > > > >> and take into account what is written in the
>> release
>> > > >> notes -
>> > > >> > > > > > especially
>> > > >> > > > > > > > if
>> > > >> > > > > > > > >> they have security guys breathing down their necks,
>> > > similarly
>> > > >> as
>> > > >> > we
>> > > >> > > > > have
>> > > >> > > > > > to
>> > > >> > > > > > > > >> assume they follow CVE announcements about
>> security
>> > > >> issues
>> > > >> > > > fixed.
>> > > >> > > > > > If we
>> > > >> > > > > > > > >> are very straightforward and out-going about the
>> > > change,
>> > > >> > > inform
>> > > >> > > > > very
>> > > >> > > > > > > > >> clearly how to opt-out, I don't see a big problem
>> > with
>> > > >> > > opt-out.
>> > > >> > > > > > > > >
>> > > >> > > > > > > > >
>> > > >> > > > > > > > > I couldn't agree more; even though we shouldn't
>> > collect
>> > > >> any
>> > > >> > > data
>> > > >> > > > > that
>> > > >> > > > > > > > > hamper security (and we should aim to do the same),
>> > most
>> > > >> > > security
>> > > >> > > > > > > > concerned
>> > > >> > > > > > > > > folks don't just upgrade, and we can rely on them
>> > > >> regarding
>> > > >> > > > release
>> > > >> > > > > > notes
>> > > >> > > > > > > > > or announcements and we can make it very clear in
>> our
>> > > >> > > > announcements
>> > > >> > > > > > too;
>> > > >> > > > > > > > > and in our installation guides.
>> > > >> > > > > > > > >
>> > > >> > > > > > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <
>> > > >> > kaxilnaik@gmail.com>
>> > > >> > > > > > wrote:
>> > > >> > > > > > > > >
>> > > >> > > > > > > > >> Grammar correction:
>> > > >> > > > > > > > >>
>> > > >> > > > > > > > >>
>> > > >> > > > > > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <
>> > > >> > kaxilnaik@gmail.com
>> > > >> > > >
>> > > >> > > > > > wrote:
>> > > >> > > > > > > > >>
>> > > >> > > > > > > > >>> Have this at the end of the email too: but if
>> folks
>> > > >> don't
>> > > >> > > read
>> > > >> > > > > > until
>> > > >> > > > > > > > the
>> > > >> > > > > > > > >>> end and quoting Maxime from the use-case blog[1]:
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> "I think people often ask ‘how do I contribute to
>> > open
>> > > >> > > > source?’,
>> > > >> > > > > > ‘I've
>> > > >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
>> > > >> > engineer.’
>> > > >> > > > > > Actually,
>> > > >> > > > > > > > the
>> > > >> > > > > > > > >>> very simplest thing that you can do is just say,
>> ‘my
>> > > >> > > > organization
>> > > >> > > > > > gets
>> > > >> > > > > > > > real
>> > > >> > > > > > > > >>> value from this piece of software.’ There are a
>> > bunch
>> > > of
>> > > >> > ways
>> > > >> > > > to
>> > > >> > > > > > let
>> > > >> > > > > > > > the
>> > > >> > > > > > > > >>> people know about it – and now Scarf is there. If
>> > your
>> > > >> > > > > > organization is
>> > > >> > > > > > > > >>> getting a lot of value from a piece of open
>> source
>> > > >> > software,
>> > > >> > > > make
>> > > >> > > > > > sure
>> > > >> > > > > > > > the
>> > > >> > > > > > > > >>> devs know about it."
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> What kind of edge cases are you thinking about? I
>> > > don't
>> > > >> > think
>> > > >> > > > it
>> > > >> > > > > > makes
>> > > >> > > > > > > > >>> sense to have "opt-in" at all. As the goal is to
>> > > collect
>> > > >> > data
>> > > >> > > > for
>> > > >> > > > > > most
>> > > >> > > > > > > > >>> Airflow installations except for those that don't
>> > want
>> > > >> to
>> > > >> > > give
>> > > >> > > > > > data,
>> > > >> > > > > > > > then
>> > > >> > > > > > > > >>> "opt-out" is the only way to maximize it. As
>> long as
>> > > we
>> > > >> > don't
>> > > >> > > > > > collect
>> > > >> > > > > > > > any
>> > > >> > > > > > > > >>> PII data, this is in compliance as well.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> Imagine someone learning Airflow, if they have to
>> > > opt-in
>> > > >> > via
>> > > >> > > a
>> > > >> > > > > > config,
>> > > >> > > > > > > > >>> they wouldn't even know or care about it, hence
>> us
>> > > >> losing
>> > > >> > > most
>> > > >> > > > of
>> > > >> > > > > > the
>> > > >> > > > > > > > data.
>> > > >> > > > > > > > >>> I understand why some orgs & individuals may
>> want to
>> > > >> > opt-out.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> Scarf provides tracking pixels (essentially an
>> HTML
>> > > >> image
>> > > >> > > tag)
>> > > >> > > > > > that you
>> > > >> > > > > > > > >>> can place in your website or product to track
>> > visitors
>> > > >> to
>> > > >> > > that
>> > > >> > > > > > URL. If
>> > > >> > > > > > > > >>> there were any concerns about Privacy, ASF
>> wouldn't
>> > > have
>> > > >> > > > approved
>> > > >> > > > > > it
>> > > >> > > > > > > > at all.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> A few key details to note about the pixel:
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>   - No PII is tracked… Scarf does not
>> capture/retain
>> > > IP
>> > > >> > > > > > information…
>> > > >> > > > > > > > >>>   this information is discarded by the platform
>> upon
>> > > >> > > > > > > > processing/aggregating
>> > > >> > > > > > > > >>>   - Scarf pixels respect the Do Not Track (DNT)
>> > > >> settings of
>> > > >> > > > > > browsers -
>> > > >> > > > > > > > >>>   these users will not be tracked whatsoever.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> All the ASF projects I had listed (whether they
>> use
>> > > >> Scarf
>> > > >> > > > gateway
>> > > >> > > > > > or
>> > > >> > > > > > > > >>> Scarf pixel in product) are using opt-out.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> 1. Short opt-in period before opt-out. Test this
>> > > feature
>> > > >> > with
>> > > >> > > > > > users who
>> > > >> > > > > > > > >>>> trust and if it works great - make it public. I
>> > think
>> > > >> it's
>> > > >> > > > wise
>> > > >> > > > > to
>> > > >> > > > > > > > handle
>> > > >> > > > > > > > >>>> edge cases and configure collected data more
>> > > >> accurately.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> It would be a pixel in the webserver, should
>> affect
>> > > >> nothing
>> > > >> > > at
>> > > >> > > > > all
>> > > >> > > > > > even
>> > > >> > > > > > > > >>> in an air-gapped environment.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>> 2. It should not affect anything if access to
>> the
>> > > >> internet
>> > > >> > > is
>> > > >> > > > > > > > restricted
>> > > >> > > > > > > > >>>> which is default for many companies.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> 100% agreed on the below:
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>> I think we have a very good blueprint to follow
>> > > >> including
>> > > >> > at
>> > > >> > > > > > least 5
>> > > >> > > > > > > > >>>> other
>> > > >> > > > > > > > >>>> ASF projects that also passed the review of the
>> > > >> > privacy@asf.
>> > > >> > > > > And
>> > > >> > > > > > > > while I
>> > > >> > > > > > > > >>>> understand (and concur) the urge for opt-in by
>> > > default
>> > > >> > > coming
>> > > >> > > > > from
>> > > >> > > > > > > > >>>> consumer
>> > > >> > > > > > > > >>>> market (where it makes perfect sense) Airflow is
>> > not
>> > > a
>> > > >> > > > consumer
>> > > >> > > > > > > > >>>> software and is used in "corporate environment"
>> > which
>> > > >> has
>> > > >> > a
>> > > >> > > > > little
>> > > >> > > > > > > > >>>> different expectations and broad assumption that
>> > the
>> > > >> > company
>> > > >> > > > can
>> > > >> > > > > > make
>> > > >> > > > > > > > >>>> decisions on such telemetry on behalf of the
>> > > employees
>> > > >> > using
>> > > >> > > > it.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> Couldn't agree more; even though there shouldn't
>> we
>> > > >> collect
>> > > >> > > > > hamper
>> > > >> > > > > > > > >>> security (and we should aim to do the same), most
>> > > >> security
>> > > >> > > > > > concerned
>> > > >> > > > > > > > folks
>> > > >> > > > > > > > >>> don't just
>> > > >> > > > > > > > >>> upgrade, and we can rely on them regarding
>> release
>> > > >> notes or
>> > > >> > > > > > > > announcements
>> > > >> > > > > > > > >>> and we can make it very clear in our
>> announcements
>> > > too;
>> > > >> and
>> > > >> > > in
>> > > >> > > > > our
>> > > >> > > > > > > > >>> installation guides.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> We should assume that those who deploy and
>> upgrade
>> > > >> Airflow
>> > > >> > -
>> > > >> > > > > > actually
>> > > >> > > > > > > > read
>> > > >> > > > > > > > >>>> and take into account what is written in the
>> > release
>> > > >> > notes -
>> > > >> > > > > > > > especially
>> > > >> > > > > > > > >>>> if
>> > > >> > > > > > > > >>>> they have security guys breathing down their necks,
>> > > >> similarly
>> > > >> > as
>> > > >> > > we
>> > > >> > > > > > have to
>> > > >> > > > > > > > >>>> assume they follow CVE announcements about
>> security
>> > > >> issues
>> > > >> > > > > fixed.
>> > > >> > > > > > If
>> > > >> > > > > > > > we
>> > > >> > > > > > > > >>>> are very straightforward and out-going about the
>> > > >> change,
>> > > >> > > > inform
>> > > >> > > > > > very
>> > > >> > > > > > > > >>>> clearly how to opt-out, I don't see a big
>> problem
>> > > with
>> > > >> > > > opt-out.
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> To be clear, the collection of data, or at least
>> the
>> > > >> data
>> > > >> > we
>> > > >> > > > > should
>> > > >> > > > > > > > >>> gather here should help all the consumers without
>> > > >> violating
>> > > >> > > > > > any
>> > > >> > > > > > > > >>> regulations. I will quote Maxime's quote in the
>> > > use-case
>> > > >> > doc
>> > > >> > > > [1]
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> "*Another Form of Contributing*
>> > > >> > > > > > > > >>> “I think people often ask ‘how do I contribute to
>> > open
>> > > >> > > > source?’,
>> > > >> > > > > > ‘I've
>> > > >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
>> > > >> > engineer.’
>> > > >> > > > > > Actually,
>> > > >> > > > > > > > the
>> > > >> > > > > > > > >>> very simplest thing that you can do is just say,
>> ‘my
>> > > >> > > > organization
>> > > >> > > > > > gets
>> > > >> > > > > > > > real
>> > > >> > > > > > > > >>> value from this piece of software.’ There are a
>> > bunch
>> > > of
>> > > >> > ways
>> > > >> > > > to
>> > > >> > > > > > let
>> > > >> > > > > > > > the
>> > > >> > > > > > > > >>> people know about it – and now Scarf is there. If
>> > your
>> > > >> > > > > > organization is
>> > > >> > > > > > > > >>> getting a lot of value from a piece of open
>> source
>> > > >> > software,
>> > > >> > > > make
>> > > >> > > > > > sure
>> > > >> > > > > > > > the
>> > > >> > > > > > > > >>> devs know about it.”"
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> [1]
>> > > >> > > > https://about.scarf.sh/post/scarf-case-study-apache-superset
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <
>> > > >> > > > > kxepal@apache.org>
>> > > >> > > > > > > > wrote:
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > > >>>> Hi Jarek!
>> > > >> > > > > > > > >>>>
>> > > >> > > > > > > > >>>> I understand the reasons for opt-out from a
>> project
>> > > >> view.
>> > > >> > I
>> > > >> > > > just
>> > > >> > > > > > > > suddenly
>> > > >> > > > > > > > >>>> imagined the situation when an upgrade happens
>> and
>> > > here
>> > > >> > > comes
>> > > >> > > > > the
>> > > >> > > > > > > > data to
>> > > >> > > > > > > > >>>> some third party service - that's a view from a
>> > user
>> > > >> side
>> > > >> > of
>> > > >> > > > > some
>> > > >> > > > > > big
>> > > >> > > > > > > > >>>> company.
>> > > >> > > > > > > > >>>>
>> > > >> > > > > > > > >>>> There could be good alternatives to handle this:
>> > > >> > > > > > > > >>>> 1. Short opt-in period before opt-out. Test this
>> > > >> feature
>> > > >> > > with
>> > > >> > > > > > users
>> > > >> > > > > > > > who
>> > > >> > > > > > > > >>>> trust and if it works great - make it public. I
>> > think
>> > > >> it's
>> > > >> > > > wise
>> > > >> > > > > to
>> > > >> > > > > > > > handle
>> > > >> > > > > > > > >>>> edge cases and configure collected data more
>> > > >> accurately.
>> > > >> > > > > > > > >>>> 2. Explicitly somehow warn about this feature to
>> > make
>> > > >> this
>> > > >> > > > > > feature not
>> > > >> > > > > > > > >>>> get
>> > > >> > > > > > > > >>>> unnoticed. Just to reduce possible frustration.
>> > > >> > > > > > > > >>>>
>> > > >> > > > > > > > >>>> Just some personal thoughts for discussion (:
>> > > >> > > > > > > > >>>>
>> > > >> > > > > > > > >>>> --
>> > > >> > > > > > > > >>>> ,,,^..^,,,
>> > > >> > > > > > > > >>>>
>> > > >> > > > > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <
>> > > >> > > > jarek@potiuk.com>
>> > > >> > > > > > > > wrote:
>> > > >> > > > > > > > >>>>
>> > > >> > > > > > > > >>>>> Hello everyone,
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> it has to be:
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> 1. Opt-in by default to not trigger security
>> guys
>> > > >> about
>> > > >> > new
>> > > >> > > > > > unplanned
>> > > >> > > > > > > > >>>>>> activity after regular upgrade.
>> > > >> > > > > > > > >>>>>>
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> That's a very good point about security
>> triggering
>> > > >> > > Alexander,
>> > > >> > > > > > but I
>> > > >> > > > > > > > am
>> > > >> > > > > > > > >>>> not
>> > > >> > > > > > > > >>>>> so sure it means that we "have to" do opt-in.
>> > There
>> > > >> are
>> > > >> > > other
>> > > >> > > > > > ways of
>> > > >> > > > > > > > >>>>> communicating with the "deployment managers"
>> who
>> > > >> install
>> > > >> > > and
>> > > >> > > > > > upgrade
>> > > >> > > > > > > > >>>>> Airflow - i.e. release notes, blogs, social
>> media
>> > of
>> > > >> > ours,
>> > > >> > > > > slack
>> > > >> > > > > > > > >>>>> announcements etc. We have plenty of channels
>> we
>> > can
>> > > >> use
>> > > >> > to
>> > > >> > > > > > > > >>>> communicate the
>> > > >> > > > > > > > >>>>> change.
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> I think we have a very good blueprint to follow
>> > > >> including
>> > > >> > > at
>> > > >> > > > > > least 5
>> > > >> > > > > > > > >>>> other
>> > > >> > > > > > > > >>>>> ASF projects that also passed the review of the
>> > > >> > > privacy@asf.
>> > > >> > > > > And
>> > > >> > > > > > > > >>>> while I
>> > > >> > > > > > > > >>>>> understand (and concur) the urge for opt-in by
>> > > default
>> > > >> > > coming
>> > > >> > > > > > from
>> > > >> > > > > > > > >>>> consumer
>> > > >> > > > > > > > >>>>> market (where it makes perfect sense) Airflow
>> is
>> > > not a
>> > > >> > > > consumer
>> > > >> > > > > > > > >>>>> software and is used in "corporate environment"
>> > > which
>> > > >> > has a
>> > > >> > > > > > little
>> > > >> > > > > > > > >>>>> different expectations and broad assumption
>> that
>> > the
>> > > >> > > company
>> > > >> > > > > can
>> > > >> > > > > > make
>> > > >> > > > > > > > >>>>> decisions on such telemetry on behalf of the
>> > > employees
>> > > >> > > using
>> > > >> > > > > it.
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> We should assume that those who deploy and
>> upgrade
>> > > >> > Airflow
>> > > >> > > -
>> > > >> > > > > > actually
>> > > >> > > > > > > > >>>> read
>> > > >> > > > > > > > >>>>> and take into account what is written in the
>> > release
>> > > >> > notes
>> > > >> > > -
>> > > >> > > > > > > > >>>> especially if
>> > > >> > > > > > > > >>>>> they have security guys breathing down their necks,
>> > > >> similarly
>> > > >> > as
>> > > >> > > > we
>> > > >> > > > > > have
>> > > >> > > > > > > > to
>> > > >> > > > > > > > >>>>> assume they follow CVE announcements about
>> > security
>> > > >> > issues
>> > > >> > > > > > fixed. If
>> > > >> > > > > > > > we
>> > > >> > > > > > > > >>>>> are very straightforward and out-going about
>> the
>> > > >> change,
>> > > >> > > > inform
>> > > >> > > > > > very
>> > > >> > > > > > > > >>>>> clearly how to opt-out, I don't see a big
>> problem
>> > > with
>> > > >> > > > opt-out.
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> We should of course check with privacy@a.o
>> (but
>> > I've
>> > > >> > spent
>> > > >> > > a
>> > > >> > > > > good
>> > > >> > > > > > > > deal
>> > > >> > > > > > > > >>>> of
>> > > >> > > > > > > > >>>>> time reading the Superset  and other use case
>> and
>> > > >> > > explanation
>> > > >> > > > > in
>> > > >> > > > > > > > >>>> detail to
>> > > >> > > > > > > > >>>>> make a better informed decision) - and it looks
>> > like
>> > > >> they
>> > > >> > > > also
>> > > >> > > > > > went
>> > > >> > > > > > > > >>>> opt-out
>> > > >> > > > > > > > >>>>> way and got cleared by privacy@a.o.  And if we
>> > > cannot
>> > > >> > > reach
>> > > >> > > > > > > > >>>> consensus, we
>> > > >> > > > > > > > >>>>> should - as usual - make a voting decision on
>> it
>> > > >> (because
>> > > >> > > > yes,
>> > > >> > > > > > it is
>> > > >> > > > > > > > an
>> > > >> > > > > > > > >>>>> important decision), but - after reading and
>> > > >> > understanding
>> > > >> > > > why
>> > > >> > > > > > others
>> > > >> > > > > > > > >>>> also
>> > > >> > > > > > > > >>>>> did it - for me personally, opt-out is a good
>> > path.
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> Also because it will rather increase the
>> amount of
>> > > >> data
>> > > >> > to
>> > > >> > > > > > gather,
>> > > >> > > > > > > > and
>> > > >> > > > > > > > >>>> in
>> > > >> > > > > > > > >>>>> our case - counter intuitively - it will be
>> even
>> > > >> better
>> > > >> > for
>> > > >> > > > > > privacy
>> > > >> > > > > > > > and
>> > > >> > > > > > > > >>>>> corporate anonymity, because the more data we
>> get,
>> > > the
>> > > >> > more
>> > > >> > > > > > difficult
>> > > >> > > > > > > > >>>> it
>> > > >> > > > > > > > >>>>> will be to get any
>> non-statistical/non-aggregated
>> > > >> insight
>> > > >> > > > from
>> > > >> > > > > > it.
>> > > >> > > > > > > > >>>> Imagine
>> > > >> > > > > > > > >>>>> if only a few corporate users will enable it
>> > > >> consciously
>> > > >> > -
>> > > >> > > > then
>> > > >> > > > > > we
>> > > >> > > > > > > > >>>> will be
>> > > >> > > > > > > > >>>>> able to draw much more conclusions if we find
>> out
>> > > who
>> > > >> > they
>> > > >> > > > are,
>> > > >> > > > > > than
>> > > >> > > > > > > > if
>> > > >> > > > > > > > >>>>> everyone has it enabled by default.
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> That's my take on it - but again, it's up to
>> us to
>> > > >> vote,
>> > > >> > > for
>> > > >> > > > me
>> > > >> > > > > > > > opt-in
>> > > >> > > > > > > > >>>> is
>> > > >> > > > > > > > >>>>> not "has to", and I am rather for opt-out.
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>> J.
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>>> Hi all,
>> > > >> > > > > > > > >>>>>>
>> > > >> > > > > > > > >>>>>>
>> > > >> > > > > > > > >>>>>>> I want to propose gathering telemetry for
>> > Airflow
>> > > >> > > > > > installations.
>> > > >> > > > > > > > >>>> As the
>> > > >> > > > > > > > >>>>>>> Airflow community, we have been relying
>> heavily
>> > on
>> > > >> the
>> > > >> > > > yearly
>> > > >> > > > > > > > >>>> Airflow
>> > > >> > > > > > > > >>>>>>> Survey and anecdotes to answer a few key
>> > questions
>> > > >> > about
>> > > >> > > > > > Airflow
>> > > >> > > > > > > > >>>> usage.
>> > > >> > > > > > > > >>>>>>> Questions like the following:
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>>   - Which versions of Airflow are people
>> > > >> > installing/using
>> > > >> > > > now
>> > > >> > > > > > > > >>>> (i.e.
>> > > >> > > > > > > > >>>>>>>   whether people have primarily made the jump
>> > from
>> > > >> > > version
>> > > >> > > > X
>> > > >> > > > > to
>> > > >> > > > > > > > >>>>> version
>> > > >> > > > > > > > >>>>>> Y)
>> > > >> > > > > > > > >>>>>>>   - Which DB is used as the Metadata DB and
>> > which
>> > > >> > version
>> > > >> > > > e.g
>> > > >> > > > > > Pg
>> > > >> > > > > > > > >>>> 14?
>> > > >> > > > > > > > >>>>>>>   - What Python version is being used?
>> > > >> > > > > > > > >>>>>>>   - Which Executor is being used?
>> > > >> > > > > > > > >>>>>>>   - Approximately how many people out there
>> in
>> > the
>> > > >> > world
>> > > >> > > > are
>> > > >> > > > > > > > >>>>> installing
>> > > >> > > > > > > > >>>>>>>   Airflow
>> > > >> > > > > > > > >>>>>>>
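
As an illustration of the questions above, the sketch below shows the kind of coarse, non-identifying record that could answer them. All field names and the build_snapshot helper are hypothetical and are not part of Scarf or Airflow; versions are truncated to major.minor on the assumption that patch-level detail is not needed for these decisions.

```python
import platform
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UsageSnapshot:
    """Coarse deployment facts only; no hostnames, DAG names, or user data."""
    airflow_version: str
    python_version: str
    executor: str
    metadata_db: str
    metadata_db_version: str


def minor_version(version: str) -> str:
    """Truncate a dotted version such as '3.11.4' to '3.11'."""
    return ".".join(version.split(".")[:2])


def build_snapshot(airflow_version: str, executor: str,
                   metadata_db: str, metadata_db_version: str) -> dict:
    """Assemble the record from values the caller already knows."""
    return asdict(UsageSnapshot(
        airflow_version=airflow_version,
        python_version=minor_version(platform.python_version()),
        executor=executor,
        metadata_db=metadata_db,
        metadata_db_version=minor_version(metadata_db_version),
    ))
```

Keeping the record to a fixed set of named fields makes it straightforward to publish exactly "what data is being collected" alongside the opt-out instructions.
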
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>> There is a solution that should help answer
>> > these
>> > > >> > > > questions:
>> > > >> > > > > > Scarf
>> > > >> > > > > > > > >>>> [1].
>> > > >> > > > > > > > >>>>>> The
>> > > >> > > > > > > > >>>>>>> ASF already approves Scarf [2][3] and it is
>> already
>> > > >> used
>> > > >> > by
>> > > >> > > > > other
>> > > >> > > > > > ASF
>> > > >> > > > > > > > >>>>>>> projects: Superset [4], Dolphin Scheduler
>> [5],
>> > > Dubbo
>> > > >> > > > > > Kubernetes,
>> > > >> > > > > > > > >>>>> DevLake,
>> > > >> > > > > > > > >>>>>>> Skywalking as it follows GDPR and other
>> > > regulations.
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>> Similar to Superset, we probably can use it
>> as
>> > > >> follows:
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>>   1. Install the `scarf js` npm package and
>> > bundle
>> > > >> it
>> > > >> > in
>> > > >> > > > the
>> > > >> > > > > > > > >>>>> Webserver.
>> > > >> > > > > > > > >>>>>>>   When the package is downloaded & Airflow
>> > > >> webserver is
>> > > >> > > > > opened,
>> > > >> > > > > > > > >>>>> metadata
>> > > >> > > > > > > > >>>>>>> is
>> > > >> > > > > > > > >>>>>>>   recorded to the Scarf dashboard.
>> > > >> > > > > > > > >>>>>>>   2. Utilize the Scarf Gateway [6], which we
>> can
>> > > >> use in
>> > > >> > > > front
>> > > >> > > > > > of
>> > > >> > > > > > > > >>>>> docker
>> > > >> > > > > > > > >>>>>>>   containers. While it’s possible people go
>> > around
>> > > >> this
>> > > >> > > > > > gateway,
>> > > >> > > > > > > > >>>> we
>> > > >> > > > > > > > >>>>> can
>> > > >> > > > > > > > >>>>>>>   probably configure and encourage most
>> traffic
>> > to
>> > > >> go
>> > > >> > > > through
>> > > >> > > > > > > > >>>> these
>> > > >> > > > > > > > >>>>>>> gateways.
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>> While Scarf does not store any personally
>> > > >> identifying
>> > > >> > > > > > information
>> > > >> > > > > > > > >>>> from
>> > > >> > > > > > > > >>>>>> SDK
>> > > >> > > > > > > > >>>>>>> telemetry data, it does send various bits of
>> > > >> IP-derived
>> > > >> > > > > > > > >>>> information as
>> > > >> > > > > > > > >>>>>>> outlined here [7]. This data should be made
>> as
>> > > >> > > transparent
>> > > >> > > > as
>> > > >> > > > > > > > >>>> possible
>> > > >> > > > > > > > >>>>> by
>> > > >> > > > > > > > >>>>>>> granting dashboard access to the Airflow PMC
>> and
>> > > any
>> > > >> > > other
>> > > >> > > > > > relevant
>> > > >> > > > > > > > >>>>> means
>> > > >> > > > > > > > >>>>>>> of sharing/surfacing it that we encounter
>> (Town
>> > > >> Hall,
>> > > >> > > > Slack,
>> > > >> > > > > > > > >>>> Newsletter
>> > > >> > > > > > > > >>>>>>> etc).
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>> The following case studies are worth reading:
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>>   1. https://about.scarf.sh/post/scarf-case-study-apache-superset (From Maxime)
>> > > >> > > > > > > > >>>>>>>   2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>> Similar to them, this could help in various ways that come with using data
>> > > >> > > > > > > > >>>>>>> for decision-making. With clear guidelines on "how to opt-out" [8][9][10] &
>> > > >> > > > > > > > >>>>>>> "what data is being collected" on the Airflow website, this can be
>> > > >> > > > > > > > >>>>>>> beneficial to the entire community as we would be making more informed
>> > > >> > > > > > > > >>>>>>> decisions.
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>> Regards,
>> > > >> > > > > > > > >>>>>>> Kaxil
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>> [1] https://about.scarf.sh/
>> > > >> > > > > > > > >>>>>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
>> > > >> > > > > > > > >>>>>>> [3] https://privacy.apache.org/faq/committers.html
>> > > >> > > > > > > > >>>>>>> [4] https://github.com/apache/superset/issues/25639
>> > > >> > > > > > > > >>>>>>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
>> > > >> > > > > > > > >>>>>>> [6] https://about.scarf.sh/scarf-gateway
>> > > >> > > > > > > > >>>>>>> [7] https://about.scarf.sh/privacy-policy
>> > > >> > > > > > > > >>>>>>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
>> > > >> > > > > > > > >>>>>>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
>> > > >> > > > > > > > >>>>>>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
>> > > >> > > > > > > > >>>>>>>
>> > > >> > > > > > > > >>>>>>
>> > > >> > > > > > > > >>>>>
>> > > >> > > > > > > > >>>>
>> > > >> > > > > > > > >>>
>> > > >> > > > > > > >
>> > > >> > > > > > > >
>> > > >> > > > > > > >
>> > > >> > > >
>> > > >>
>> ---------------------------------------------------------------------
>> > > >> > > > > > > > To unsubscribe, e-mail:
>> > > dev-unsubscribe@airflow.apache.org
>> > > >> > > > > > > > For additional commands, e-mail:
>> > > >> dev-help@airflow.apache.org

Re: [DISCUSS] Proposal for adding Telemetry via Scarf

Posted by Michał Modras <mi...@google.com.INVALID>.
If it is packaged and installed by default, we add the dependency (and its
dependencies) to Airflow's already large dependency tree. If we make it
installed and enabled by default, would there be an easy way not just to
switch it off (e.g. through an environment variable), but also to not package
it at all? That's why I was suggesting a provider, but any other pluggable
(and unpluggable) mechanism would work.
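
To make the two levels of opt-out concrete, here is a minimal sketch, not
Airflow's or Scarf's actual implementation. It assumes the
`SCARF_ANALYTICS=false` variable mentioned in the quoted message below as the
runtime switch, and a purely hypothetical optional package name
(`airflow_scarf_telemetry`) standing in for whatever pluggable mechanism we
pick. If that package is not installed, nothing is sent regardless of the
variable.

```python
import importlib.util
import os

# Hypothetical name for an optional, separately installable telemetry package.
# It is an assumption for illustration; no such package is named in this thread.
TELEMETRY_PACKAGE = "airflow_scarf_telemetry"


def telemetry_enabled(environ=None, package=TELEMETRY_PACKAGE):
    """Return True only if telemetry is installed and has not been opted out of."""
    environ = os.environ if environ is None else environ
    # Level 1: runtime switch, following the SCARF_ANALYTICS=false convention
    # quoted in this thread. Any value other than "false" leaves telemetry on.
    opted_out = environ.get("SCARF_ANALYTICS", "true").strip().lower() == "false"
    # Level 2: the optional package is simply not part of the deployment.
    installed = importlib.util.find_spec(package) is not None
    return installed and not opted_out
```

With this shape, a deployment that never installs the optional package gets
no telemetry code at all, while everyone else can still flip the environment
variable.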

On Tue, Apr 9, 2024 at 2:41 AM Hussein Awala <hu...@awala.fr> wrote:

> > Other than that I don't mind it being e.g. optional provider.
>
> I don't think it is possible to implement it in a provider because it is a
> js package installed on the webserver; we could implement it as a plugin
> (Blueprint), but in this case, the user must make an effort to register it.
>
> It would be better to always install it and activate it by default, with
> the option to deactivate it via the environment variable
> `SCARF_ANALYTICS=false` (according to the documentation). If it is
> deactivated by default, many users will not activate it even if they don't
> mind reporting the metrics; if it is enabled by default, only users who
> don't want to send metrics will disable it.
>
>
> On Fri, Apr 5, 2024 at 6:19 PM Michał Modras
> <mi...@google.com.invalid> wrote:
>
> > My 2 cents: it must be possible to opt-out, preferably it should be
> > possible to deploy Airflow instances without bundling the telemetry
> > library dependencies. Other than that I don't mind it being e.g. optional
> > provider.
> >
> > On Wed, Apr 3, 2024 at 10:42 PM Hussein Awala <hu...@awala.fr> wrote:
> >
> > > > > I'd like to propose, that we start with collecting simple data with
> > > > > limited access: to all the PMC members. We can always expand it to
> > > > > Committers and then expand further to make it invite-only or setup
> > > > > exporting it to a DB like Postgres
> > > > > <https://github.com/scarf-sh/scarf-postgres-exporter> and have a
> > > > > publicly viewable dashboard.
> > > >
> > > > Looks like a good plan; we can discuss the export format when we decide
> > > > to do it.
> > >
> > > On Wed, Apr 3, 2024 at 7:59 PM Kaxil Naik <ka...@gmail.com> wrote:
> > >
> > > > Yup, exactly.
> > > >
> > > > >> I believe this would definitely help us take early and informed
> > > > >> decisions. E.g. Had we had this earlier, I believe it would have
> > > > >> definitely helped us more for our past discussions like whether we
> > > > >> should continue supporting MsSQL
> > > > >> (https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4),
> > > > >> similarly about the DaskExecutor
> > > > >> (https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1),
> > > > >> etc.
> > > >>
> > > >
> > > >
> > > > > Btw clarifying my own stance on the below; and let me know what you
> > > > > think @Hussein Awala <hu...@awala.fr> : I'd like to propose, that we
> > > > > start with collecting simple data with limited access: to all the PMC
> > > > > members. We can always expand it to Committers and then expand further
> > > > > to make it invite-only or setup exporting it to a DB like Postgres
> > > > > <https://github.com/scarf-sh/scarf-postgres-exporter> and have a
> > > > > publicly viewable dashboard. It would be similar to an iterative
> > > > > software development approach, since this will be the first time for
> > > > > us, as Airflow PMC, to add such telemetry. This is of course just my
> > > > > opinion though :)
> > > >
> > > > >> Regarding the data, like I had mentioned in the email and I am glad
> > > > >> others including you are on the same page that the data will be
> > > > >> shared with all PMC members. The point about sharing it via website
> > > > >> and newsletter was for the community - Airflow users. I don’t think
> > > > >> anyone in the community (apart from the PMC members) would need raw
> > > > >> data. And even if they need it, I’d say they should put effort and
> > > > >> contribute to the Airflow project and become PMC members.
> > > > >>
> > > > >> To be clear: this telemetry data should help us, as Airflow PMC, to
> > > > >> steer some of the decision making based on this data similar to how
> > > > >> only PMC has a binding vote on the releases. [1] and this is similar
> > > > >> to how Apache Superset does it too.
> > > > >>
> > > > >> [1] https://www.apache.org/dev/pmc.html#what-is-a-pmc
> > > >
> > > >
> > > > > On Wed, 3 Apr 2024 at 12:03, Pankaj Koti
> > > > > <pankaj.koti@astronomer.io.invalid> wrote:
> > > >
> > > >> +1 to introduce this.
> > > >>
> > > > >> I believe this would definitely help us take early and informed
> > > > >> decisions. E.g. Had we had this earlier, I believe it would have
> > > > >> definitely helped us more for our past discussions like whether we
> > > > >> should continue supporting MsSQL
> > > > >> (https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4),
> > > > >> similarly about the DaskExecutor
> > > > >> (https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1),
> > > > >> etc.
> > > >>
> > > >>
> > > >> Best regards,
> > > >>
> > > >> *Pankaj Koti*
> > > >> Senior Software Engineer (Airflow OSS Engineering team)
> > > >> Location: Pune, Maharashtra, India
> > > >> Timezone: Indian Standard Time (IST)
> > > >> Phone: +91 9730079985 <+91%2097300%2079985>
> > > >>
> > > >>
> > > > >> On Wed, Apr 3, 2024 at 2:44 PM Kaxil Naik <ka...@gmail.com> wrote:
> > > >>
> > > > >> > Yup, I had added a link to scarf docs in the original email that
> > > > >> > referenced opting out and we should even add an Airflow config that
> > > > >> > puts all config in a single place. Without it we can’t be compliant
> > > > >> > to all the policies even if we collectively ignore or are unaware of
> > > > >> > the importance of it.
> > > >> >
> > > > >> > Regarding the data, like I had mentioned in the email and I am glad
> > > > >> > others including you are on the same page that the data will be
> > > > >> > shared with all PMC members. The point about sharing it via website
> > > > >> > and newsletter was for the community - Airflow users. I don’t think
> > > > >> > anyone in the community (apart from the PMC members) would need raw
> > > > >> > data. And even if they need it, I’d say they should put effort and
> > > > >> > contribute to the Airflow project and become PMC members.
> > > > >> >
> > > > >> > To be clear: this telemetry data should help us, as Airflow PMC, to
> > > > >> > steer some of the decision making based on this data similar to how
> > > > >> > only PMC has a binding vote on the releases. [1] and this is similar
> > > > >> > to how Apache Superset does it too.
> > > > >> >
> > > > >> > [1] https://www.apache.org/dev/pmc.html#what-is-a-pmc
> > > >> >
> > > >> >
> > > >> >
> > > > >> > On Wed, 3 Apr 2024 at 00:05, Hussein Awala <hu...@awala.fr> wrote:
> > > >> >
> > > > >> > > I mentioned opting out just to confirm its importance, and after
> > > > >> > > checking the Scarf documentation it appears to be supported natively
> > > > >> > > by Scarf. For data accessibility, my point was more about raw data,
> > > > >> > > not just aggregated information/insights shared via monthly
> > > > >> > > newsletters, as we do for Airflow annual Survey for example:
> > > > >> > > https://airflow.apache.org/survey vs
> > > > >> > > https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics
> > > > >> > > .
> > > >> > >
> > > > >> > > On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <ka...@gmail.com> wrote:
> > > >> > >
> > > > >> > > > Agreed to both your points Hussein but both the points are already
> > > > >> > > > covered in my original discussion post - both about opting out and
> > > > >> > > > providing data to all the PMC members and providing visibility via
> > > > >> > > > Monthly newsletters. Is there anything else you propose to discuss
> > > > >> > > > that isn’t covered?
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > > >> > > > On Mon, 1 Apr 2024 at 13:21, Hussein Awala <hu...@awala.fr> wrote:
> > > >> > > >
> > > > >> > > > > +1 for the idea in general, but there are two main points to discuss
> > > > >> > > > > before voting on this:
> > > > >> > > > >
> > > > >> > > > > 1. We should provide an option to disable Scarf:
> > > > >> > > > > As Airflow is not a paid product, we cannot force companies to report
> > > > >> > > > > their use of this project. Otherwise, some may choose to create their
> > > > >> > > > > own fork just to disable Scarf.
> > > > >> > > > >
> > > > >> > > > > 2. Concerning the exclusivity of access to data:
> > > > >> > > > > The data collected must either be completely proprietary for use by
> > > > >> > > > > PMC and ASF, or completely open. Since many companies offer Airflow
> > > > >> > > > > as a product, it is imperative not to give one company more
> > > > >> > > > > privileges than others. I raise this point for the principle of
> > > > >> > > > > equality of opportunity.
> > > >> > > > >
> > > > >> > > > > On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <sunank200@gmail.com>
> > > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > Big +1 for Scarf.
> > > >> > > > > >
> > > > >> > > > > > Transparency is key, so it's important to be super clear about opting
> > > > >> > > > > > out and what's tracked to avoid spooking anyone about IP stuff.
> > > >> > > > > >
> > > >> > > > > > Regards
> > > >> > > > > > Ankit Chaurasia
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > > >> > > > > > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <amoghdesai.oss@gmail.com>
> > > > >> > > > > > wrote:
> > > >> > > > > > >
> > > >> > > > > > > +1 looks like a good tool which could be super helpful.
> > > >> > > > > > >
> > > > >> > > > > > > * We should have some transparency into the data that is collected or sent
> > > > >> > > > > > > * We should have an option to optionally opt-out
> > > >> > > > > > >
> > > >> > > > > > > Thanks & Regards,
> > > >> > > > > > > Amogh Desai
> > > >> > > > > > >
> > > >> > > > > > >
> > > > >> > > > > > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <weilee.rx@gmail.com> wrote:
> > > >> > > > > > >
> > > > >> > > > > > > > +1 to this. It would be really useful. As long as we can opt out, I
> > > > >> > > > > > > > think we’re good.
> > > >> > > > > > > >
> > > >> > > > > > > > Best,
> > > >> > > > > > > > Wei
> > > >> > > > > > > >
> > > > >> > > > > > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > > Grammar Correction:
> > > >> > > > > > > > >
> > > > >> > > > > > > > >> We should assume that those who deploy and upgrade Airflow - actually
> > > > >> > > > > > > > >> read and take into account what is written in the release notes -
> > > > >> > > > > > > > >> especially if they have security guys breathing their necks, similarly
> > > > >> > > > > > > > >> as we have to assume they follow CVE announcements about security
> > > > >> > > > > > > > >> issues fixed. If we are very straightforward and out-going about the
> > > > >> > > > > > > > >> change, inform very clearly how to opt-out, I don't see a big problem
> > > > >> > > > > > > > >> with opt-out.
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > > >> > > > > > > > > I couldn't agree more; even though we shouldn't collect any data that
> > > > >> > > > > > > > > hamper security (and we should aim to do the same), most security
> > > > >> > > > > > > > > concerned folks don't just upgrade, and we can rely on them regarding
> > > > >> > > > > > > > > release notes or announcements and we can make it very clear in our
> > > > >> > > > > > > > > announcements too; and in our installation guides.
> > > >> > > > > > > > >
> > > > >> > > > > > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > >> > > > > > > > >
> > > > >> > > > > > > > >> Grammar correction:
> > > >> > > > > > > > >>
> > > >> > > > > > > > >>
> > > > >> > > > > > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > >> > > > > > > > >>
> > > >> > > > > > > > >>> Have this at the end of the email too: but if
> folks
> > > >> don't
> > > >> > > read
> > > >> > > > > > until
> > > >> > > > > > > > the
> > > >> > > > > > > > >>> end and quoting Maxime from the use-case blog[1]:
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> "I think people often ask ‘how do I contribute to
> > open
> > > >> > > > source?’,
> > > >> > > > > > ‘I've
> > > >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
> > > >> > engineer.’
> > > >> > > > > > Actually,
> > > >> > > > > > > > the
> > > >> > > > > > > > >>> very simplest thing that you can do is just say,
> ‘my
> > > >> > > > organization
> > > >> > > > > > gets
> > > >> > > > > > > > real
> > > >> > > > > > > > >>> value from this piece of software.’ There are a
> > bunch
> > > of
> > > >> > ways
> > > >> > > > to
> > > >> > > > > > let
> > > >> > > > > > > > the
> > > >> > > > > > > > >>> people know about it – and now Scarf is there. If
> > your
> > > >> > > > > > organization is
> > > >> > > > > > > > >>> getting a lot of value from a piece of open source
> > > >> > software,
> > > >> > > > make
> > > >> > > > > > sure
> > > >> > > > > > > > the
> > > >> > > > > > > > >>> devs know about it."
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> What kind of edge cases are you thinking about? I
> > > don't
> > > >> > think
> > > >> > > > it
> > > >> > > > > > makes
> > > >> > > > > > > > >>> sense to have "opt-in" at all. As the goal is to
> > > collect
> > > >> > data
> > > >> > > > for
> > > >> > > > > > most
> > > >> > > > > > > > >>> Airflow installations except for those that don't
> > want
> > > >> to
> > > >> > > give
> > > >> > > > > > data,
> > > >> > > > > > > > then
> > > >> > > > > > > > >>> "opt-out" is the only way to maximize it. As long
> as
> > > we
> > > >> > don't
> > > >> > > > > > collect
> > > >> > > > > > > > any
> > > >> > > > > > > > >>> PII data, this is in-compliance as well.
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> Imagine someone learning Airflow, if they have to
> > > opt-in
> > > >> > via
> > > >> > > a
> > > >> > > > > > config,
> > > >> > > > > > > > >>> they wouldn't even know or care about it, hence us
> > > >> losing
> > > >> > > most
> > > >> > > > of
> > > >> > > > > > the
> > > >> > > > > > > > data.
> > > >> > > > > > > > >>> I understand why some orgs & individuals may want
> to
> > > >> > opt-out.
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> Scarf Provides tracking pixels (essentially an
> HTML
> > > >> image
> > > >> > > tag)
> > > >> > > > > > that you
> > > >> > > > > > > > >>> can place in your website or product to track
> > visitors
> > > >> to
> > > >> > > that
> > > >> > > > > > URL. If
> > > >> > > > > > > > >>> there were any concerns about Privacy, ASF
> wouldn't
> > > have
> > > >> > > > approved
> > > >> > > > > > it
> > > >> > > > > > > > at all.
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> A few key details to note about the pixel:
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>   - No PII is tracked… Scarf does not
> capture/retain
> > > IP
> > > >> > > > > > information…
> > > >> > > > > > > > >>>   this information is discarded by the platform
> upon
> > > >> > > > > > > > processing/aggregating
> > > >> > > > > > > > >>>   - Scarf pixels respect the Do Not Track (DNT)
> > > >> settings of
> > > >> > > > > > browsers -
> > > >> > > > > > > > >>>   these users will not be tracked whatsoever.
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> All the ASF projects I had listed (whether they
> use
> > > >> Scarf
> > > >> > > > gateway
> > > >> > > > > > or
> > > >> > > > > > > > >>> Scarf pixel in product) are using opt-out.
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> 1. Short opt-in period before opt-out. Test this
> > > feature
> > > >> > with
> > > >> > > > > > users who
> > > >> > > > > > > > >>>> trust and if it works great - make it public. I
> > think
> > > >> it's
> > > >> > > > wise
> > > >> > > > > to
> > > >> > > > > > > > handle
> > > >> > > > > > > > >>>> edge cases and configure collected data more
> > > >> accurately.
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> It would be a pixel in the webserver, should
> affect
> > > >> nothing
> > > >> > > at
> > > >> > > > > all
> > > >> > > > > > even
> > > >> > > > > > > > >>> in an air-gapped environment.
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>> 2. It should not affect anything if access to the
> > > >> internet
> > > >> > > is
> > > >> > > > > > > > restricted
> > > >> > > > > > > > >>>> which is default for many companies.
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> 100% agreed on the below:
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>> I think we have a very good blueprint to follow
> > > >> including
> > > >> > at
> > > >> > > > > > least 5
> > > >> > > > > > > > >>>> other
> > > >> > > > > > > > >>>> ASF projects that also passed the review of the
> > > >> > privacy@asf.
> > > >> > > > > And
> > > >> > > > > > > > while I
> > > >> > > > > > > > >>>> understand (and concur) the urge for opt-in by
> > > default
> > > >> > > coming
> > > >> > > > > from
> > > >> > > > > > > > >>>> consumer
> > > >> > > > > > > > >>>> market (where it makes perfect sense) Airflow is
> > not
> > > a
> > > >> > > > consumer
> > > >> > > > > > > > >>>> software and is used in "corporate environment"
> > which
> > > >> has
> > > >> > a
> > > >> > > > > little
> > > >> > > > > > > > >>>> different expectations and broad assumption that
> > the
> > > >> > company
> > > >> > > > can
> > > >> > > > > > make
> > > >> > > > > > > > >>>> decisions on such telemetry on behalf of the
> > > employees
> > > >> > using
> > > >> > > > it.
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> Couldn't agree more; even though there shouldn't
> we
> > > >> collect
> > > >> > > > > hamper
> > > >> > > > > > > > >>> security (and we should aim to do the same), most
> > > >> security
> > > >> > > > > > concerned
> > > >> > > > > > > > folks
> > > >> > > > > > > > >>> don't just
> > > >> > > > > > > > >>> upgrade, and we can rely on them regarding release
> > > >> notes or
> > > >> > > > > > > > announcements
> > > >> > > > > > > > >>> and we can make it very clear in our announcements
> > > too;
> > > >> and
> > > >> > > in
> > > >> > > > > our
> > > >> > > > > > > > >>> installation guides.
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> We should assume that those who deploy and upgrade
> > > >> Airflow
> > > >> > -
> > > >> > > > > > actually
> > > >> > > > > > > > read
> > > >> > > > > > > > >>>> and take into account what is written in the
> > release
> > > >> > notes -
> > > >> > > > > > > > especially
> > > >> > > > > > > > >>>> if
> > > >> > > > > > > > >>>> they have security guys breathing their necks,
> > > >> similarly
> > > >> > as
> > > >> > > we
> > > >> > > > > > have to
> > > >> > > > > > > > >>>> assume they follow CVE announcements about
> security
> > > >> issues
> > > >> > > > > fixed.
> > > >> > > > > > If
> > > >> > > > > > > > we
> > > >> > > > > > > > >>>> are very straightforward and out-going about the
> > > >> change,
> > > >> > > > inform
> > > >> > > > > > very
> > > >> > > > > > > > >>>> clearly how to opt-out, I don't see a big problem
> > > with
> > > >> > > > opt-out.
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>
> > > > >> > > > > > > > >>> To be clear, the collection of data, or at least the data we should
> > > > >> > > > > > > > >>> gather here, should help all the consumers without violating any
> > > > >> > > > > > > > >>> regulations. I will quote Maxime from the use-case doc [1]:
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> "*Another Form of Contributing*
> > > >> > > > > > > > >>> “I think people often ask ‘how do I contribute to
> > open
> > > >> > > > source?’,
> > > >> > > > > > ‘I've
> > > >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
> > > >> > engineer.’
> > > >> > > > > > Actually,
> > > >> > > > > > > > the
> > > >> > > > > > > > >>> very simplest thing that you can do is just say,
> ‘my
> > > >> > > > organization
> > > >> > > > > > gets
> > > >> > > > > > > > real
> > > >> > > > > > > > >>> value from this piece of software.’ There are a
> > bunch
> > > of
> > > >> > ways
> > > >> > > > to
> > > >> > > > > > let
> > > >> > > > > > > > the
> > > >> > > > > > > > >>> people know about it – and now Scarf is there. If
> > your
> > > >> > > > > > organization is
> > > >> > > > > > > > >>> getting a lot of value from a piece of open source
> > > >> > software,
> > > >> > > > make
> > > >> > > > > > sure
> > > >> > > > > > > > the
> > > >> > > > > > > > >>> devs know about it.”"
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> [1]
> > > >> > > > https://about.scarf.sh/post/scarf-case-study-apache-superset
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <
> > > >> > > > > kxepal@apache.org>
> > > >> > > > > > > > wrote:
> > > >> > > > > > > > >>>
> > > >> > > > > > > > >>>> Hi Jarek!
> > > >> > > > > > > > >>>>
> > > >> > > > > > > > >>>> I understand the reasons for opt-out from a
> project
> > > >> view.
> > > >> > I
> > > >> > > > just
> > > >> > > > > > > > suddenly
> > > >> > > > > > > > >>>> imagined the situation when an upgrade happens
> and
> > > here
> > > >> > > comes
> > > >> > > > > the
> > > >> > > > > > > > data to
> > > >> > > > > > > > >>>> some third party service - that's a view from a
> > user
> > > >> side
> > > >> > of
> > > >> > > > > some
> > > >> > > > > > big
> > > >> > > > > > > > >>>> company.
> > > >> > > > > > > > >>>>
> > > >> > > > > > > > >>>> There could be good alternatives to handle this:
> > > >> > > > > > > > >>>> 1. Short opt-in period before opt-out. Test this
> > > >> feature
> > > >> > > with
> > > >> > > > > > users
> > > >> > > > > > > > who
> > > >> > > > > > > > >>>> trust and if it works great - make it public. I
> > think
> > > >> it's
> > > >> > > > wise
> > > >> > > > > to
> > > >> > > > > > > > handle
> > > >> > > > > > > > >>>> edge cases and configure collected data more
> > > >> accurately.
> > > >> > > > > > > > >>>> 2. Explicitly somehow warn about this feature to
> > make
> > > >> this
> > > >> > > > > > feature not
> > > >> > > > > > > > >>>> get
> > > >> > > > > > > > >>>> unnoticed. Just to reduce possible frustration.
> > > >> > > > > > > > >>>>
> > > >> > > > > > > > >>>> Just a personal thoughts for discussion (:
> > > >> > > > > > > > >>>>
> > > >> > > > > > > > >>>> --
> > > >> > > > > > > > >>>> ,,,^..^,,,
> > > >> > > > > > > > >>>>
> > > >> > > > > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <
> > > >> > > > jarek@potiuk.com>
> > > >> > > > > > > > wrote:
> > > >> > > > > > > > >>>>
> > > >> > > > > > > > >>>>> Hello everyone,
> > > >> > > > > > > > >>>>>
> > > >> > > > > > > > >>>>> it has to be:
> > > >> > > > > > > > >>>>>
> > > >> > > > > > > > >>>>> 1. Opt-in by default to not trigger security
> guys
> > > >> about
> > > >> > new
> > > >> > > > > > unplanned
> > > >> > > > > > > > >>>>>> activity after regular upgrade.
> > > >> > > > > > > > >>>>>>
> > > >> > > > > > > > >>>>>
> > > >> > > > > > > > >>>>> That's a very good point about security
> triggering
> > > >> > > Alexander,
> > > >> > > > > > but I
> > > >> > > > > > > > am
> > > >> > > > > > > > >>>> not
> > > >> > > > > > > > >>>>> so sure it means that we "have to" do opt-in.
> > There
> > > >> are
> > > >> > > other
> > > >> > > > > > ways of
> > > >> > > > > > > > >>>>> communicating with the "deployment managers" who
> > > >> install
> > > >> > > and
> > > >> > > > > > upgrade
> > > >> > > > > > > > >>>>> airflow - i.e. release notes. blogs, social
> media
> > of
> > > >> > ours,
> > > >> > > > > slack
> > > >> > > > > > > > >>>>> announcements etc. We have plenty of channels we
> > can
> > > >> use
> > > >> > to
> > > >> > > > > > > > >>>> communicate the
> > > >> > > > > > > > >>>>> change.
> > > >> > > > > > > > >>>>>
> > > >> > > > > > > > >>>>> I think we have a very good blueprint to follow
> > > >> including
> > > >> > > at
> > > >> > > > > > least 5
> > > >> > > > > > > > >>>> other
> > > >> > > > > > > > >>>>> ASF projects that also passed the review of the
> > > >> > > privacy@asf.
> > > >> > > > > And
> > > >> > > > > > > > >>>> while I
> > > >> > > > > > > > >>>>> understand (and concur) the urge for opt-in by
> > > default
> > > >> > > coming
> > > >> > > > > > from
> > > >> > > > > > > > >>>> consumer
> > > >> > > > > > > > >>>>> market (where it makes perfect sense) Airflow is
> > > not a
> > > >> > > > consumer
> > > >> > > > > > > > >>>>> software and is used in "corporate environment"
> > > which
> > > >> > has a
> > > >> > > > > > little
> > > >> > > > > > > > >>>>> different expectations and broad assumption that
> > the
> > > >> > > company
> > > >> > > > > can
> > > >> > > > > > make
> > > >> > > > > > > > >>>>> decisions on such telemetry on behalf of the
> > > employees
> > > >> > > using
> > > >> > > > > it.
> > > >> > > > > > > > >>>>>
> > > >> > > > > > > > >>>>> We should assume that those who deploy and
> upgrade
> > > >> > Airflow
> > > >> > > -
> > > >> > > > > > actually
> > > >> > > > > > > > >>>> read
> > > >> > > > > > > > >>>>> and take into account what is written in the
> > release
> > > >> > notes
> > > >> > > -
> > > >> > > > > > > > >>>> especially if
> > > >> > > > > > > > >>>>> they have security guys breathing their necks,
> > > >> similarly
> > > >> > as
> > > >> > > > we
> > > >> > > > > > have
> > > >> > > > > > > > to
> > > >> > > > > > > > >>>>> assume they follow CVE announcements about
> > security
> > > >> > issues
> > > >> > > > > > fixed. If
> > > >> > > > > > > > we
> > > >> > > > > > > > >>>>> are very straightforward and out-going about the
> > > >> change,
> > > >> > > > inform
> > > >> > > > > > very
> > > >> > > > > > > > >>>>> clearly how to opt-out, I don't see a big
> problem
> > > with
> > > >> > > > opt-out.
> > > >> > > > > > > > >>>>>
> > > > >> > > > > > > > >>>>> We should of course check with privacy@a.o (but I've spent a good deal
> > > > >> > > > > > > > >>>>> of time reading the Superset and other use cases and explanations in
> > > > >> > > > > > > > >>>>> detail to make a better informed decision) - and it looks like they
> > > > >> > > > > > > > >>>>> also went the opt-out way and got cleared by privacy@a.o. And if we
> > > > >> > > > > > > > >>>>> cannot reach consensus, we should - as usual - make a voting decision
> > > > >> > > > > > > > >>>>> on it (because yes, it is an important decision), but - after reading
> > > > >> > > > > > > > >>>>> and understanding why others also did it - for me personally, opt-out
> > > > >> > > > > > > > >>>>> is a good path.
> > > >> > > > > > > > >>>>>
> > > >> > > > > > > > >>>>> Also because it will rather increase the amount of data to gather, and in
> > > >> > > > > > > > >>>>> our case - counter intuitively - it will be even better for privacy and
> > > >> > > > > > > > >>>>> corporate anonymity, because the more data we get, the more difficult it
> > > >> > > > > > > > >>>>> will be to get any non-statistical/non-aggregated insight from it. Imagine
> > > >> > > > > > > > >>>>> if only a few corporate users will enable it consciously - then we will be
> > > >> > > > > > > > >>>>> able to draw much more conclusions if we find out who they are, than if
> > > >> > > > > > > > >>>>> everyone has it enabled by default.
> > > >> > > > > > > > >>>>>
> > > >> > > > > > > > >>>>> That's my take on it - but again, it's up to us to vote, for me opt-in is
> > > >> > > > > > > > >>>>> not "has to", and I am rather for opt-out.
> > > >> > > > > > > > >>>>>
> > > >> > > > > > > > >>>>> J.
> > > >> > > > > > > > >>>>>
> > > >> > > > > > > > >>>>>> Hi all,
> > > >> > > > > > > > >>>>>>
> > > >> > > > > > > > >>>>>>
> > > >> > > > > > > > >>>>>>> I want to propose gathering telemetry for Airflow installations. As the
> > > >> > > > > > > > >>>>>>> Airflow community, we have been relying heavily on the yearly Airflow
> > > >> > > > > > > > >>>>>>> Survey and anecdotes to answer a few key questions about Airflow usage.
> > > >> > > > > > > > >>>>>>> Questions like the following:
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>>   - Which versions of Airflow are people installing/using now (i.e.
> > > >> > > > > > > > >>>>>>>   whether people have primarily made the jump from version X to version Y)
> > > >> > > > > > > > >>>>>>>   - Which DB is used as the Metadata DB and which version e.g Pg 14?
> > > >> > > > > > > > >>>>>>>   - What Python version is being used?
> > > >> > > > > > > > >>>>>>>   - Which Executor is being used?
> > > >> > > > > > > > >>>>>>>   - Approximately how many people out there in the world are installing
> > > >> > > > > > > > >>>>>>>   Airflow
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>> There is a solution that should help answer these questions: Scarf [1]. The
> > > >> > > > > > > > >>>>>>> ASF already approves Scarf [2][3] and is already used by other ASF
> > > >> > > > > > > > >>>>>>> projects: Superset [4], Dolphin Scheduler [5], Dubbo Kubernetes, DevLake,
> > > >> > > > > > > > >>>>>>> Skywalking as it follows GDPR and other regulations.
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>> Similar to Superset, we probably can use it as follows:
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>>   1. Install the `scarf js` npm package and bundle it in the Webserver.
> > > >> > > > > > > > >>>>>>>   When the package is downloaded & Airflow webserver is opened, metadata is
> > > >> > > > > > > > >>>>>>>   recorded to the Scarf dashboard.
> > > >> > > > > > > > >>>>>>>   2. Utilize the Scarf Gateway [6], which we can use in front of docker
> > > >> > > > > > > > >>>>>>>   containers. While it’s possible people go around this gateway, we can
> > > >> > > > > > > > >>>>>>>   probably configure and encourage most traffic to go through these gateways.
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>> While Scarf does not store any personally identifying information from SDK
> > > >> > > > > > > > >>>>>>> telemetry data, it does send various bits of IP-derived information as
> > > >> > > > > > > > >>>>>>> outlined here [7]. This data should be made as transparent as possible by
> > > >> > > > > > > > >>>>>>> granting dashboard access to the Airflow PMC and any other relevant means
> > > >> > > > > > > > >>>>>>> of sharing/surfacing it that we encounter (Town Hall, Slack, Newsletter
> > > >> > > > > > > > >>>>>>> etc).
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>> The following case studies are worth reading:
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>>   1. https://about.scarf.sh/post/scarf-case-study-apache-superset (From
> > > >> > > > > > > > >>>>>>>   Maxime)
> > > >> > > > > > > > >>>>>>>   2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>> Similar to them, this could help in various ways that come with using data
> > > >> > > > > > > > >>>>>>> for decision-making. With clear guidelines on "how to opt-out" [8][9][10] &
> > > >> > > > > > > > >>>>>>> "what data is being collected" on the Airflow website, this can be
> > > >> > > > > > > > >>>>>>> beneficial to the entire community as we would be making more informed
> > > >> > > > > > > > >>>>>>> decisions.
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>> Regards,
> > > >> > > > > > > > >>>>>>> Kaxil
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>> [1] https://about.scarf.sh/
> > > >> > > > > > > > >>>>>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
> > > >> > > > > > > > >>>>>>> [3] https://privacy.apache.org/faq/committers.html
> > > >> > > > > > > > >>>>>>> [4] https://github.com/apache/superset/issues/25639
> > > >> > > > > > > > >>>>>>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> > > >> > > > > > > > >>>>>>> [6] https://about.scarf.sh/scarf-gateway
> > > >> > > > > > > > >>>>>>> [7] https://about.scarf.sh/privacy-policy
> > > >> > > > > > > > >>>>>>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> > > >> > > > > > > > >>>>>>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> > > >> > > > > > > > >>>>>>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> > > >> > > > > > > > >>>>>>>
> > > >> > > > > > > > >>>>>>
> > > >> > > > > > > > >>>>>
> > > >> > > > > > > > >>>>
> > > >> > > > > > > > >>>
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > >
> > > >>
> > > >> > > > > > > > ---------------------------------------------------------------------
> > > >> > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> > > >> > > > > > > > For additional commands, e-mail: dev-help@airflow.apache.org
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>

Re: [DISCUSS] Proposal for adding Telemetry via Scarf

Posted by Hussein Awala <hu...@awala.fr>.
> Other than that I don't mind it being e.g. optional provider.

I don't think it is possible to implement it in a provider because it is a
js package installed on the webserver; we could implement it as a plugin
(Blueprint), but in this case, the user must make an effort to register it.

It would be better to always install it and activate it by default, with
the possibility of deactivating it via the environment variable
`SCARF_ANALYTICS=false` (according to the documentation). If it is
deactivated by default, many users will not activate it even if they don't
mind reporting the metrics; if we enable it by default, only users who
don't want to send metrics will disable it.
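
To make the opt-out default concrete, here is a minimal sketch of how such a
check could look on the Python side. Only the variable name
`SCARF_ANALYTICS` and the value `false` come from the above; the helper name
`telemetry_enabled` is hypothetical and is not an existing Airflow or Scarf
API, and treating the value case-insensitively is my own assumption.

```python
import os


def telemetry_enabled(environ=None):
    """Hypothetical helper: telemetry is on unless the user opted out.

    Opt-out semantics: an unset variable means "enabled"; only an explicit
    SCARF_ANALYTICS=false (any letter case, surrounding spaces ignored)
    disables it.
    """
    env = os.environ if environ is None else environ
    value = env.get("SCARF_ANALYTICS", "true")
    return value.strip().lower() != "false"


if __name__ == "__main__":
    # Reads the real process environment when run directly.
    print("telemetry enabled:", telemetry_enabled())
```

With opt-in, the default in `env.get` would be flipped and every user who
does nothing would be counted as disabled, which is exactly the data loss
described above.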


On Fri, Apr 5, 2024 at 6:19 PM Michał Modras
<mi...@google.com.invalid> wrote:

> My 2 cents: it must be possible to opt-out, preferably it should be
> possible to deploy Airflow instances without bundling the telemetry library
> dependencies. Other than that I don't mind it being e.g. optional provider.
>
> śr., 3 kwi 2024, 22:42 użytkownik Hussein Awala <hu...@awala.fr>
> napisał:
>
> > > I'd like to propose, that we start with collecting simple data with
> > limited access: to all the PMC members. We can always expand it to
> > Committers and then expand further to make it invite-only or setup
> > exporting it to a DB like Postgres
> > <https://github.com/scarf-sh/scarf-postgres-exporter> and have a
> publicly
> > viewable dashboard.
> >
> > Looks like a good plan; we can discuss the export format when we decide to
> > do it.
> >
> > On Wed, Apr 3, 2024 at 7:59 PM Kaxil Naik <ka...@gmail.com> wrote:
> >
> > > Yup, exactly.
> > >
> > > I believe this would definitely help us take early and informed
> > decisions.
> > >> E.g. Had we had this earlier, I believe it would have definitely
> helped
> > us
> > >> more for our past discussions like whether we should continue
> supporting
> > >> MsSQL(
> https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4
> > ),
> > >> similarly about the DaskExecutor (
> > >> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1),
> etc.
> > >>
> > >
> > >
> > > Btw clarifying my own stance on the below; and let me know what you
> > think @Hussein
> > > Awala <hu...@awala.fr> : I'd like to propose, that we start with
> > > collecting simple data with limited access: to all the PMC members. We
> > can
> > > always expand it to Committers and then expand further to make it
> > > invite-only or setup exporting it to a DB like Postgres
> > > <https://github.com/scarf-sh/scarf-postgres-exporter> and have a
> > publicly
> > > viewable dashboard. It would be similar to an iterative software
> > > development approach, since this will be the first time for us, as
> > Airflow
> > > PMC, to add such telemetry. This is of course just my opinion though :)
> > >
> > > Regarding the data, like I had mentioned in the email and I am glad
> > others
> > >> including you are on the same page that the data will be shared with
> all
> > >> PMC members. The point about sharing it via website and newsletter was
> > for
> > >> the community — Airflow users. I don’t think anyone in the community
> > (apart
> > >> from the PMC members) would need raw data. And even if they need it,
> I’d
> > >> say they should put effort and contribute to the Airflow project and
> > become
> > >> PMC members.
> > >> To be clear: this telemetry data should help us, as Airflow PMC, to
> > steer
> > >> some of the decision making based on this data similar to how only PMC
> > has
> > >> a binding vote on the releases. [1] and this is similar to how Apache
> > >> Superset does it too.
> > >> [1]
> > >> https://www.apache.org/dev/pmc.html#what-is-a-pmc
> > >
> > >
> > > On Wed, 3 Apr 2024 at 12:03, Pankaj Koti <pankaj.koti@astronomer.io
> > .invalid>
> > > wrote:
> > >
> > >> +1 to introduce this.
> > >>
> > >> I believe this would definitely help us take early and informed
> > decisions.
> > >> E.g. Had we had this earlier, I believe it would have definitely
> helped
> > us
> > >> more for our past discussions like whether we should continue
> supporting
> > >> MsSQL(
> https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4
> > ),
> > >> similarly about the DaskExecutor (
> > >> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1),
> etc.
> > >>
> > >>
> > >> Best regards,
> > >>
> > >> *Pankaj Koti*
> > >> Senior Software Engineer (Airflow OSS Engineering team)
> > >> Location: Pune, Maharashtra, India
> > >> Timezone: Indian Standard Time (IST)
> > >> Phone: +91 9730079985
> > >>
> > >>
> > >> On Wed, Apr 3, 2024 at 2:44 PM Kaxil Naik <ka...@gmail.com>
> wrote:
> > >>
> > >> > Yup, I had added a link to scarf docs in the original email that
> > >> referenced
> > >> > opting out and we should even add an Airflow config that puts all
> > >> config in
> > >> > a single place. Without it we can’t be compliant to all the policies
> > >> even
> > >> > if we collectively ignore or are unaware of the importance of it.
> > >> >
> > >> > Regarding the data, like I had mentioned in the email and I am glad
> > >> others
> > >> > including you are on the same page that the data will be shared with
> > all
> > >> > PMC members. The point about sharing it via website and newsletter
> was
> > >> for
> > >> > the community — Airflow users. I don’t think anyone in the community
> > >> (apart
> > >> > from the PMC members) would need raw data. And even if they need it,
> > I’d
> > >> > say they should put effort and contribute to the Airflow project and
> > >> become
> > >> > PMC members.
> > >> >
> > >> > To be clear: this telemetry data should help us, as Airflow PMC, to
> > >> steer
> > >> > some of the decision making based on this data similar to how only
> PMC
> > >> has
> > >> > a binding vote on the releases. [1] and this is similar to how
> Apache
> > >> > Superset does it too.
> > >> >
> > >> > [1]
> > >> > https://www.apache.org/dev/pmc.html#what-is-a-pmc
> > >> >
> > >> >
> > >> >
> > >> > On Wed, 3 Apr 2024 at 00:05, Hussein Awala <hu...@awala.fr>
> wrote:
> > >> >
> > >> > > I mentioned opting out just to confirm its importance, and after
> > >> checking
> > >> > > the Scarf documentation it appears to be supported natively by
> > Scarf.
> > >> For
> > >> > > data accessibility, my point was more about raw data, not just
> > >> aggregated
> > >> > > information/insights shared via monthly newsletters, as we do for
> > >> Airflow
> > >> > > annual Survey for example:
> > >> > > https://airflow.apache.org/survey vs
> > >> > >
> > >> > >
> > >> >
> > >>
> >
> https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics
> > >> > > .
> > >> > >
> > >> > > On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <ka...@gmail.com>
> > >> wrote:
> > >> > >
> > >> > > > Agreed to both your points Hussein but both the points are
> already
> > >> > > covered
> > >> > > > in my original discussion post - both about opting out and
> > providing
> > >> > data
> > >> > > > to all the PMC members and providing visibility via Monthly
> > >> > newsletters.
> > >> > > Is
> > >> > > > there anything else you propose to discuss that isn’t covered?
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Mon, 1 Apr 2024 at 13:21, Hussein Awala <hu...@awala.fr>
> > >> wrote:
> > >> > > >
> > >> > > > > +1 for the idea in general, but there are two main points to
> > >> discuss
> > >> > > > before
> > >> > > > > voting on this:
> > >> > > > >
> > >> > > > > 1. We should provide an option to disable Scarf:
> > >> > > > > As Airflow is not a paid product, we cannot force companies to
> > >> report
> > >> > > > their
> > >> > > > > use of this project. Otherwise, some may choose to create
> their
> > >> own
> > >> > > fork
> > >> > > > > just to disable Scarf.
> > >> > > > >
> > >> > > > > 2. Concerning the exclusivity of access to data:
> > >> > > > > The data collected must either be completely proprietary for
> use
> > >> by
> > >> > PMC
> > >> > > > and
> > >> > > > > ASF, or completely open. Since many companies offer Airflow
> as a
> > >> > > product,
> > >> > > > > it is imperative not to give one company more privileges than
> > >> > others. I
> > >> > > > > raise this point for the principle of equality of opportunity.
> > >> > > > >
> > >> > > > > On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <
> > >> sunank200@gmail.com
> > >> > >
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Big +1 for Scarf.
> > >> > > > > >
> > >> > > > > > Transparency is key, so it's important to be super clear
> about
> > >> > opting
> > >> > > > > > out and what's tracked to avoid spooking anyone about IP
> > stuff.
> > >> > > > > >
> > >> > > > > > Regards
> > >> > > > > > Ankit Chaurasia
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <
> > >> > > amoghdesai.oss@gmail.com>
> > >> > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > +1 looks like a good tool which could be super helpful.
> > >> > > > > > >
> > >> > > > > > > * We should have some transparency into the data that is
> > >> > collected
> > >> > > or
> > >> > > > > > sent
> > >> > > > > > > * We should have an option to optionally opt-out
> > >> > > > > > >
> > >> > > > > > > Thanks & Regards,
> > >> > > > > > > Amogh Desai
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <
> > weilee.rx@gmail.com>
> > >> > > wrote:
> > >> > > > > > >
> > >> > > > > > > > +1 to this. It would be really useful. As long as we can
> > opt
> > >> > > out, I
> > >> > > > > > think
> > >> > > > > > > > we’re good.
> > >> > > > > > > >
> > >> > > > > > > > Best,
> > >> > > > > > > > Wei
> > >> > > > > > > >
> > >> > > > > > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <
> > >> > kaxilnaik@gmail.com>
> > >> > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > Grammar Correction:
> > >> > > > > > > > >
> > >> > > > > > > > > We should assume that those who deploy and upgrade
> > >> Airflow -
> > >> > > > > actually
> > >> > > > > > > > read
> > >> > > > > > > > >> and take into account what is written in the release
> > >> notes -
> > >> > > > > > especially
> > >> > > > > > > > if
> > >> > > > > > > > >> they have security guys breathing their necks,
> > similarly
> > >> as
> > >> > we
> > >> > > > > have
> > >> > > > > > to
> > >> > > > > > > > >> assume they follow CVE announcements about security
> > >> issues
> > >> > > > fixed.
> > >> > > > > > If we
> > >> > > > > > > > >> are very straightforward and out-going about the
> > change,
> > >> > > inform
> > >> > > > > very
> > >> > > > > > > > >> clearly how to opt-out, I don't see a big problem
> with
> > >> > > opt-out.
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > I couldn't agree more; even though we shouldn't
> collect
> > >> any
> > >> > > data
> > >> > > > > that
> > >> > > > > > > > > hamper security (and we should aim to do the same),
> most
> > >> > > security
> > >> > > > > > > > concerned
> > >> > > > > > > > > folks don't just upgrade, and we can rely on them
> > >> regarding
> > >> > > > release
> > >> > > > > > notes
> > >> > > > > > > > > or announcements and we can make it very clear in our
> > >> > > > announcements
> > >> > > > > > too;
> > >> > > > > > > > > and in our installation guides.
> > >> > > > > > > > >
> > >> > > > > > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <
> > >> > kaxilnaik@gmail.com>
> > >> > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > >> Grammar correction:
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <
> > >> > kaxilnaik@gmail.com
> > >> > > >
> > >> > > > > > wrote:
> > >> > > > > > > > >>
> > >> > > > > > > > >>> Have this at the end of the email too: but if folks
> > >> don't
> > >> > > read
> > >> > > > > > until
> > >> > > > > > > > the
> > >> > > > > > > > >>> end and quoting Maxime from the use-case blog[1]:
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> "I think people often ask ‘how do I contribute to
> open
> > >> > > > source?’,
> > >> > > > > > ‘I've
> > >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
> > >> > engineer.’
> > >> > > > > > Actually,
> > >> > > > > > > > the
> > >> > > > > > > > >>> very simplest thing that you can do is just say, ‘my
> > >> > > > organization
> > >> > > > > > gets
> > >> > > > > > > > real
> > >> > > > > > > > >>> value from this piece of software.’ There are a
> bunch
> > of
> > >> > ways
> > >> > > > to
> > >> > > > > > let
> > >> > > > > > > > the
> > >> > > > > > > > >>> people know about it – and now Scarf is there. If
> your
> > >> > > > > > organization is
> > >> > > > > > > > >>> getting a lot of value from a piece of open source
> > >> > software,
> > >> > > > make
> > >> > > > > > sure
> > >> > > > > > > > the
> > >> > > > > > > > >>> devs know about it."
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> What kind of edge cases are you thinking about? I
> > don't
> > >> > think
> > >> > > > it
> > >> > > > > > makes
> > >> > > > > > > > >>> sense to have "opt-in" at all. As the goal is to
> > collect
> > >> > data
> > >> > > > for
> > >> > > > > > most
> > >> > > > > > > > >>> Airflow installations except for those that don't
> want
> > >> to
> > >> > > give
> > >> > > > > > data,
> > >> > > > > > > > then
> > >> > > > > > > > >>> "opt-out" is the only way to maximize it. As long as
> > we
> > >> > don't
> > >> > > > > > collect
> > >> > > > > > > > any
> > >> > > > > > > > >>> PII data, this is in-compliance as well.
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> Imagine someone learning Airflow, if they have to
> > opt-in
> > >> > via
> > >> > > a
> > >> > > > > > config,
> > >> > > > > > > > >>> they wouldn't even know or care about it, hence us
> > >> losing
> > >> > > most
> > >> > > > of
> > >> > > > > > the
> > >> > > > > > > > data.
> > >> > > > > > > > >>> I understand why some orgs & individuals may want to
> > >> > opt-out.
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> Scarf Provides tracking pixels (essentially an HTML
> > >> image
> > >> > > tag)
> > >> > > > > > that you
> > >> > > > > > > > >>> can place in your website or product to track
> visitors
> > >> to
> > >> > > that
> > >> > > > > > URL. If
> > >> > > > > > > > >>> there were any concerns about Privacy, ASF wouldn't
> > have
> > >> > > > approved
> > >> > > > > > it
> > >> > > > > > > > at all.
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> A few key details to note about the pixel:
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>   - No PII is tracked… Scarf does not capture/retain
> > IP
> > >> > > > > > information…
> > >> > > > > > > > >>>   this information is discarded by the platform upon
> > >> > > > > > > > processing/aggregating
> > >> > > > > > > > >>>   - Scarf pixels respect the Do Not Track (DNT)
> > >> settings of
> > >> > > > > > browsers -
> > >> > > > > > > > >>>   these users will not be tracked whatsoever.
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> All the ASF projects I had listed (whether they use
> > >> Scarf
> > >> > > > gateway
> > >> > > > > > or
> > >> > > > > > > > >>> Scarf pixel in product) are using opt-out.
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> 1. Short opt-in period before opt-out. Test this
> > feature
> > >> > with
> > >> > > > > > users who
> > >> > > > > > > > >>>> trust and if it works great - make it public. I
> think
> > >> it's
> > >> > > > wise
> > >> > > > > to
> > >> > > > > > > > handle
> > >> > > > > > > > >>>> edge cases and configure collected data more
> > >> accurately.
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> It would be a pixel in the webserver, should affect
> > >> nothing
> > >> > > at
> > >> > > > > all
> > >> > > > > > even
> > >> > > > > > > > >>> in an air-gapped environment.
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>> 2. It should not affect anything if access to the
> > >> internet
> > >> > > is
> > >> > > > > > > > restricted
> > >> > > > > > > > >>>> which is default for many companies.
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> 100% agreed on the below:
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>> I think we have a very good blueprint to follow
> > >> including
> > >> > at
> > >> > > > > > least 5
> > >> > > > > > > > >>>> other
> > >> > > > > > > > >>>> ASF projects that also passed the review of the
> > >> > privacy@asf.
> > >> > > > > And
> > >> > > > > > > > while I
> > >> > > > > > > > >>>> understand (and concur) the urge for opt-in by
> > default
> > >> > > coming
> > >> > > > > from
> > >> > > > > > > > >>>> consumer
> > >> > > > > > > > >>>> market (where it makes perfect sense) Airflow is
> not
> > a
> > >> > > > consumer
> > >> > > > > > > > >>>> software and is used in "corporate environment"
> which
> > >> has
> > >> > a
> > >> > > > > little
> > >> > > > > > > > >>>> different expectations and broad assumption that
> the
> > >> > company
> > >> > > > can
> > >> > > > > > make
> > >> > > > > > > > >>>> decisions on such telemetry on behalf of the
> > employees
> > >> > using
> > >> > > > it.
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> Couldn't agree more; even though there shouldn't we
> > >> collect
> > >> > > > > hamper
> > >> > > > > > > > >>> security (and we should aim to do the same), most
> > >> security
> > >> > > > > > concerned
> > >> > > > > > > > folks
> > >> > > > > > > > >>> don't just
> > >> > > > > > > > >>> upgrade, and we can rely on them regarding release
> > >> notes or
> > >> > > > > > > > announcements
> > >> > > > > > > > >>> and we can make it very clear in our announcements
> > too;
> > >> and
> > >> > > in
> > >> > > > > our
> > >> > > > > > > > >>> installation guides.
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> We should assume that those who deploy and upgrade
> > >> Airflow
> > >> > -
> > >> > > > > > actually
> > >> > > > > > > > read
> > >> > > > > > > > >>>> and take into account what is written in the
> release
> > >> > notes -
> > >> > > > > > > > especially
> > >> > > > > > > > >>>> if
> > >> > > > > > > > >>>> they have security guys breathing their necks,
> > >> similarly
> > >> > as
> > >> > > we
> > >> > > > > > have to
> > >> > > > > > > > >>>> assume they follow CVE announcements about security
> > >> issues
> > >> > > > > fixed.
> > >> > > > > > If
> > >> > > > > > > > we
> > >> > > > > > > > >>>> are very straightforward and out-going about the
> > >> change,
> > >> > > > inform
> > >> > > > > > very
> > >> > > > > > > > >>>> clearly how to opt-out, I don't see a big problem
> > with
> > >> > > > opt-out.
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> To be clear, the collection of data, or at least the
> > >> data
> > >> > we
> > >> > > > > should
> > >> > > > > > > > >>> gather here should help all the consumers without
> > >> violating
> > >> > > > > > anything
> > >> > > > > > > > >>> regulations. I will quote Maxime's quote in the
> > use-case
> > >> > doc
> > >> > > > [1]
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> "*Another Form of Contributing*
> > >> > > > > > > > >>> “I think people often ask ‘how do I contribute to
> open
> > >> > > > source?’,
> > >> > > > > > ‘I've
> > >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
> > >> > engineer.’
> > >> > > > > > Actually,
> > >> > > > > > > > the
> > >> > > > > > > > >>> very simplest thing that you can do is just say, ‘my
> > >> > > > organization
> > >> > > > > > gets
> > >> > > > > > > > real
> > >> > > > > > > > >>> value from this piece of software.’ There are a
> bunch
> > of
> > >> > ways
> > >> > > > to
> > >> > > > > > let
> > >> > > > > > > > the
> > >> > > > > > > > >>> people know about it – and now Scarf is there. If
> your
> > >> > > > > > organization is
> > >> > > > > > > > >>> getting a lot of value from a piece of open source
> > >> > software,
> > >> > > > make
> > >> > > > > > sure
> > >> > > > > > > > the
> > >> > > > > > > > >>> devs know about it.”"
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> [1]
> > >> > > > https://about.scarf.sh/post/scarf-case-study-apache-superset
> > >> > > > > > > > >>>
> > >> > > > > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <
> > >> > > > > kxepal@apache.org>
> > >> > > > > > > > wrote:
> > >> > > > > > > > >>>
> > >> > > > > > > > >>>> Hi Jarek!
> > >> > > > > > > > >>>>
> > >> > > > > > > > >>>> I understand the reasons for opt-out from a project
> > >> view.
> > >> > I
> > >> > > > just
> > >> > > > > > > > suddenly
> > >> > > > > > > > >>>> imagined the situation when an upgrade happens and
> > here
> > >> > > comes
> > >> > > > > the
> > >> > > > > > > > data to
> > >> > > > > > > > >>>> some third party service - that's a view from a
> user
> > >> side
> > >> > of
> > >> > > > > some
> > >> > > > > > big
> > >> > > > > > > > >>>> company.
> > >> > > > > > > > >>>>
> > >> > > > > > > > >>>> There could be good alternatives to handle this:
> > >> > > > > > > > >>>> 1. Short opt-in period before opt-out. Test this
> > >> feature
> > >> > > with
> > >> > > > > > users
> > >> > > > > > > > who
> > >> > > > > > > > >>>> trust and if it works great - make it public. I
> think
> > >> it's
> > >> > > > wise
> > >> > > > > to
> > >> > > > > > > > handle
> > >> > > > > > > > >>>> edge cases and configure collected data more
> > >> accurately.
> > >> > > > > > > > >>>> 2. Explicitly somehow warn about this feature to
> make
> > >> this
> > >> > > > > > feature not
> > >> > > > > > > > >>>> get
> > >> > > > > > > > >>>> unnoticed. Just to reduce possible frustration.
> > >> > > > > > > > >>>>
> > >> > > > > > > > >>>> Just some personal thoughts for discussion (:
> > >> > > > > > > > >>>>
> > >> > > > > > > > >>>> --
> > >> > > > > > > > >>>> ,,,^..^,,,
> > >> > > > > > > > >>>>
> > >> > > > > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <
> > >> > > > jarek@potiuk.com>
> > >> > > > > > > > wrote:
> > >> > > > > > > > >>>>
> > >> > > > > > > > >>>>> Hello everyone,
> > >> > > > > > > > >>>>>
> > >> > > > > > > > >>>>> it has to be:
> > >> > > > > > > > >>>>>
> > >> > > > > > > > >>>>> 1. Opt-in by default to not trigger security guys
> > >> about
> > >> > new
> > >> > > > > > unplanned
> > >> > > > > > > > >>>>>> activity after regular upgrade.
> > >> > > > > > > > >>>>>>
> > >> > > > > > > > >>>>>
> > >> > > > > > > > >>>>> That's a very good point about security triggering
> > >> > > Alexander,
> > >> > > > > > but I
> > >> > > > > > > > am
> > >> > > > > > > > >>>> not
> > >> > > > > > > > >>>>> so sure it means that we "have to" do opt-in.
> There
> > >> are
> > >> > > other
> > >> > > > > > ways of
> > >> > > > > > > > >>>>> communicating with the "deployment managers" who
> > >> install
> > >> > > and
> > >> > > > > > upgrade
> > >> > > > > > > > >>>>> Airflow - i.e. release notes, blogs, social media
> of
> > >> > ours,
> > >> > > > > slack
> > >> > > > > > > > >>>>> announcements etc. We have plenty of channels we
> can
> > >> use
> > >> > to
> > >> > > > > > > > >>>> communicate the
> > >> > > > > > > > >>>>> change.
> > >> > > > > > > > >>>>>
> > >> > > > > > > > >>>>> I think we have a very good blueprint to follow
> > >> including
> > >> > > at
> > >> > > > > > least 5
> > >> > > > > > > > >>>> other
> > >> > > > > > > > >>>>> ASF projects that also passed the review of the
> > >> > > privacy@asf.
> > >> > > > > And
> > >> > > > > > > > >>>> while I
> > >> > > > > > > > >>>>> understand (and concur) the urge for opt-in by
> > default
> > >> > > coming
> > >> > > > > > from
> > >> > > > > > > > >>>> consumer
> > >> > > > > > > > >>>>> market (where it makes perfect sense) Airflow is
> > not a
> > >> > > > consumer
> > >> > > > > > > > >>>>> software and is used in "corporate environment"
> > which
> > >> > has a
> > >> > > > > > little
> > >> > > > > > > > >>>>> different expectations and broad assumption that
> the
> > >> > > company
> > >> > > > > can
> > >> > > > > > make
> > >> > > > > > > > >>>>> decisions on such telemetry on behalf of the
> > employees
> > >> > > using
> > >> > > > > it.
> > >> > > > > > > > >>>>>
> > >> > > > > > > > >>>>> We should assume that those who deploy and upgrade
> > >> > Airflow
> > >> > > -
> > >> > > > > > actually
> > >> > > > > > > > >>>> read
> > >> > > > > > > > >>>>> and take into account what is written in the
> release
> > >> > notes
> > >> > > -
> > >> > > > > > > > >>>> especially if
> > >> > > > > > > > >>>>> they have security guys breathing down their necks,
> > >> similarly
> > >> > as
> > >> > > > we
> > >> > > > > > have
> > >> > > > > > > > to
> > >> > > > > > > > >>>>> assume they follow CVE announcements about
> security
> > >> > issues
> > >> > > > > > fixed. If
> > >> > > > > > > > we
> > >> > > > > > > > >>>>> are very straightforward and out-going about the
> > >> change,
> > >> > > > inform
> > >> > > > > > very
> > >> > > > > > > > >>>>> clearly how to opt-out, I don't see a big problem
> > with
> > >> > > > opt-out.
> > >> > > > > > > > >>>>>
> > >> > > > > > > > >>>>> We should of course check with privacy@a.o (but
> I've
> > >> > spent
> > >> > > a
> > >> > > > > good
> > >> > > > > > > > deal
> > >> > > > > > > > >>>> of
> > >> > > > > > > > >>>>> time reading the Superset  and other use case and
> > >> > > explanation
> > >> > > > > in
> > >> > > > > > > > >>>> detail to
> > >> > > > > > > > >>>>> make a better informed decision) - and it looks
> like
> > >> they
> > >> > > > also
> > >> > > > > > went
> > >> > > > > > > > >>>> opt-out
> > >> > > > > > > > >>>>> way and got cleared by privacy@a.o.  And if we
> > cannot
> > >> > > reach
> > >> > > > > > > > >>>> consensus, we
> > >> > > > > > > > >>>>> should - as usual - make a voting decision on it
> > >> (because
> > >> > > > yes,
> > >> > > > > > it is
> > >> > > > > > > > an
> > >> > > > > > > > >>>>> important decision), but - after reading and
> > >> > understanding
> > >> > > > why
> > >> > > > > > others
> > >> > > > > > > > >>>> also
> > >> > > > > > > > >>>>> did it - for me personally, opt-out is a good
> path.
> > >> > > > > > > > >>>>>
> > >> > > > > > > > >>>>> Also because it will rather increase the amount of
> > >> data
> > >> > to
> > >> > > > > > gather,
> > >> > > > > > > > and
> > >> > > > > > > > >>>> in
> > >> > > > > > > > >>>>> our case - counter intuitively - it will be even
> > >> better
> > >> > for
> > >> > > > > > privacy
> > >> > > > > > > > and
> > >> > > > > > > > >>>>> corporate anonymity, because the more data we get,
> > the
> > >> > more
> > >> > > > > > difficult
> > >> > > > > > > > >>>> it
> > >> > > > > > > > >>>>> will be to get any non-statistical/non-aggregated
> > >> insight
> > >> > > > from
> > >> > > > > > it.
> > >> > > > > > > > >>>> Imagine
> > >> > > > > > > > >>>>> if only a few corporate users will enable it
> > >> consciously
> > >> > -
> > >> > > > then
> > >> > > > > > we
> > >> > > > > > > > >>>> will be
> > >> > > > > > > > >>>>> able to draw many more conclusions if we find out
> > who
> > >> > they
> > >> > > > are,
> > >> > > > > > than
> > >> > > > > > > > if
> > >> > > > > > > > >>>>> everyone has it enabled by default.
> > >> > > > > > > > >>>>>
> > >> > > > > > > > >>>>> That's my take on it - but again, it's up to us to
> > >> vote,
> > >> > > for
> > >> > > > me
> > >> > > > > > > > opt-in
> > >> > > > > > > > >>>> is
> > >> > > > > > > > >>>>> not "has to", and I am rather for opt-out.
> > >> > > > > > > > >>>>>
> > >> > > > > > > > >>>>> J.
> > >> > > > > > > > >>>>>
> > >> > > > > > > > >>>>>> Hi all,
> > >> > > > > > > > >>>>>>
> > >> > > > > > > > >>>>>>
> > >> > > > > > > > >>>>>>> I want to propose gathering telemetry for
> Airflow
> > >> > > > > > installations.
> > >> > > > > > > > >>>> As the
> > >> > > > > > > > >>>>>>> Airflow community, we have been relying heavily
> on
> > >> the
> > >> > > > yearly
> > >> > > > > > > > >>>> Airflow
> > >> > > > > > > > >>>>>>> Survey and anecdotes to answer a few key
> questions
> > >> > about
> > >> > > > > > Airflow
> > >> > > > > > > > >>>> usage.
> > >> > > > > > > > >>>>>>> Questions like the following:
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>>   - Which versions of Airflow are people
> > >> > installing/using
> > >> > > > now
> > >> > > > > > > > >>>> (i.e.
> > >> > > > > > > > >>>>>>>   whether people have primarily made the jump
> from
> > >> > > version
> > >> > > > X
> > >> > > > > to
> > >> > > > > > > > >>>>> version
> > >> > > > > > > > >>>>>> Y)
> > >> > > > > > > > >>>>>>>   - Which DB is used as the Metadata DB and
> which
> > >> > version
> > >> > > > e.g
> > >> > > > > > Pg
> > >> > > > > > > > >>>> 14?
> > >> > > > > > > > >>>>>>>   - What Python version is being used?
> > >> > > > > > > > >>>>>>>   - Which Executor is being used?
> > >> > > > > > > > >>>>>>>   - Approximately how many people out there in
> the
> > >> > world
> > >> > > > are
> > >> > > > > > > > >>>>> installing
> > >> > > > > > > > >>>>>>>   Airflow
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>> There is a solution that should help answer
> these
> > >> > > > questions:
> > >> > > > > > Scarf
> > >> > > > > > > > >>>> [1].
> > >> > > > > > > > >>>>>> The
> > >> > > > > > > > >>>>>>> ASF already approves Scarf [2][3] and is already
> > >> used
> > >> > by
> > >> > > > > other
> > >> > > > > > ASF
> > >> > > > > > > > >>>>>>> projects: Superset [4], Dolphin Scheduler [5],
> > Dubbo
> > >> > > > > > Kubernetes,
> > >> > > > > > > > >>>>> DevLake,
> > >> > > > > > > > >>>>>>> Skywalking as it follows GDPR and other
> > regulations.
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>> Similar to Superset, we probably can use it as
> > >> follows:
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>>   1. Install the `scarf js` npm package and
> bundle
> > >> it
> > >> > in
> > >> > > > the
> > >> > > > > > > > >>>>> Webserver.
> > >> > > > > > > > >>>>>>>   When the package is downloaded & Airflow
> > >> webserver is
> > >> > > > > opened,
> > >> > > > > > > > >>>>> metadata
> > >> > > > > > > > >>>>>>> is
> > >> > > > > > > > >>>>>>>   recorded to the Scarf dashboard.
> > >> > > > > > > > >>>>>>>   2. Utilize the Scarf Gateway [6], which we can
> > >> use in
> > >> > > > front
> > >> > > > > > of
> > >> > > > > > > > >>>>> docker
> > >> > > > > > > > >>>>>>>   containers. While it’s possible people go
> around
> > >> this
> > >> > > > > > gateway,
> > >> > > > > > > > >>>> we
> > >> > > > > > > > >>>>> can
> > >> > > > > > > > >>>>>>>   probably configure and encourage most traffic
> to
> > >> go
> > >> > > > through
> > >> > > > > > > > >>>> these
> > >> > > > > > > > >>>>>>> gateways.
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>> While Scarf does not store any personally
> > >> identifying
> > >> > > > > > information
> > >> > > > > > > > >>>> from
> > >> > > > > > > > >>>>>> SDK
> > >> > > > > > > > >>>>>>> telemetry data, it does send various bits of
> > >> IP-derived
> > >> > > > > > > > >>>> information as
> > >> > > > > > > > >>>>>>> outlined here [7]. This data should be made as
> > >> > > transparent
> > >> > > > as
> > >> > > > > > > > >>>> possible
> > >> > > > > > > > >>>>> by
> > >> > > > > > > > >>>>>>> granting dashboard access to the Airflow PMC and
> > any
> > >> > > other
> > >> > > > > > relevant
> > >> > > > > > > > >>>>> means
> > >> > > > > > > > >>>>>>> of sharing/surfacing it that we encounter (Town
> > >> Hall,
> > >> > > > Slack,
> > >> > > > > > > > >>>> Newsletter
> > >> > > > > > > > >>>>>>> etc).
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>> The following case studies are worth reading:
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>>   1. https://about.scarf.sh/post/scarf-case-study-apache-superset (From Maxime)
> > >> > > > > > > > >>>>>>>   2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>> Similar to them, this could help in various ways
> > >> that
> > >> > > come
> > >> > > > > with
> > >> > > > > > > > >>>> using
> > >> > > > > > > > >>>>>> data
> > >> > > > > > > > >>>>>>> for decision-making. With clear guidelines on
> "how
> > >> to
> > >> > > > > opt-out"
> > >> > > > > > > > >>>>>> [8][9][10] &
> > >> > > > > > > > >>>>>>> "what data is being collected" on the Airflow
> > >> website,
> > >> > > this
> > >> > > > > > can be
> > >> > > > > > > > >>>>>>> beneficial to the entire community as we would
> be
> > >> > making
> > >> > > > more
> > >> > > > > > > > >>>> informed
> > >> > > > > > > > >>>>>>> decisions.
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>> Regards,
> > >> > > > > > > > >>>>>>> Kaxil
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>> [1] https://about.scarf.sh/
> > >> > > > > > > > >>>>>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
> > >> > > > > > > > >>>>>>> [3] https://privacy.apache.org/faq/committers.html
> > >> > > > > > > > >>>>>>> [4] https://github.com/apache/superset/issues/25639
> > >> > > > > > > > >>>>>>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> > >> > > > > > > > >>>>>>> [6] https://about.scarf.sh/scarf-gateway
> > >> > > > > > > > >>>>>>> [7] https://about.scarf.sh/privacy-policy
> > >> > > > > > > > >>>>>>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> > >> > > > > > > > >>>>>>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> > >> > > > > > > > >>>>>>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> > >> > > > > > > > >>>>>>>
> > >> > > > > > > > >>>>>>
> > >> > > > > > > > >>>>>
> > >> > > > > > > > >>>>
> > >> > > > > > > > >>>
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > >
> > >> ---------------------------------------------------------------------
> > >> > > > > > > > To unsubscribe, e-mail:
> > dev-unsubscribe@airflow.apache.org
> > >> > > > > > > > For additional commands, e-mail:
> > >> dev-help@airflow.apache.org
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> >
> > >> > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>

Re: [DISCUSS] Proposal for adding Telemetry via Scarf

Posted by Michał Modras <mi...@google.com.INVALID>.
My 2 cents: it must be possible to opt out, and preferably it should also
be possible to deploy Airflow instances without bundling the telemetry
library dependencies. Other than that, I don't mind it being, e.g., an
optional provider.
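
To make those two requirements concrete, here is a minimal sketch in Python
of what such a check could look like. It is not actual Airflow code: the
environment variable AIRFLOW_TELEMETRY_ENABLED and the package name
airflow_telemetry_provider are hypothetical names chosen only for
illustration. The idea is that telemetry is on by default (opt-out), can be
switched off with a single setting, and is skipped entirely when the
optional package is not installed.

```python
import importlib.util
import os


def telemetry_enabled(env=None):
    """Return True unless the deployment has explicitly opted out."""
    env = os.environ if env is None else env
    # Hypothetical variable name, used only for illustration.
    value = env.get("AIRFLOW_TELEMETRY_ENABLED", "true")
    return value.strip().lower() not in {"0", "false", "no", "off"}


def telemetry_available(module_name="airflow_telemetry_provider"):
    """Return True only if the optional telemetry package is installed."""
    # Hypothetical package name; find_spec returns None when it is absent.
    return importlib.util.find_spec(module_name) is not None


def should_send_telemetry(env=None, module_name="airflow_telemetry_provider"):
    """Send only when the package is present and nobody has opted out."""
    return telemetry_available(module_name) and telemetry_enabled(env)
```

With this shape, a deployment that never installs the optional package
sends nothing, and one that does install it can still opt out with one
setting.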

On Wed, Apr 3, 2024, 22:42 Hussein Awala <hu...@awala.fr> wrote:

> > I'd like to propose, that we start with collecting simple data with
> limited access: to all the PMC members. We can always expand it to
> Committers and then expand further to make it invite-only or setup
> exporting it to a DB like Postgres
> <https://github.com/scarf-sh/scarf-postgres-exporter> and have a publicly
> viewable dashboard.
>
> Looks like a good plan; we can discuss the export format when we decide to
> do it.
>
> On Wed, Apr 3, 2024 at 7:59 PM Kaxil Naik <ka...@gmail.com> wrote:
>
> > Yup, exactly.
> >
> > I believe this would definitely help us take early and informed
> decisions.
> >> E.g. Had we had this earlier, I believe it would have definitely helped
> us
> >> more for our past discussions like whether we should continue supporting
> >> MsSQL(https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4
> ),
> >> similarly about the DaskExecutor (
> >> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.
> >>
> >
> >
> > Btw clarifying my own stance on the below; and let me know what you
> think @Hussein
> > Awala <hu...@awala.fr> : I'd like to propose, that we start with
> > collecting simple data with limited access: to all the PMC members. We
> can
> > always expand it to Committers and then expand further to make it
> > invite-only or setup exporting it to a DB like Postgres
> > <https://github.com/scarf-sh/scarf-postgres-exporter> and have a
> publicly
> > viewable dashboard. It would be similar to an iterative software
> > development approach, since this will be the first time for us, as
> Airflow
> > PMC, to add such telemetry. This is of course just my opinion though :)
> >
> > Regarding the data, like I had mentioned in the email and I am glad
> others
> >> including you are on the same page that the data will be shared with all
> >> PMC members. The point about sharing it via website and newsletter was
> for
> >> the community — Airflow users. I don’t think anyone in the community
> (apart
> >> from the PMC members) would need raw data. And even if they need it, I’d
> >> say they should put effort and contribute to the Airflow project and
> become
> >> PMC members.
> >> To be clear: this telemetry data should help us, as Airflow PMC, to
> steer
> >> some of the decision making based on this data similar to how only PMC
> has
> >> a binding vote on the releases. [1] and this is similar to how Apache
> >> Superset does it too.
> >> [1]
> >> https://www.apache.org/dev/pmc.html#what-is-a-pmc
> >
> >
> > On Wed, 3 Apr 2024 at 12:03, Pankaj Koti <pankaj.koti@astronomer.io
> .invalid>
> > wrote:
> >
> >> +1 to introduce this.
> >>
> >> I believe this would definitely help us take early and informed
> decisions.
> >> E.g. Had we had this earlier, I believe it would have definitely helped
> us
> >> more for our past discussions like whether we should continue supporting
> >> MsSQL(https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4
> ),
> >> similarly about the DaskExecutor (
> >> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.
> >>
> >>
> >> Best regards,
> >>
> >> *Pankaj Koti*
> >> Senior Software Engineer (Airflow OSS Engineering team)
> >> Location: Pune, Maharashtra, India
> >> Timezone: Indian Standard Time (IST)
> >> Phone: +91 9730079985
> >>
> >>
> >> On Wed, Apr 3, 2024 at 2:44 PM Kaxil Naik <ka...@gmail.com> wrote:
> >>
> >> > Yup, I had added a link to scarf docs in the original email that
> >> referenced
> >> > opting out and we should even add an Airflow config that puts all
> >> config in
> >> > a single place. Without it we can’t be compliant with all the policies
> >> even
> >> > if we collectively ignore or are unaware of the importance of it.
> >> >
> >> > Regarding the data, like I had mentioned in the email and I am glad
> >> others
> >> > including you are on the same page that the data will be shared with
> all
> >> > PMC members. The point about sharing it via website and newsletter was
> >> for
> >> > the community — Airflow users. I don’t think anyone in the community
> >> (apart
> >> > from the PMC members) would need raw data. And even if they need it,
> I’d
> >> > say they should put effort and contribute to the Airflow project and
> >> become
> >> > PMC members.
> >> >
> >> > To be clear: this telemetry data should help us, as Airflow PMC, to
> >> steer
> >> > some of the decision making based on this data similar to how only PMC
> >> has
> >> > a binding vote on the releases. [1] and this is similar to how Apache
> >> > Superset does it too.
> >> >
> >> > [1]
> >> > https://www.apache.org/dev/pmc.html#what-is-a-pmc
> >> >
> >> >
> >> >
> >> > On Wed, 3 Apr 2024 at 00:05, Hussein Awala <hu...@awala.fr> wrote:
> >> >
> >> > > I mentioned opting out just to confirm its importance, and after
> >> checking
> >> > > the Scarf documentation it appears to be supported natively by
> Scarf.
> >> For
> >> > > data accessibility, my point was more about raw data, not just
> >> aggregated
> >> > > information/insights shared via monthly newsletters, as we do for
> >> Airflow
> >> > > annual Survey for example:
> >> > > https://airflow.apache.org/survey vs
> >> > > https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics .
> >> > >
> >> > > On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <ka...@gmail.com>
> >> wrote:
> >> > >
> >> > > > Agreed to both your points Hussein but both the points are already
> >> > > covered
> >> > > > in my original discussion post - both about opting out and
> providing
> >> > data
> >> > > > to all the PMC members and providing visibility via Monthly
> >> > newsletters.
> >> > > Is
> >> > > > there anything else you propose to discuss that isn’t covered?
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Mon, 1 Apr 2024 at 13:21, Hussein Awala <hu...@awala.fr>
> >> wrote:
> >> > > >
> >> > > > > +1 for the idea in general, but there are two main points to
> >> discuss
> >> > > > before
> >> > > > > voting on this:
> >> > > > >
> >> > > > > 1. We should provide an option to disable Scarf:
> >> > > > > As Airflow is not a paid product, we cannot force companies to
> >> report
> >> > > > their
> >> > > > > use of this project. Otherwise, some may choose to create their
> >> own
> >> > > fork
> >> > > > > just to disable Scarf.
> >> > > > >
> >> > > > > 2. Concerning the exclusivity of access to data:
> >> > > > > The data collected must either be completely proprietary for use
> >> by
> >> > PMC
> >> > > > and
> >> > > > > ASF, or completely open. Since many companies offer Airflow as a
> >> > > product,
> >> > > > > it is imperative not to give one company more privileges than
> >> > others. I
> >> > > > > raise this point for the principle of equality of opportunity.
> >> > > > >
> >> > > > > On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <
> >> sunank200@gmail.com
> >> > >
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Big +1 for Scarf.
> >> > > > > >
> >> > > > > > Transparency is key, so it's important to be super clear about
> >> > opting
> >> > > > > > out and what's tracked to avoid spooking anyone about IP
> stuff.
> >> > > > > >
> >> > > > > > Regards
> >> > > > > > Ankit Chaurasia
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <
> >> > > amoghdesai.oss@gmail.com>
> >> > > > > > wrote:
> >> > > > > > >
> >> > > > > > > +1 looks like a good tool which could be super helpful.
> >> > > > > > >
> >> > > > > > > * We should have some transparency into the data that is
> >> > collected
> >> > > or
> >> > > > > > sent
> >> > > > > > > * We should have an option to opt out
> >> > > > > > >
> >> > > > > > > Thanks & Regards,
> >> > > > > > > Amogh Desai
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <
> weilee.rx@gmail.com>
> >> > > wrote:
> >> > > > > > >
> >> > > > > > > > +1 to this. It would be really useful. As long as we can
> opt
> >> > > out, I
> >> > > > > > think
> >> > > > > > > > we’re good.
> >> > > > > > > >
> >> > > > > > > > Best,
> >> > > > > > > > Wei
> >> > > > > > > >
> >> > > > > > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <
> >> > kaxilnaik@gmail.com>
> >> > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > Grammar Correction:
> >> > > > > > > > >
> >> > > > > > > > > We should assume that those who deploy and upgrade
> >> Airflow -
> >> > > > > actually
> >> > > > > > > > read
> >> > > > > > > > >> and take into account what is written in the release
> >> notes -
> >> > > > > > especially
> >> > > > > > > > if
> >> > > > > > > > >> they have security guys breathing down their necks,
> similarly
> >> as
> >> > we
> >> > > > > have
> >> > > > > > to
> >> > > > > > > > >> assume they follow CVE announcements about security
> >> issues
> >> > > > fixed.
> >> > > > > > If we
> >> > > > > > > > >> are very straightforward and out-going about the
> change,
> >> > > inform
> >> > > > > very
> >> > > > > > > > >> clearly how to opt-out, I don't see a big problem with
> >> > > opt-out.
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > I couldn't agree more; even though we shouldn't collect
> >> any
> >> > > data
> >> > > > > that
> >> > > > > > > > > hamper security (and we should aim to do the same), most
> >> > > security
> >> > > > > > > > concerned
> >> > > > > > > > > folks don't just upgrade, and we can rely on them
> >> regarding
> >> > > > release
> >> > > > > > notes
> >> > > > > > > > > or announcements and we can make it very clear in our
> >> > > > announcements
> >> > > > > > too;
> >> > > > > > > > > and in our installation guides.
> >> > > > > > > > >
> >> > > > > > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <
> >> > kaxilnaik@gmail.com>
> >> > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > >> Grammar correction:
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <
> >> > kaxilnaik@gmail.com
> >> > > >
> >> > > > > > wrote:
> >> > > > > > > > >>
> >> > > > > > > > >>> Have this at the end of the email too: but if folks
> >> don't
> >> > > read
> >> > > > > > until
> >> > > > > > > > the
> >> > > > > > > > >>> end and quoting Maxime from the use-case blog[1]:
> >> > > > > > > > >>>
> >> > > > > > > > >>> "I think people often ask ‘how do I contribute to open
> >> > > > source?’,
> >> > > > > > ‘I've
> >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
> >> > engineer.’
> >> > > > > > Actually,
> >> > > > > > > > the
> >> > > > > > > > >>> very simplest thing that you can do is just say, ‘my
> >> > > > organization
> >> > > > > > gets
> >> > > > > > > > real
> >> > > > > > > > >>> value from this piece of software.’ There are a bunch
> of
> >> > ways
> >> > > > to
> >> > > > > > let
> >> > > > > > > > the
> >> > > > > > > > >>> people know about it – and now Scarf is there. If your
> >> > > > > > organization is
> >> > > > > > > > >>> getting a lot of value from a piece of open source
> >> > software,
> >> > > > make
> >> > > > > > sure
> >> > > > > > > > the
> >> > > > > > > > >>> devs know about it."
> >> > > > > > > > >>>
> >> > > > > > > > >>> What kind of edge cases are you thinking about? I
> don't
> >> > think
> >> > > > it
> >> > > > > > makes
> >> > > > > > > > >>> sense to have "opt-in" at all. As the goal is to
> collect
> >> > data
> >> > > > for
> >> > > > > > most
> >> > > > > > > > >>> Airflow installations except for those that don't want
> >> to
> >> > > give
> >> > > > > > data,
> >> > > > > > > > then
> >> > > > > > > > >>> "opt-out" is the only way to maximize it. As long as
> we
> >> > don't
> >> > > > > > collect
> >> > > > > > > > any
> >> > > > > > > > >>> PII data, this is in compliance as well.
> >> > > > > > > > >>>
> >> > > > > > > > >>> Imagine someone learning Airflow, if they have to
> opt-in
> >> > via
> >> > > a
> >> > > > > > config,
> >> > > > > > > > >>> they wouldn't even know or care about it, hence us
> >> losing
> >> > > most
> >> > > > of
> >> > > > > > the
> >> > > > > > > > data.
> >> > > > > > > > >>> I understand why some orgs & individuals may want to
> >> > opt-out.
> >> > > > > > > > >>>
> >> > > > > > > > >>> Scarf provides tracking pixels (essentially an HTML
> >> image
> >> > > tag)
> >> > > > > > that you
> >> > > > > > > > >>> can place in your website or product to track visitors
> >> to
> >> > > that
> >> > > > > > URL. If
> >> > > > > > > > >>> there were any concerns about Privacy, ASF wouldn't
> have
> >> > > > approved
> >> > > > > > it
> >> > > > > > > > at all.
> >> > > > > > > > >>>
> >> > > > > > > > >>> A few key details to note about the pixel:
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>>   - No PII is tracked… Scarf does not capture/retain
> IP
> >> > > > > > information…
> >> > > > > > > > >>>   this information is discarded by the platform upon
> >> > > > > > > > processing/aggregating
> >> > > > > > > > >>>   - Scarf pixels respect the Do Not Track (DNT)
> >> settings of
> >> > > > > > browsers -
> >> > > > > > > > >>>   these users will not be tracked whatsoever.
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>> All the ASF projects I had listed (whether they use
> >> Scarf
> >> > > > gateway
> >> > > > > > or
> >> > > > > > > > >>> Scarf pixel in product) are using opt-out.
> >> > > > > > > > >>>
> >> > > > > > > > >>> 1. Short opt-in period before opt-out. Test this
> feature
> >> > with
> >> > > > > > users who
> >> > > > > > > > >>>> trust and if it works great - make it public. I think
> >> it's
> >> > > > wise
> >> > > > > to
> >> > > > > > > > handle
> >> > > > > > > > >>>> edge cases and configure collected data more
> >> accurately.
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>> It would be a pixel in the webserver, should affect
> >> nothing
> >> > > at
> >> > > > > all
> >> > > > > > even
> >> > > > > > > > >>> in an air-gapped environment.
> >> > > > > > > > >>>
> >> > > > > > > > >>>> 2. It should not affect anything if access to the
> >> internet
> >> > > is
> >> > > > > > > > restricted
> >> > > > > > > > >>>> which is default for many companies.
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>> 100% agreed on the below:
> >> > > > > > > > >>>
> >> > > > > > > > >>>> I think we have a very good blueprint to follow
> >> including
> >> > at
> >> > > > > > least 5
> >> > > > > > > > >>>> other
> >> > > > > > > > >>>> ASF projects that also passed the review of the
> >> > privacy@asf.
> >> > > > > And
> >> > > > > > > > while I
> >> > > > > > > > >>>> understand (and concur) the urge for opt-in by
> default
> >> > > coming
> >> > > > > from
> >> > > > > > > > >>>> consumer
> >> > > > > > > > >>>> market (where it makes perfect sense) Airflow is not
> a
> >> > > > consumer
> >> > > > > > > > >>>> software and is used in "corporate environment" which
> >> has
> >> > a
> >> > > > > little
> >> > > > > > > > >>>> different expectations and broad assumption that the
> >> > company
> >> > > > can
> >> > > > > > make
> >> > > > > > > > >>>> decisions on such telemetry on behalf of the
> employees
> >> > using
> >> > > > it.
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>> Couldn't agree more; even though there shouldn't we
> >> collect
> >> > > > > hamper
> >> > > > > > > > >>> security (and we should aim to do the same), most
> >> security
> >> > > > > > concerned
> >> > > > > > > > folks
> >> > > > > > > > >>> don't just
> >> > > > > > > > >>> upgrade, and we can rely on them regarding release
> >> notes or
> >> > > > > > > > announcements
> >> > > > > > > > >>> and we can make it very clear in our announcements
> too;
> >> and
> >> > > in
> >> > > > > our
> >> > > > > > > > >>> installation guides.
> >> > > > > > > > >>>
> >> > > > > > > > >>> We should assume that those who deploy and upgrade
> >> Airflow
> >> > -
> >> > > > > > actually
> >> > > > > > > > read
> >> > > > > > > > >>>> and take into account what is written in the release
> >> > notes -
> >> > > > > > > > especially
> >> > > > > > > > >>>> if
> >> > > > > > > > >>>> they have security guys breathing down their necks,
> >> similarly
> >> > as
> >> > > we
> >> > > > > > have to
> >> > > > > > > > >>>> assume they follow CVE announcements about security
> >> issues
> >> > > > > fixed.
> >> > > > > > If
> >> > > > > > > > we
> >> > > > > > > > >>>> are very straightforward and out-going about the
> >> change,
> >> > > > inform
> >> > > > > > very
> >> > > > > > > > >>>> clearly how to opt-out, I don't see a big problem
> with
> >> > > > opt-out.
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>> To be clear, the collection of data, or at least the data we should
> >> > > > > > > > >>> gather here should help all the consumers without violating any
> >> > > > > > > > >>> regulations. I will quote Maxime's quote in the use-case doc [1]
> >> > > > > > > > >>>
> >> > > > > > > > >>> "*Another Form of Contributing*
> >> > > > > > > > >>> “I think people often ask ‘how do I contribute to open source?’, ‘I've
> >> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an engineer.’ Actually, the
> >> > > > > > > > >>> very simplest thing that you can do is just say, ‘my organization gets real
> >> > > > > > > > >>> value from this piece of software.’ There are a bunch of ways to let the
> >> > > > > > > > >>> people know about it – and now Scarf is there. If your organization is
> >> > > > > > > > >>> getting a lot of value from a piece of open source software, make sure the
> >> > > > > > > > >>> devs know about it.”"
> >> > > > > > > > >>>
> >> > > > > > > > >>>
> >> > > > > > > > >>> [1] https://about.scarf.sh/post/scarf-case-study-apache-superset
> >> > > > > > > > >>>
> >> > > > > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <kxepal@apache.org> wrote:
> >> > > > > > > > >>>
> >> > > > > > > > >>>> Hi Jarek!
> >> > > > > > > > >>>>
> >> > > > > > > > >>>> I understand the reasons for opt-out from a project view. I just suddenly
> >> > > > > > > > >>>> imagined the situation when an upgrade happens and here comes the data to
> >> > > > > > > > >>>> some third party service - that's a view from a user side of some big
> >> > > > > > > > >>>> company.
> >> > > > > > > > >>>>
> >> > > > > > > > >>>> There could be good alternatives to handle this:
> >> > > > > > > > >>>> 1. Short opt-in period before opt-out. Test this feature with users who
> >> > > > > > > > >>>> trust and if it works great - make it public. I think it's wise to handle
> >> > > > > > > > >>>> edge cases and configure collected data more accurately.
> >> > > > > > > > >>>> 2. Explicitly somehow warn about this feature to make this feature not
> >> > > > > > > > >>>> get unnoticed. Just to reduce possible frustration.
> >> > > > > > > > >>>>
> >> > > > > > > > >>>> Just a personal thoughts for discussion (:
> >> > > > > > > > >>>>
> >> > > > > > > > >>>> --
> >> > > > > > > > >>>> ,,,^..^,,,
> >> > > > > > > > >>>>
> >> > > > > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <jarek@potiuk.com> wrote:
> >> > > > > > > > >>>>
> >> > > > > > > > >>>>> Hello everyone,
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> it has to be:
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> 1. Opt-in by default to not trigger security guys about new unplanned
> >> > > > > > > > >>>>>> activity after regular upgrade.
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> That's a very good point about security triggering Alexander, but I am not
> >> > > > > > > > >>>>> so sure it means that we "have to" do opt-in. There are other ways of
> >> > > > > > > > >>>>> communicating with the "deployment managers" who install and upgrade
> >> > > > > > > > >>>>> airflow - i.e. release notes, blogs, social media of ours, slack
> >> > > > > > > > >>>>> announcements etc. We have plenty of channels we can use to communicate the
> >> > > > > > > > >>>>> change.
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> I think we have a very good blueprint to follow including at least 5 other
> >> > > > > > > > >>>>> ASF projects that also passed the review of the privacy@asf. And while I
> >> > > > > > > > >>>>> understand (and concur) the urge for opt-in by default coming from consumer
> >> > > > > > > > >>>>> market (where it makes perfect sense) Airflow is not a consumer software
> >> > > > > > > > >>>>> and is used in "corporate environment" which has a little different
> >> > > > > > > > >>>>> expectations and broad assumption that the company can make decisions on
> >> > > > > > > > >>>>> such telemetry on behalf of the employees using it.
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> We should assume that those who deploy and upgrade Airflow - actually read
> >> > > > > > > > >>>>> and take into account what is written in the release notes - especially if
> >> > > > > > > > >>>>> they have security guys breathing their necks, similarly as we have to
> >> > > > > > > > >>>>> assume they follow CVE announcements about security issues fixed. If we
> >> > > > > > > > >>>>> are very straightforward and out-going about the change, inform very
> >> > > > > > > > >>>>> clearly how to opt-out, I don't see a big problem with opt-out.
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> We should of course check with privacy@a.o (but I've spent a good deal of
> >> > > > > > > > >>>>> time reading the Superset and other use cases and explanations in detail to
> >> > > > > > > > >>>>> make a better informed decision) - and it looks like they also went the
> >> > > > > > > > >>>>> opt-out way and got cleared by privacy@a.o. And if we cannot reach
> >> > > > > > > > >>>>> consensus, we should - as usual - make a voting decision on it (because
> >> > > > > > > > >>>>> yes, it is an important decision), but - after reading and understanding
> >> > > > > > > > >>>>> why others also did it - for me personally, opt-out is a good path.
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> Also because it will rather increase the amount of data to gather, and in
> >> > > > > > > > >>>>> our case - counter intuitively - it will be even better for privacy and
> >> > > > > > > > >>>>> corporate anonymity, because the more data we get, the more difficult it
> >> > > > > > > > >>>>> will be to get any non-statistical/non-aggregated insight from it. Imagine
> >> > > > > > > > >>>>> if only a few corporate users will enable it consciously - then we will be
> >> > > > > > > > >>>>> able to draw much more conclusions if we find out who they are, than if
> >> > > > > > > > >>>>> everyone has it enabled by default.
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> That's my take on it - but again, it's up to us to vote, for me opt-in is
> >> > > > > > > > >>>>> not "has to", and I am rather for opt-out.
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>> J.
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>>> Hi all,
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>>> I want to propose gathering telemetry for Airflow installations. As the
> >> > > > > > > > >>>>>>> Airflow community, we have been relying heavily on the yearly Airflow
> >> > > > > > > > >>>>>>> Survey and anecdotes to answer a few key questions about Airflow usage.
> >> > > > > > > > >>>>>>> Questions like the following:
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>   - Which versions of Airflow are people installing/using now (i.e.
> >> > > > > > > > >>>>>>>   whether people have primarily made the jump from version X to version Y)
> >> > > > > > > > >>>>>>>   - Which DB is used as the Metadata DB and which version e.g Pg 14?
> >> > > > > > > > >>>>>>>   - What Python version is being used?
> >> > > > > > > > >>>>>>>   - Which Executor is being used?
> >> > > > > > > > >>>>>>>   - Approximately how many people out there in the world are installing
> >> > > > > > > > >>>>>>>   Airflow
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> There is a solution that should help answer these questions: Scarf [1].
> >> > > > > > > > >>>>>>> The ASF already approves Scarf [2][3], and it is already used by other ASF
> >> > > > > > > > >>>>>>> projects (Superset [4], Dolphin Scheduler [5], Dubbo Kubernetes, DevLake,
> >> > > > > > > > >>>>>>> Skywalking), as it follows GDPR and other regulations.
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> Similar to Superset, we probably can use it as follows:
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>   1. Install the `scarf js` npm package and bundle it in the Webserver.
> >> > > > > > > > >>>>>>>   When the package is downloaded & Airflow webserver is opened, metadata
> >> > > > > > > > >>>>>>>   is recorded to the Scarf dashboard.
> >> > > > > > > > >>>>>>>   2. Utilize the Scarf Gateway [6], which we can use in front of docker
> >> > > > > > > > >>>>>>>   containers. While it’s possible people go around this gateway, we can
> >> > > > > > > > >>>>>>>   probably configure and encourage most traffic to go through these
> >> > > > > > > > >>>>>>>   gateways.
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> While Scarf does not store any personally identifying information from SDK
> >> > > > > > > > >>>>>>> telemetry data, it does send various bits of IP-derived information as
> >> > > > > > > > >>>>>>> outlined here [7]. This data should be made as transparent as possible by
> >> > > > > > > > >>>>>>> granting dashboard access to the Airflow PMC and any other relevant means
> >> > > > > > > > >>>>>>> of sharing/surfacing it that we encounter (Town Hall, Slack, Newsletter
> >> > > > > > > > >>>>>>> etc).
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> The following case studies are worth reading:
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>   1. https://about.scarf.sh/post/scarf-case-study-apache-superset (From Maxime)
> >> > > > > > > > >>>>>>>   2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> Similar to them, this could help in various ways that come with using data
> >> > > > > > > > >>>>>>> for decision-making. With clear guidelines on "how to opt-out" [8][9][10] &
> >> > > > > > > > >>>>>>> "what data is being collected" on the Airflow website, this can be
> >> > > > > > > > >>>>>>> beneficial to the entire community as we would be making more informed
> >> > > > > > > > >>>>>>> decisions.
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> Regards,
> >> > > > > > > > >>>>>>> Kaxil
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>> [1] https://about.scarf.sh/
> >> > > > > > > > >>>>>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
> >> > > > > > > > >>>>>>> [3] https://privacy.apache.org/faq/committers.html
> >> > > > > > > > >>>>>>> [4] https://github.com/apache/superset/issues/25639
> >> > > > > > > > >>>>>>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> >> > > > > > > > >>>>>>> [6] https://about.scarf.sh/scarf-gateway
> >> > > > > > > > >>>>>>> [7] https://about.scarf.sh/privacy-policy
> >> > > > > > > > >>>>>>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> >> > > > > > > > >>>>>>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> >> > > > > > > > >>>>>>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> >> > > > > > > > >>>>>>>
> >> > > > > > > > >>>>>>
> >> > > > > > > > >>>>>
> >> > > > > > > > >>>>
> >> > > > > > > > >>>
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > ---------------------------------------------------------------------
> >> > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> >> > > > > > > > For additional commands, e-mail: dev-help@airflow.apache.org
> >> > > > > > > >
> >> > > > > > > >

Re: [DISCUSS] Proposal for adding Telemetry via Scarf

Posted by Hussein Awala <hu...@awala.fr>.
> I'd like to propose that we start with collecting simple data with
> limited access: to all the PMC members. We can always expand it to
> Committers and then expand further to make it invite-only or set up
> exporting it to a DB like Postgres
> <https://github.com/scarf-sh/scarf-postgres-exporter> and have a publicly
> viewable dashboard.

Looks like a good plan; we can discuss the export format when we decide to
do it.

On Wed, Apr 3, 2024 at 7:59 PM Kaxil Naik <ka...@gmail.com> wrote:

> Yup, exactly.
>
>> I believe this would definitely help us take early and informed decisions.
>> E.g. Had we had this earlier, I believe it would have definitely helped us
>> more for our past discussions like whether we should continue supporting
>> MsSQL(https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4),
>> similarly about the DaskExecutor (
>> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.
>>
>
>
> Btw, clarifying my own stance on the below, and let me know what you think
> @Hussein Awala <hu...@awala.fr>: I'd like to propose that we start with
> collecting simple data with limited access: to all the PMC members. We can
> always expand it to Committers and then expand further to make it
> invite-only or set up exporting it to a DB like Postgres
> <https://github.com/scarf-sh/scarf-postgres-exporter> and have a publicly
> viewable dashboard. It would be similar to an iterative software
> development approach, since this will be the first time for us, as Airflow
> PMC, to add such telemetry. This is of course just my opinion though :)
>
>> Regarding the data, like I had mentioned in the email and I am glad others
>> including you are on the same page that the data will be shared with all
>> PMC members. The point about sharing it via website and newsletter was for
>> the community — Airflow users. I don’t think anyone in the community (apart
>> from the PMC members) would need raw data. And even if they need it, I’d
>> say they should put effort and contribute to the Airflow project and become
>> PMC members.
>> To be clear: this telemetry data should help us, as Airflow PMC, to steer
>> some of the decision making based on this data similar to how only PMC has
>> a binding vote on the releases. [1] and this is similar to how Apache
>> Superset does it too.
>> [1]
>> https://www.apache.org/dev/pmc.html#what-is-a-pmc
>
>
> On Wed, 3 Apr 2024 at 12:03, Pankaj Koti <pa...@astronomer.io.invalid>
> wrote:
>
>> +1 to introduce this.
>>
>> I believe this would definitely help us take early and informed decisions.
>> E.g. Had we had this earlier, I believe it would have definitely helped us
>> more for our past discussions like whether we should continue supporting
>> MsSQL(https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4),
>> similarly about the DaskExecutor (
>> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.
>>
>>
>> Best regards,
>>
>> *Pankaj Koti*
>> Senior Software Engineer (Airflow OSS Engineering team)
>> Location: Pune, Maharashtra, India
>> Timezone: Indian Standard Time (IST)
>> Phone: +91 9730079985
>>
>>
>> On Wed, Apr 3, 2024 at 2:44 PM Kaxil Naik <ka...@gmail.com> wrote:
>>
>> > Yup, I had added a link to scarf docs in the original email that
>> > referenced opting out and we should even add an Airflow config that
>> > puts all config in a single place. Without it we can’t be compliant
>> > to all the policies even if we collectively ignore or are unaware of
>> > the importance of it.
>> >
>> > Regarding the data, like I had mentioned in the email and I am glad
>> > others including you are on the same page that the data will be shared
>> > with all PMC members. The point about sharing it via website and
>> > newsletter was for the community - Airflow users. I don’t think anyone
>> > in the community (apart from the PMC members) would need raw data. And
>> > even if they need it, I’d say they should put effort and contribute to
>> > the Airflow project and become PMC members.
>> >
>> > To be clear: this telemetry data should help us, as Airflow PMC, to
>> > steer some of the decision making based on this data similar to how
>> > only PMC has a binding vote on the releases. [1] and this is similar to
>> > how Apache Superset does it too.
>> >
>> > [1]
>> > https://www.apache.org/dev/pmc.html#what-is-a-pmc
>> >
>> >
>> >
>> > On Wed, 3 Apr 2024 at 00:05, Hussein Awala <hu...@awala.fr> wrote:
>> >
>> > > I mentioned opting out just to confirm its importance, and after
>> > > checking the Scarf documentation it appears to be supported natively
>> > > by Scarf. For data accessibility, my point was more about raw data,
>> > > not just aggregated information/insights shared via monthly
>> > > newsletters, as we do for Airflow annual Survey for example:
>> > > https://airflow.apache.org/survey vs
>> > > https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics
>> > > .
>> > >
>> > > On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <ka...@gmail.com> wrote:
>> > >
>> > > > Agreed to both your points Hussein but both the points are already
>> > > > covered in my original discussion post - both about opting out and
>> > > > providing data to all the PMC members and providing visibility via
>> > > > Monthly newsletters. Is there anything else you propose to discuss
>> > > > that isn’t covered?
>> > > >
>> > > >
>> > > >
>> > > > On Mon, 1 Apr 2024 at 13:21, Hussein Awala <hu...@awala.fr> wrote:
>> > > >
>> > > > > +1 for the idea in general, but there are two main points to
>> > > > > discuss before voting on this:
>> > > > >
>> > > > > 1. We should provide an option to disable Scarf:
>> > > > > As Airflow is not a paid product, we cannot force companies to
>> > > > > report their use of this project. Otherwise, some may choose to
>> > > > > create their own fork just to disable Scarf.
>> > > > >
>> > > > > 2. Concerning the exclusivity of access to data:
>> > > > > The data collected must either be completely proprietary for use
>> > > > > by PMC and ASF, or completely open. Since many companies offer
>> > > > > Airflow as a product, it is imperative not to give one company
>> > > > > more privileges than others. I raise this point for the principle
>> > > > > of equality of opportunity.
>> > > > >
>> > > > > On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <sunank200@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Big +1 for Scarf.
>> > > > > >
>> > > > > > Transparency is key, so it's important to be super clear about
>> > > > > > opting out and what's tracked to avoid spooking anyone about IP
>> > > > > > stuff.
>> > > > > >
>> > > > > > Regards
>> > > > > > Ankit Chaurasia
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <amoghdesai.oss@gmail.com>
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > +1 looks like a good tool which could be super helpful.
>> > > > > > >
>> > > > > > > * We should have some transparency into the data that is
>> > > > > > > collected or sent
>> > > > > > > * We should have an option to optionally opt-out
>> > > > > > >
>> > > > > > > Thanks & Regards,
>> > > > > > > Amogh Desai
>> > > > > > >
>> > > > > > >
>> > > > > > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <we...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > +1 to this. It would be really useful. As long as we can opt
>> > > > > > > > out, I think we’re good.
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > > Wei
>> > > > > > > >
>> > > > > > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <
>> > kaxilnaik@gmail.com>
>> > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > Grammar Correction:
>> > > > > > > > >
>> > > > > > > > > We should assume that those who deploy and upgrade
>> Airflow -
>> > > > > actually
>> > > > > > > > read
>> > > > > > > > >> and take into account what is written in the release
>> notes -
>> > > > > > especially
>> > > > > > > > if
>> > > > > > > > >> they have security guys breathing their necks, similarly
>> as
>> > we
>> > > > > have
>> > > > > > to
>> > > > > > > > >> assume they follow CVE announcements about security
>> issues
>> > > > fixed.
>> > > > > > If we
>> > > > > > > > >> are very straightforward and out-going about the change,
>> > > inform
>> > > > > very
>> > > > > > > > >> clearly how to opt-out, I don't see a big problem with
>> > > opt-out.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > I couldn't agree more; even though we shouldn't collect
>> any
>> > > data
>> > > > > that
>> > > > > > > > > hamper security (and we should aim to do the same), most
>> > > security
>> > > > > > > > concerned
>> > > > > > > > > folks don't just upgrade, and we can rely on them
>> regarding
>> > > > release
>> > > > > > notes
>> > > > > > > > > or announcements and we can make it very clear in our
>> > > > announcements
>> > > > > > too;
>> > > > > > > > > and in our installation guides.
>> > > > > > > > >
>> > > > > > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <
>> > kaxilnaik@gmail.com>
>> > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > >> Grammar correction:
>> > > > > > > > >>
>> > > > > > > > >>
>> > > > > > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <
>> > kaxilnaik@gmail.com
>> > > >
>> > > > > > wrote:
>> > > > > > > > >>
>> > > > > > > > >>> Have this at the end of the email too: but if folks
>> don't
>> > > read
>> > > > > > until
>> > > > > > > > the
>> > > > > > > > >>> end and quoting Maxime from the use-case blog[1]:
>> > > > > > > > >>>
>> > > > > > > > >>> "I think people often ask ‘how do I contribute to open
>> > > > source?’,
>> > > > > > ‘I've
>> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
>> > engineer.’
>> > > > > > Actually,
>> > > > > > > > the
>> > > > > > > > >>> very simplest thing that you can do is just say, ‘my
>> > > > organization
>> > > > > > gets
>> > > > > > > > real
>> > > > > > > > >>> value from this piece of software.’ There are a bunch of
>> > ways
>> > > > to
>> > > > > > let
>> > > > > > > > the
>> > > > > > > > >>> people know about it – and now Scarf is there. If your
>> > > > > > organization is
>> > > > > > > > >>> getting a lot of value from a piece of open source
>> > software,
>> > > > make
>> > > > > > sure
>> > > > > > > > the
>> > > > > > > > >>> devs know about it."
>> > > > > > > > >>>
>> > > > > > > > >>> What kind of edge cases are you thinking about? I don't
>> > think
>> > > > it
>> > > > > > makes
>> > > > > > > > >>> sense to have "opt-in" at all. As the goal is to collect
>> > data
>> > > > for
>> > > > > > most
>> > > > > > > > >>> Airflow installations except for those that don't want
>> to
>> > > give
>> > > > > > data,
>> > > > > > > > then
>> > > > > > > > >>> "opt-out" is the only way to maximize it. As long as we
>> > don't
>> > > > > > collect
>> > > > > > > > any
>> > > > > > > > >>> PII data, this is in-compliance as well.
>> > > > > > > > >>>
>> > > > > > > > >>> Imagine someone learning Airflow, if they have to opt-in
>> > via
>> > > a
>> > > > > > config,
>> > > > > > > > >>> they wouldn't even know or care about it, hence us
>> losing
>> > > most
>> > > > of
>> > > > > > the
>> > > > > > > > data.
>> > > > > > > > >>> I understand why some orgs & individuals may want to
>> > opt-out.
>> > > > > > > > >>>
>> > > > > > > > >>> Scarf Provides tracking pixels (essentially an HTML
>> image
>> > > tag)
>> > > > > > that you
>> > > > > > > > >>> can place in your website or product to track visitors
>> to
>> > > that
>> > > > > > URL. If
>> > > > > > > > >>> there were any concerns about Privacy, ASF wouldn't have
>> > > > approved
>> > > > > > it
>> > > > > > > > at all.
>> > > > > > > > >>>
>> > > > > > > > >>> A few key details to note about the pixel:
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>>   - No PII is tracked… Scarf does not capture/retain IP
>> > > > > > information…
>> > > > > > > > >>>   this information is discarded by the platform upon
>> > > > > > > > processing/aggregating
>> > > > > > > > >>>   - Scarf pixels respect the Do Not Track (DNT)
>> settings of
>> > > > > > browsers -
>> > > > > > > > >>>   these users will not be tracked whatsoever.
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>> All the ASF projects I had listed (whether they use
>> Scarf
>> > > > gateway
>> > > > > > or
>> > > > > > > > >>> Scarf pixel in product) are using opt-out.
>> > > > > > > > >>>
>> > > > > > > > >>> 1. Short opt-in period before opt-out. Test this feature
>> > with
>> > > > > > users who
>> > > > > > > > >>>> trust and if it works great - make it public. I think
>> it's
>> > > > wise
>> > > > > to
>> > > > > > > > handle
>> > > > > > > > >>>> edge cases and configure collected data more
>> accurately.
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>> It would be a pixel in the webserver, should affect
>> nothing
>> > > at
>> > > > > all
>> > > > > > even
>> > > > > > > > >>> in an air-gapped environment.
>> > > > > > > > >>>
>> > > > > > > > >>>> 2. It should not affect anything if access to the
>> internet
>> > > is
>> > > > > > > > restricted
>> > > > > > > > >>>> which is default for many companies.
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>> 100% agreed on the below:
>> > > > > > > > >>>
>> > > > > > > > >>>> I think we have a very good blueprint to follow
>> including
>> > at
>> > > > > > least 5
>> > > > > > > > >>>> other
>> > > > > > > > >>>> ASF projects that also passed the review of the
>> > privacy@asf.
>> > > > > And
>> > > > > > > > while I
>> > > > > > > > >>>> understand (and concur) the urge for opt-in by default
>> > > coming
>> > > > > from
>> > > > > > > > >>>> consumer
>> > > > > > > > >>>> market (where it makes perfect sense) Airflow is not a
>> > > > consumer
>> > > > > > > > >>>> software and is used in "corporate environment" which
>> has
>> > a
>> > > > > little
>> > > > > > > > >>>> different expectations and broad assumption that the
>> > company
>> > > > can
>> > > > > > make
>> > > > > > > > >>>> decisions on such telemetry on behalf of the employees
>> > using
>> > > > it.
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>> Couldn't agree more; even though there shouldn't we
>> collect
>> > > > > hamper
>> > > > > > > > >>> security (and we should aim to do the same), most
>> security
>> > > > > > concerned
>> > > > > > > > folks
>> > > > > > > > >>> don't just
>> > > > > > > > >>> upgrade, and we can rely on them regarding release
>> notes or
>> > > > > > > > announcements
>> > > > > > > > >>> and we can make it very clear in our announcements too;
>> and
>> > > in
>> > > > > our
>> > > > > > > > >>> installation guides.
>> > > > > > > > >>>
>> > > > > > > > >>> We should assume that those who deploy and upgrade
>> Airflow
>> > -
>> > > > > > actually
>> > > > > > > > read
>> > > > > > > > >>>> and take into account what is written in the release
>> > notes -
>> > > > > > > > especially
>> > > > > > > > >>>> if
>> > > > > > > > >>>> they have security guys breathing their necks,
>> similarly
>> > as
>> > > we
>> > > > > > have to
>> > > > > > > > >>>> assume they follow CVE announcements about security
>> issues
>> > > > > fixed.
>> > > > > > If
>> > > > > > > > we
>> > > > > > > > >>>> are very straightforward and out-going about the
>> change,
>> > > > inform
>> > > > > > very
>> > > > > > > > >>>> clearly how to opt-out, I don't see a big problem with
>> > > > opt-out.
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>> To be clear, the collection of data, or at least the
>> data
>> > we
>> > > > > should
>> > > > > > > > >>> gather here should help all the consumers without
>> violating
>> > > > > > anything
>> > > > > > > > >>> regulations. I will quote Maxime's quote in the use-case
>> > doc
>> > > > [1]
>> > > > > > > > >>>
>> > > > > > > > >>> "*Another Form of Contributing*
>> > > > > > > > >>> “I think people often ask ‘how do I contribute to open
>> > > > source?’,
>> > > > > > ‘I've
>> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
>> > engineer.’
>> > > > > > Actually,
>> > > > > > > > the
>> > > > > > > > >>> very simplest thing that you can do is just say, ‘my
>> > > > organization
>> > > > > > gets
>> > > > > > > > real
>> > > > > > > > >>> value from this piece of software.’ There are a bunch of
>> > ways
>> > > > to
>> > > > > > let
>> > > > > > > > the
>> > > > > > > > >>> people know about it – and now Scarf is there. If your
>> > > > > > organization is
>> > > > > > > > >>> getting a lot of value from a piece of open source
>> > software,
>> > > > make
>> > > > > > sure
>> > > > > > > > the
>> > > > > > > > >>> devs know about it.”"
>> > > > > > > > >>>
>> > > > > > > > >>>
>> > > > > > > > >>> [1] https://about.scarf.sh/post/scarf-case-study-apache-superset
>> > > > > > > > >>>
>> > > > > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <
>> > > > > kxepal@apache.org>
>> > > > > > > > wrote:
>> > > > > > > > >>>
>> > > > > > > > >>>> Hi Jarek!
>> > > > > > > > >>>>
>> > > > > > > > >>>> I understand the reasons for opt-out from a project
>> view.
>> > I
>> > > > just
>> > > > > > > > suddenly
>> > > > > > > > >>>> imagined the situation when an upgrade happens and here
>> > > comes
>> > > > > the
>> > > > > > > > data to
>> > > > > > > > >>>> some third party service - that's a view from a user
>> side
>> > of
>> > > > > some
>> > > > > > big
>> > > > > > > > >>>> company.
>> > > > > > > > >>>>
>> > > > > > > > >>>> There could be good alternatives to handle this:
>> > > > > > > > >>>> 1. Short opt-in period before opt-out. Test this
>> feature
>> > > with
>> > > > > > users
>> > > > > > > > who
>> > > > > > > > >>>> trust and if it works great - make it public. I think
>> it's
>> > > > wise
>> > > > > to
>> > > > > > > > handle
>> > > > > > > > >>>> edge cases and configure collected data more
>> accurately.
>> > > > > > > > >>>> 2. Explicitly somehow warn about this feature to make
>> this
>> > > > > > feature not
>> > > > > > > > >>>> get
>> > > > > > > > >>>> unnoticed. Just to reduce possible frustration.
>> > > > > > > > >>>>
>> > > > > > > > >>>> Just a personal thoughts for discussion (:
>> > > > > > > > >>>>
>> > > > > > > > >>>> --
>> > > > > > > > >>>> ,,,^..^,,,
>> > > > > > > > >>>>
>> > > > > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <
>> > > > jarek@potiuk.com>
>> > > > > > > > wrote:
>> > > > > > > > >>>>
>> > > > > > > > >>>>> Hello everyone,
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> it has to be:
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> 1. Opt-in by default to not trigger security guys
>> about
>> > new
>> > > > > > unplanned
>> > > > > > > > >>>>>> activity after regular upgrade.
>> > > > > > > > >>>>>>
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> That's a very good point about security triggering
>> > > Alexander,
>> > > > > > but I
>> > > > > > > > am
>> > > > > > > > >>>> not
>> > > > > > > > >>>>> so sure it means that we "have to" do opt-in. There
>> are
>> > > other
>> > > > > > ways of
>> > > > > > > > >>>>> communicating with the "deployment managers" who
>> install
>> > > and
>> > > > > > upgrade
>> > > > > > > > >>>>> airflow - i.e. release notes. blogs, social media of
>> > ours,
>> > > > > slack
>> > > > > > > > >>>>> announcements etc. We have plenty of channels we can
>> use
>> > to
>> > > > > > > > >>>> communicate the
>> > > > > > > > >>>>> change.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> I think we have a very good blueprint to follow
>> including
>> > > at
>> > > > > > least 5
>> > > > > > > > >>>> other
>> > > > > > > > >>>>> ASF projects that also passed the review of the
>> > > privacy@asf.
>> > > > > And
>> > > > > > > > >>>> while I
>> > > > > > > > >>>>> understand (and concur) the urge for opt-in by default
>> > > coming
>> > > > > > from
>> > > > > > > > >>>> consumer
>> > > > > > > > >>>>> market (where it makes perfect sense) Airflow is not a
>> > > > consumer
>> > > > > > > > >>>>> software and is used in "corporate environment" which
>> > has a
>> > > > > > little
>> > > > > > > > >>>>> different expectations and broad assumption that the
>> > > company
>> > > > > can
>> > > > > > make
>> > > > > > > > >>>>> decisions on such telemetry on behalf of the employees
>> > > using
>> > > > > it.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> We should assume that those who deploy and upgrade
>> > Airflow
>> > > -
>> > > > > > actually
>> > > > > > > > >>>> read
>> > > > > > > > >>>>> and take into account what is written in the release
>> > notes
>> > > -
>> > > > > > > > >>>> especially if
>> > > > > > > > >>>>> they have security guys breathing their necks,
>> similarly
>> > as
>> > > > we
>> > > > > > have
>> > > > > > > > to
>> > > > > > > > >>>>> assume they follow CVE announcements about security
>> > issues
>> > > > > > fixed. If
>> > > > > > > > we
>> > > > > > > > >>>>> are very straightforward and out-going about the
>> change,
>> > > > inform
>> > > > > > very
>> > > > > > > > >>>>> clearly how to opt-out, I don't see a big problem with
>> > > > opt-out.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> We should of course check with privacy@a.o (but I'v
>> > spend
>> > > a
>> > > > > good
>> > > > > > > > deal
>> > > > > > > > >>>> of
>> > > > > > > > >>>>> time reading the Superset  and other use case and
>> > > explanation
>> > > > > in
>> > > > > > > > >>>> detail to
>> > > > > > > > >>>>> make a better informed decision) - and it looks like
>> they
>> > > > also
>> > > > > > went
>> > > > > > > > >>>> opt-out
>> > > > > > > > >>>>> way and got cleared by privacy@a.o.  And if we cannot
>> > > reach
>> > > > > > > > >>>> consensus, we
>> > > > > > > > >>>>> should - as usual - make a voting decision on it
>> (because
>> > > > yes,
>> > > > > > it is
>> > > > > > > > an
>> > > > > > > > >>>>> important decision), but - after reading and
>> > understanding
>> > > > why
>> > > > > > others
>> > > > > > > > >>>> also
>> > > > > > > > >>>>> did it - for me personally, opt-out is a good path.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> Also because it will rather increase the amount of
>> data
>> > to
>> > > > > > gather,
>> > > > > > > > and
>> > > > > > > > >>>> in
>> > > > > > > > >>>>> our case - counter intuitively - it will be even
>> better
>> > for
>> > > > > > privacy
>> > > > > > > > and
>> > > > > > > > >>>>> corporate anonymity, because the more data we get, the
>> > more
>> > > > > > difficult
>> > > > > > > > >>>> it
>> > > > > > > > >>>>> will be to get any non-statistical/non-aggregated
>> insight
>> > > > from
>> > > > > > it.
>> > > > > > > > >>>> Imagine
>> > > > > > > > >>>>> if only a few corporate users will enable it
>> consciously
>> > -
>> > > > then
>> > > > > > we
>> > > > > > > > >>>> will be
>> > > > > > > > >>>>> able to draw much more conclusions if we find out who
>> > they
>> > > > are,
>> > > > > > than
>> > > > > > > > if
>> > > > > > > > >>>>> everyone has it enabled by default.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> That's my take on it - but again, it's up to us to
>> vote,
>> > > for
>> > > > me
>> > > > > > > > opt-in
>> > > > > > > > >>>> is
>> > > > > > > > >>>>> not "has to", and I am rather for opt-out.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>> J.
>> > > > > > > > >>>>>
>> > > > > > > > >>>>>> Hi all,
>> > > > > > > > >>>>>>
>> > > > > > > > >>>>>>
>> > > > > > > > >>>>>>> I want to propose gathering telemetry for Airflow
>> > > > > > installations.
>> > > > > > > > >>>> As the
>> > > > > > > > >>>>>>> Airflow community, we have been relying heavily on
>> the
>> > > > yearly
>> > > > > > > > >>>> Airflow
>> > > > > > > > >>>>>>> Survey and anecdotes to answer a few key questions
>> > about
>> > > > > > Airflow
>> > > > > > > > >>>> usage.
>> > > > > > > > >>>>>>> Questions like the following:
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>>   - Which versions of Airflow are people
>> > installing/using
>> > > > now
>> > > > > > > > >>>> (i.e.
>> > > > > > > > >>>>>>>   whether people have primarily made the jump from
>> > > version
>> > > > X
>> > > > > to
>> > > > > > > > >>>>> version
>> > > > > > > > >>>>>> Y)
>> > > > > > > > >>>>>>>   - Which DB is used as the Metadata DB and which
>> > version
>> > > > e.g
>> > > > > > Pg
>> > > > > > > > >>>> 14?
>> > > > > > > > >>>>>>>   - What Python version is being used?
>> > > > > > > > >>>>>>>   - Which Executor is being used?
>> > > > > > > > >>>>>>>   - Approximately how many people out there in the
>> > world
>> > > > are
>> > > > > > > > >>>>> installing
>> > > > > > > > >>>>>>>   Airflow
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> There is a solution that should help answer these
>> > > > questions:
>> > > > > > Scarf
>> > > > > > > > >>>> [1].
>> > > > > > > > >>>>>> The
>> > > > > > > > >>>>>>> ASF already approves Scarf [2][3] and is already
>> used
>> > by
>> > > > > other
>> > > > > > ASF
>> > > > > > > > >>>>>>> projects: Superset [4], Dolphin Scheduler [5], Dubbo
>> > > > > > Kubernetes,
>> > > > > > > > >>>>> DevLake,
>> > > > > > > > >>>>>>> Skywalking as it follows GDPR and other regulations.
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> Similar to Superset, we probably can use it as
>> follows:
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>>   1. Install the `scarf js` npm package and bundle
>> it
>> > in
>> > > > the
>> > > > > > > > >>>>> Webserver.
>> > > > > > > > >>>>>>>   When the package is downloaded & Airflow
>> webserver is
>> > > > > opened,
>> > > > > > > > >>>>> metadata
>> > > > > > > > >>>>>>> is
>> > > > > > > > >>>>>>>   recorded to the Scarf dashboard.
>> > > > > > > > >>>>>>>   2. Utilize the Scarf Gateway [6], which we can
>> use in
>> > > > front
>> > > > > > of
>> > > > > > > > >>>>> docker
>> > > > > > > > >>>>>>>   containers. While it’s possible people go around
>> this
>> > > > > > gateway,
>> > > > > > > > >>>> we
>> > > > > > > > >>>>> can
>> > > > > > > > >>>>>>>   probably configure and encourage most traffic to
>> go
>> > > > through
>> > > > > > > > >>>> these
>> > > > > > > > >>>>>>> gateways.
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> While Scarf does not store any personally
>> identifying
>> > > > > > information
>> > > > > > > > >>>> from
>> > > > > > > > >>>>>> SDK
>> > > > > > > > >>>>>>> telemetry data, it does send various bits of
>> IP-derived
>> > > > > > > > >>>> information as
>> > > > > > > > >>>>>>> outlined here [7]. This data should be made as
>> > > transparent
>> > > > as
>> > > > > > > > >>>> possible
>> > > > > > > > >>>>> by
>> > > > > > > > >>>>>>> granting dashboard access to the Airflow PMC and any
>> > > other
>> > > > > > relevant
>> > > > > > > > >>>>> means
>> > > > > > > > >>>>>>> of sharing/surfacing it that we encounter (Town
>> Hall,
>> > > > Slack,
>> > > > > > > > >>>> Newsletter
>> > > > > > > > >>>>>>> etc).
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> The following case studies are worth reading:
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>>   1. https://about.scarf.sh/post/scarf-case-study-apache-superset (From Maxime)
>> > > > > > > > >>>>>>>   2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> Similar to them, this could help in various ways
>> that
>> > > come
>> > > > > with
>> > > > > > > > >>>> using
>> > > > > > > > >>>>>> data
>> > > > > > > > >>>>>>> for decision-making. With clear guidelines on "how
>> to
>> > > > > opt-out"
>> > > > > > > > >>>>>> [8][9][10] &
>> > > > > > > > >>>>>>> "what data is being collected" on the Airflow
>> website,
>> > > this
>> > > > > > can be
>> > > > > > > > >>>>>>> beneficial to the entire community as we would be
>> > making
>> > > > more
>> > > > > > > > >>>> informed
>> > > > > > > > >>>>>>> decisions.
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> Regards,
>> > > > > > > > >>>>>>> Kaxil
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>> [1] https://about.scarf.sh/
>> > > > > > > > >>>>>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
>> > > > > > > > >>>>>>> [3] https://privacy.apache.org/faq/committers.html
>> > > > > > > > >>>>>>> [4] https://github.com/apache/superset/issues/25639
>> > > > > > > > >>>>>>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
>> > > > > > > > >>>>>>> [6] https://about.scarf.sh/scarf-gateway
>> > > > > > > > >>>>>>> [7] https://about.scarf.sh/privacy-policy
>> > > > > > > > >>>>>>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
>> > > > > > > > >>>>>>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
>> > > > > > > > >>>>>>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
>> > > > > > > > >>>>>>>
>> > > > > > > > >>>>>>
>> > > > > > > > >>>>>
>> > > > > > > > >>>>
>> > > > > > > > >>>
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > ---------------------------------------------------------------------
>> > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
>> > > > > > > > For additional commands, e-mail: dev-help@airflow.apache.org
>> > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] Proposal for adding Telemetry via Scarf

Posted by Kaxil Naik <ka...@gmail.com>.
Yup, exactly.

> I believe this would definitely help us take early and informed decisions.
> E.g. Had we had this earlier, I believe it would have definitely helped us
> more for our past discussions like whether we should continue supporting
> MsSQL(https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4),
> similarly about the DaskExecutor (
> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.
>


Btw, to clarify my own stance on the below, and let me know what you
think @Hussein Awala <hu...@awala.fr>: I'd like to propose that we start
by collecting simple data with limited access, restricted to the PMC
members. We can always expand it to Committers later, and then expand
further to invite-only access, or set up an export to a DB like Postgres
<https://github.com/scarf-sh/scarf-postgres-exporter> and have a publicly
viewable dashboard. It would be similar to an iterative software
development approach, since this will be the first time for us, as Airflow
PMC, to add such telemetry. This is of course just my opinion though :)
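
For illustration only, here is a rough Python sketch of how an opt-out
check could gate the telemetry pixel discussed in this thread. The
environment variable name, the endpoint, and both function names are
hypothetical; none of them come from Airflow or Scarf. The fields simply
mirror the questions in the original proposal (Airflow version, Python
version, executor, metadata DB).

```python
import os
import platform
from urllib.parse import urlencode

# Hypothetical names: neither the environment variable nor the endpoint
# below is taken from Airflow or Scarf; they only show the shape of the check.
OPT_OUT_ENV = "EXAMPLE_TELEMETRY_DISABLED"
PIXEL_ENDPOINT = "https://telemetry.example.invalid/pixel"


def telemetry_enabled(env=None):
    """Return False when the deployment has opted out via the environment."""
    env = os.environ if env is None else env
    value = env.get(OPT_OUT_ENV, "").strip().lower()
    return value not in {"1", "true", "yes"}


def build_pixel_url(airflow_version, executor, metadata_db, env=None):
    """Build the URL a webserver page could request, or None if opted out."""
    if not telemetry_enabled(env):
        return None
    # Only coarse, non-personal deployment facts are included.
    params = {
        "version": airflow_version,
        "python": platform.python_version(),
        "executor": executor,
        "db": metadata_db,
    }
    return PIXEL_ENDPOINT + "?" + urlencode(params)
```

With the opt-out variable set to "true", build_pixel_url() returns None
and nothing is requested; otherwise the caller gets a single URL carrying
only the four fields above. A real implementation would presumably read
the setting from the Airflow config rather than from an ad hoc variable.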

> Regarding the data, like I had mentioned in the email and I am glad others
> including you are on the same page that the data will be shared with all
> PMC members. The point about sharing it via website and newsletter was for
> the community — Airflow users. I don’t think anyone in the community (apart
> from the PMC members) would need raw data. And even if they need it, I’d
> say they should put effort and contribute to the Airflow project and become
> PMC members.
> To be clear: this telemetry data should help us, as Airflow PMC, to steer
> some of the decision making based on this data similar to how only PMC has
> a binding vote on the releases. [1] and this is similar to how Apache
> Superset does it too.
> [1] https://www.apache.org/dev/pmc.html#what-is-a-pmc


On Wed, 3 Apr 2024 at 12:03, Pankaj Koti <pa...@astronomer.io.invalid>
wrote:

> +1 to introduce this.
>
> I believe this would definitely help us take early and informed decisions.
> E.g. Had we had this earlier, I believe it would have definitely helped us
> more for our past discussions like whether we should continue supporting
> MsSQL(https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4),
> similarly about the DaskExecutor (
> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.
>
>
> Best regards,
>
> *Pankaj Koti*
> Senior Software Engineer (Airflow OSS Engineering team)
> Location: Pune, Maharashtra, India
> Timezone: Indian Standard Time (IST)
> Phone: +91 9730079985
>
>
> On Wed, Apr 3, 2024 at 2:44 PM Kaxil Naik <ka...@gmail.com> wrote:
>
> > Yup, I had added a link to scarf docs in the original email that
> referenced
> > opting out and we should even add an Airflow config that puts all config
> in
> > a single place. Without it we can’t be compliant to all the policies even
> > if we collectively ignore or are unaware of the importance of it.
> >
> > Regarding the data, like I had mentioned in the email and I am glad
> others
> > including you are on the same page that the data will be shared with all
> > PMC members. The point about sharing it via website and newsletter was
> for
> > the community — Airflow users. I don’t think anyone in the community
> (apart
> > from the PMC members) would need raw data. And even if they need it, I’d
> > say they should put effort and contribute to the Airflow project and
> become
> > PMC members.
> >
> > To be clear: this telemetry data should help us, as Airflow PMC, to steer
> > some of the decision making based on this data similar to how only PMC
> has
> > a binding vote on the releases. [1] and this is similar to how Apache
> > Superset does it too.
> >
> > [1]
> > https://www.apache.org/dev/pmc.html#what-is-a-pmc
> >
> >
> >
> > On Wed, 3 Apr 2024 at 00:05, Hussein Awala <hu...@awala.fr> wrote:
> >
> > > I mentioned opting out just to confirm its importance, and after
> checking
> > > the Scarf documentation it appears to be supported natively by Scarf.
> For
> > > data accessibility, my point was more about raw data, not just
> aggregated
> > > information/insights shared via monthly newsletters, as we do for
> Airflow
> > > annual Survey for example:
> > > https://airflow.apache.org/survey vs
> > > https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics
> > > .
> > >
> > > On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <ka...@gmail.com> wrote:
> > >
> > > > Agreed to both your points Hussein but both the points are already
> > > covered
> > > > in my original discussion post - both about opting out and providing
> > data
> > > > to all the PMC members and providing visibility via Monthly
> > newsletters.
> > > Is
> > > > there anything else you propose to discuss that isn’t covered?
> > > >
> > > >
> > > >
> > > > On Mon, 1 Apr 2024 at 13:21, Hussein Awala <hu...@awala.fr> wrote:
> > > >
> > > > > +1 for the idea in general, but there are two main points to
> discuss
> > > > before
> > > > > voting on this:
> > > > >
> > > > > 1. We should provide an option to disable Scarf:
> > > > > As Airflow is not a paid product, we cannot force companies to
> report
> > > > their
> > > > > use of this project. Otherwise, some may choose to create their own
> > > fork
> > > > > just to disable Scarf.
> > > > >
> > > > > 2. Concerning the exclusivity of access to data:
> > > > > The data collected must either be completely proprietary for use by
> > PMC
> > > > and
> > > > > ASF, or completely open. Since many companies offer Airflow as a
> > > product,
> > > > > it is imperative not to give one company more privileges than
> > others. I
> > > > > raise this point for the principle of equality of opportunity.
> > > > >
> > > > > On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <
> sunank200@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Big +1 for Scarf.
> > > > > >
> > > > > > Transparency is key, so it's important to be super clear about
> > opting
> > > > > > out and what's tracked to avoid spooking anyone about IP stuff.
> > > > > >
> > > > > > Regards
> > > > > > Ankit Chaurasia
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <
> > > amoghdesai.oss@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > +1 looks like a good tool which could be super helpful.
> > > > > > >
> > > > > > > * We should have some transparency into the data that is
> > collected
> > > or
> > > > > > sent
> > > > > > > * We should have an option to optionally opt-out
> > > > > > >
> > > > > > > Thanks & Regards,
> > > > > > > Amogh Desai
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <we...@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > > +1 to this. It would be really useful. As long as we can opt
> > > out, I
> > > > > > think
> > > > > > > > we’re good.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Wei
> > > > > > > >
> > > > > > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <
> > kaxilnaik@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Grammar Correction:
> > > > > > > > >
> > > > > > > > > We should assume that those who deploy and upgrade Airflow
> -
> > > > > actually
> > > > > > > > read
> > > > > > > > >> and take into account what is written in the release
> notes -
> > > > > > especially
> > > > > > > > if
> > > > > > > > >> they have security guys breathing their necks, similarly
> as
> > we
> > > > > have
> > > > > > to
> > > > > > > > >> assume they follow CVE announcements about security issues
> > > > fixed.
> > > > > > If we
> > > > > > > > >> are very straightforward and out-going about the change,
> > > inform
> > > > > very
> > > > > > > > >> clearly how to opt-out, I don't see a big problem with
> > > opt-out.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I couldn't agree more; even though we shouldn't collect any
> > > data
> > > > > that
> > > > > > > > > hamper security (and we should aim to do the same), most
> > > security
> > > > > > > > concerned
> > > > > > > > > folks don't just upgrade, and we can rely on them regarding
> > > > release
> > > > > > notes
> > > > > > > > > or announcements and we can make it very clear in our
> > > > announcements
> > > > > > too;
> > > > > > > > > and in our installation guides.
> > > > > > > > >
> > > > > > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <
> > kaxilnaik@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > >> Grammar correction:
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <
> > kaxilnaik@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >>> Have this at the end of the email too: but if folks don't
> > > read
> > > > > > until
> > > > > > > > the
> > > > > > > > >>> end and quoting Maxime from the use-case blog[1]:
> > > > > > > > >>>
> > > > > > > > >>> "I think people often ask ‘how do I contribute to open
> > > > source?’,
> > > > > > ‘I've
> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
> > engineer.’
> > > > > > Actually,
> > > > > > > > the
> > > > > > > > >>> very simplest thing that you can do is just say, ‘my
> > > > organization
> > > > > > gets
> > > > > > > > real
> > > > > > > > >>> value from this piece of software.’ There are a bunch of
> > ways
> > > > to
> > > > > > let
> > > > > > > > the
> > > > > > > > >>> people know about it – and now Scarf is there. If your
> > > > > > organization is
> > > > > > > > >>> getting a lot of value from a piece of open source
> > software,
> > > > make
> > > > > > sure
> > > > > > > > the
> > > > > > > > >>> devs know about it."
> > > > > > > > >>>
> > > > > > > > >>> What kind of edge cases are you thinking about? I don't
> > think
> > > > it
> > > > > > makes
> > > > > > > > >>> sense to have "opt-in" at all. As the goal is to collect
> > data
> > > > for
> > > > > > most
> > > > > > > > >>> Airflow installations except for those that don't want to
> > > give
> > > > > > data,
> > > > > > > > then
> > > > > > > > >>> "opt-out" is the only way to maximize it. As long as we
> > don't
> > > > > > collect
> > > > > > > > any
> > > > > > > > >>> PII data, this is in-compliance as well.
> > > > > > > > >>>
> > > > > > > > >>> Imagine someone learning Airflow, if they have to opt-in
> > via
> > > a
> > > > > > config,
> > > > > > > > >>> they wouldn't even know or care about it, hence us losing
> > > most
> > > > of
> > > > > > the
> > > > > > > > data.
> > > > > > > > >>> I understand why some orgs & individuals may want to
> > opt-out.
> > > > > > > > >>>
> > > > > > > > >>> Scarf Provides tracking pixels (essentially an HTML image
> > > tag)
> > > > > > that you
> > > > > > > > >>> can place in your website or product to track visitors to
> > > that
> > > > > > URL. If
> > > > > > > > >>> there were any concerns about Privacy, ASF wouldn't have
> > > > approved
> > > > > > it
> > > > > > > > at all.
> > > > > > > > >>>
> > > > > > > > >>> A few key details to note about the pixel:
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>>   - No PII is tracked… Scarf does not capture/retain IP
> > > > > > information…
> > > > > > > > >>>   this information is discarded by the platform upon
> > > > > > > > processing/aggregating
> > > > > > > > >>>   - Scarf pixels respect the Do Not Track (DNT) settings
> of
> > > > > > browsers -
> > > > > > > > >>>   these users will not be tracked whatsoever.
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> All the ASF projects I had listed (whether they use Scarf
> > > > gateway
> > > > > > or
> > > > > > > > >>> Scarf pixel in product) are using opt-out.
> > > > > > > > >>>
> > > > > > > > >>> 1. Short opt-in period before opt-out. Test this feature
> > with
> > > > > > users who
> > > > > > > > >>>> trust and if it works great - make it public. I think
> it's
> > > > wise
> > > > > to
> > > > > > > > handle
> > > > > > > > >>>> edge cases and configure collected data more accurately.
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> It would be a pixel in the webserver, should affect
> nothing
> > > at
> > > > > all
> > > > > > even
> > > > > > > > >>> in an air-gapped environment.
> > > > > > > > >>>
> > > > > > > > >>>> 2. It should not affect anything if access to the
> internet
> > > is
> > > > > > > > restricted
> > > > > > > > >>>> which is default for many companies.
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> 100% agreed on the below:
> > > > > > > > >>>
> > > > > > > > >>>> I think we have a very good blueprint to follow
> including
> > at
> > > > > > least 5
> > > > > > > > >>>> other
> > > > > > > > >>>> ASF projects that also passed the review of the
> > privacy@asf.
> > > > > And
> > > > > > > > while I
> > > > > > > > >>>> understand (and concur) the urge for opt-in by default
> > > coming
> > > > > from
> > > > > > > > >>>> consumer
> > > > > > > > >>>> market (where it makes perfect sense) Airflow is not a
> > > > consumer
> > > > > > > > >>>> software and is used in "corporate environment" which
> has
> > a
> > > > > little
> > > > > > > > >>>> different expectations and broad assumption that the
> > company
> > > > can
> > > > > > make
> > > > > > > > >>>> decisions on such telemetry on behalf of the employees
> > using
> > > > it.
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> Couldn't agree more; even though there shouldn't we
> collect
> > > > > hamper
> > > > > > > > >>> security (and we should aim to do the same), most
> security
> > > > > > concerned
> > > > > > > > folks
> > > > > > > > >>> don't just
> > > > > > > > >>> upgrade, and we can rely on them regarding release notes
> or
> > > > > > > > announcements
> > > > > > > > >>> and we can make it very clear in our announcements too;
> and
> > > in
> > > > > our
> > > > > > > > >>> installation guides.
> > > > > > > > >>>
> > > > > > > > >>> We should assume that those who deploy and upgrade
> Airflow
> > -
> > > > > > actually
> > > > > > > > read
> > > > > > > > >>>> and take into account what is written in the release
> > notes -
> > > > > > > > especially
> > > > > > > > >>>> if
> > > > > > > > >>>> they have security guys breathing their necks, similarly
> > as
> > > we
> > > > > > have to
> > > > > > > > >>>> assume they follow CVE announcements about security
> issues
> > > > > fixed.
> > > > > > If
> > > > > > > > we
> > > > > > > > >>>> are very straightforward and out-going about the change,
> > > > inform
> > > > > > very
> > > > > > > > >>>> clearly how to opt-out, I don't see a big problem with
> > > > opt-out.
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> To be clear, the collection of data, or at least the data
> > we
> > > > > should
> > > > > > > > >>> gather here should help all the consumers without
> violating
> > > > > > anything
> > > > > > > > >>> regulations. I will quote Maxime's quote in the use-case
> > doc
> > > > [1]
> > > > > > > > >>>
> > > > > > > > >>> "*Another Form of Contributing*
> > > > > > > > >>> “I think people often ask ‘how do I contribute to open
> > > > source?’,
> > > > > > ‘I've
> > > > > > > > >>> got to get into the code’, or ‘ I’ve got to be an
> > engineer.’
> > > > > > Actually,
> > > > > > > > the
> > > > > > > > >>> very simplest thing that you can do is just say, ‘my
> > > > organization
> > > > > > gets
> > > > > > > > real
> > > > > > > > >>> value from this piece of software.’ There are a bunch of
> > ways
> > > > to
> > > > > > let
> > > > > > > > the
> > > > > > > > >>> people know about it – and now Scarf is there. If your
> > > > > > organization is
> > > > > > > > >>> getting a lot of value from a piece of open source
> > software,
> > > > make
> > > > > > sure
> > > > > > > > the
> > > > > > > > >>> devs know about it.”"
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> [1]
> > > > https://about.scarf.sh/post/scarf-case-study-apache-superset
> > > > > > > > >>>
> > > > > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <
> > > > > kxepal@apache.org>
> > > > > > > > wrote:
> > > > > > > > >>>
> > > > > > > > >>>> Hi Jarek!
> > > > > > > > >>>>
> > > > > > > > > >>>> I understand the reasons for opt-out from a project view. I just suddenly
> > > > > > > > > >>>> imagined the situation when an upgrade happens and here comes the data to
> > > > > > > > > >>>> some third party service - that's a view from a user side of some big
> > > > > > > > > >>>> company.
> > > > > > > > >>>>
> > > > > > > > > >>>> There could be good alternatives to handle this:
> > > > > > > > > >>>> 1. Short opt-in period before opt-out. Test this feature with users who
> > > > > > > > > >>>> trust and if it works great - make it public. I think it's wise to handle
> > > > > > > > > >>>> edge cases and configure collected data more accurately.
> > > > > > > > > >>>> 2. Explicitly warn about this feature somehow so that it does not go
> > > > > > > > > >>>> unnoticed. Just to reduce possible frustration.
> > > > > > > > > >>>>
> > > > > > > > > >>>> Just some personal thoughts for discussion (:
> > > > > > > > >>>>
> > > > > > > > >>>> --
> > > > > > > > >>>> ,,,^..^,,,
> > > > > > > > >>>>
> > > > > > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <jarek@potiuk.com> wrote:
> > > > > > > > >>>>
> > > > > > > > >>>>> Hello everyone,
> > > > > > > > >>>>>
> > > > > > > > >>>>> it has to be:
> > > > > > > > >>>>>
> > > > > > > > > >>>>> 1. Opt-in by default to not trigger security guys about new unplanned
> > > > > > > > > >>>>>> activity after regular upgrade.
> > > > > > > > >>>>>>
> > > > > > > > >>>>>
> > > > > > > > > >>>>> That's a very good point about security triggering Alexander, but I am not
> > > > > > > > > >>>>> so sure it means that we "have to" do opt-in. There are other ways of
> > > > > > > > > >>>>> communicating with the "deployment managers" who install and upgrade
> > > > > > > > > >>>>> airflow - i.e. release notes, blogs, social media of ours, slack
> > > > > > > > > >>>>> announcements etc. We have plenty of channels we can use to communicate the
> > > > > > > > > >>>>> change.
> > > > > > > > >>>>>
> > > > > > > > > >>>>> I think we have a very good blueprint to follow including at least 5 other
> > > > > > > > > >>>>> ASF projects that also passed the review of the privacy@asf. And while I
> > > > > > > > > >>>>> understand (and concur) the urge for opt-in by default coming from consumer
> > > > > > > > > >>>>> market (where it makes perfect sense) Airflow is not a consumer
> > > > > > > > > >>>>> software and is used in "corporate environment" which has a little
> > > > > > > > > >>>>> different expectations and broad assumption that the company can make
> > > > > > > > > >>>>> decisions on such telemetry on behalf of the employees using it.
> > > > > > > > >>>>>
> > > > > > > > > >>>>> We should assume that those who deploy and upgrade Airflow - actually read
> > > > > > > > > >>>>> and take into account what is written in the release notes - especially if
> > > > > > > > > >>>>> they have security guys breathing their necks, similarly as we have to
> > > > > > > > > >>>>> assume they follow CVE announcements about security issues fixed. If we
> > > > > > > > > >>>>> are very straightforward and out-going about the change, inform very
> > > > > > > > > >>>>> clearly how to opt-out, I don't see a big problem with opt-out.
> > > > > > > > >>>>>
> > > > > > > > > >>>>> We should of course check with privacy@a.o (but I've spent a good deal of
> > > > > > > > > >>>>> time reading the Superset and other use case and explanation in detail to
> > > > > > > > > >>>>> make a better informed decision) - and it looks like they also went opt-out
> > > > > > > > > >>>>> way and got cleared by privacy@a.o. And if we cannot reach consensus, we
> > > > > > > > > >>>>> should - as usual - make a voting decision on it (because yes, it is an
> > > > > > > > > >>>>> important decision), but - after reading and understanding why others also
> > > > > > > > > >>>>> did it - for me personally, opt-out is a good path.
> > > > > > > > >>>>>
> > > > > > > > > >>>>> Also because it will rather increase the amount of data to gather, and in
> > > > > > > > > >>>>> our case - counter intuitively - it will be even better for privacy and
> > > > > > > > > >>>>> corporate anonymity, because the more data we get, the more difficult it
> > > > > > > > > >>>>> will be to get any non-statistical/non-aggregated insight from it. Imagine
> > > > > > > > > >>>>> if only a few corporate users will enable it consciously - then we will be
> > > > > > > > > >>>>> able to draw much more conclusions if we find out who they are, than if
> > > > > > > > > >>>>> everyone has it enabled by default.
> > > > > > > > >>>>>
> > > > > > > > > >>>>> That's my take on it - but again, it's up to us to vote, for me opt-in is
> > > > > > > > > >>>>> not "has to", and I am rather for opt-out.
> > > > > > > > >>>>>
> > > > > > > > >>>>> J.
> > > > > > > > >>>>>
> > > > > > > > >>>>>> Hi all,
> > > > > > > > >>>>>>
> > > > > > > > >>>>>>
> > > > > > > > > >>>>>>> I want to propose gathering telemetry for Airflow installations. As the
> > > > > > > > > >>>>>>> Airflow community, we have been relying heavily on the yearly Airflow
> > > > > > > > > >>>>>>> Survey and anecdotes to answer a few key questions about Airflow usage.
> > > > > > > > > >>>>>>> Questions like the following:
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>>
> > > > > > > > > >>>>>>>   - Which versions of Airflow are people installing/using now (i.e.
> > > > > > > > > >>>>>>>   whether people have primarily made the jump from version X to version Y)
> > > > > > > > > >>>>>>>   - Which DB is used as the Metadata DB and which version, e.g. Pg 14?
> > > > > > > > > >>>>>>>   - What Python version is being used?
> > > > > > > > > >>>>>>>   - Which Executor is being used?
> > > > > > > > > >>>>>>>   - Approximately how many people out there in the world are installing
> > > > > > > > > >>>>>>>   Airflow
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>>
> > > > > > > > > >>>>>>> There is a solution that should help answer these questions: Scarf [1].
> > > > > > > > > >>>>>>> Scarf is already approved by the ASF [2][3] and is already used by other
> > > > > > > > > >>>>>>> ASF projects: Superset [4], Dolphin Scheduler [5], Dubbo Kubernetes, DevLake,
> > > > > > > > > >>>>>>> Skywalking as it follows GDPR and other regulations.
> > > > > > > > >>>>>>>
> > > > > > > > > >>>>>>> Similar to Superset, we probably can use it as follows:
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>>
> > > > > > > > > >>>>>>>   1. Install the `scarf js` npm package and bundle it in the Webserver.
> > > > > > > > > >>>>>>>   When the package is downloaded & Airflow webserver is opened, metadata is
> > > > > > > > > >>>>>>>   recorded to the Scarf dashboard.
> > > > > > > > > >>>>>>>   2. Utilize the Scarf Gateway [6], which we can use in front of docker
> > > > > > > > > >>>>>>>   containers. While it’s possible people go around this gateway, we can
> > > > > > > > > >>>>>>>   probably configure and encourage most traffic to go through these gateways.
> > > > > > > > >>>>>>>
> > > > > > > > > >>>>>>> While Scarf does not store any personally identifying information from SDK
> > > > > > > > > >>>>>>> telemetry data, it does send various bits of IP-derived information as
> > > > > > > > > >>>>>>> outlined here [7]. This data should be made as transparent as possible by
> > > > > > > > > >>>>>>> granting dashboard access to the Airflow PMC and any other relevant means
> > > > > > > > > >>>>>>> of sharing/surfacing it that we encounter (Town Hall, Slack, Newsletter
> > > > > > > > > >>>>>>> etc).
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>> The following case studies are worth reading:
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>>   1.
> > > > > > https://about.scarf.sh/post/scarf-case-study-apache-superset
> > > > > > > > >>>>> (From
> > > > > > > > >>>>>>>   Maxime)
> > > > > > > > >>>>>>>   2.
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>
> > > > > > > > >>>>>
> > > > > > > > >>>>
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> > > > > > > > >>>>>>>
> > > > > > > > > >>>>>>> Similar to them, this could help in various ways that come with using data
> > > > > > > > > >>>>>>> for decision-making. With clear guidelines on "how to opt-out" [8][9][10] &
> > > > > > > > > >>>>>>> "what data is being collected" on the Airflow website, this can be
> > > > > > > > > >>>>>>> beneficial to the entire community as we would be making more informed
> > > > > > > > > >>>>>>> decisions.
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>> Regards,
> > > > > > > > >>>>>>> Kaxil
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>>
> > > > > > > > > >>>>>>> [1] https://about.scarf.sh/
> > > > > > > > > >>>>>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
> > > > > > > > > >>>>>>> [3] https://privacy.apache.org/faq/committers.html
> > > > > > > > > >>>>>>> [4] https://github.com/apache/superset/issues/25639
> > > > > > > > > >>>>>>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> > > > > > > > > >>>>>>> [6] https://about.scarf.sh/scarf-gateway
> > > > > > > > > >>>>>>> [7] https://about.scarf.sh/privacy-policy
> > > > > > > > > >>>>>>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> > > > > > > > > >>>>>>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> > > > > > > > > >>>>>>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>
> > > > > > > > >>>>>
> > > > > > > > >>>>
> > > > > > > > >>>
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > ---------------------------------------------------------------------
> > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> > > > > > > > For additional commands, e-mail: dev-help@airflow.apache.org
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> > > > > > For additional commands, e-mail: dev-help@airflow.apache.org
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
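
Editorial sketch (not part of any message in this thread): the opt-out
behaviour discussed above, i.e. a single Airflow-side switch combined with
user-level "do not track" signals, could look roughly like the following.
The function name and the `config_enabled` flag are hypothetical, and the
environment variable names `SCARF_ANALYTICS` and `DO_NOT_TRACK` are
assumptions that should be checked against the Scarf opt-out documentation
linked in the thread before relying on them.

```python
import os
from typing import Mapping


def telemetry_enabled(config_enabled: bool, environ: Mapping[str, str]) -> bool:
    """Return True only when no opt-out signal is present.

    `config_enabled` stands in for a hypothetical Airflow config option;
    the environment variable names are assumptions, not confirmed API.
    """
    # Deployment-level switch: one place to turn everything off.
    if not config_enabled:
        return False
    # Assumed Scarf-style opt-out variable.
    if environ.get("SCARF_ANALYTICS", "").strip().lower() == "false":
        return False
    # Assumed console "Do Not Track" convention.
    if environ.get("DO_NOT_TRACK", "").strip().lower() in {"1", "true"}:
        return False
    return True


if __name__ == "__main__":
    # Telemetry is on by default (opt-out model) unless a signal disables it.
    print(telemetry_enabled(True, os.environ))
```

The design choice mirrors the thread's consensus: collection is on by
default, and any single opt-out signal is enough to disable it.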

Re: [DISCUSS] Proposal for adding Telemetry via Scarf

Posted by Pankaj Koti <pa...@astronomer.io.INVALID>.
+1 to introduce this.

I believe this would definitely help us make early, informed decisions.
For example, had we had this earlier, I believe it would have helped in
past discussions such as whether we should continue supporting MsSQL
(https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4) and,
similarly, the DaskExecutor
(https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1).


Best regards,

*Pankaj Koti*
Senior Software Engineer (Airflow OSS Engineering team)
Location: Pune, Maharashtra, India
Timezone: Indian Standard Time (IST)
Phone: +91 9730079985


On Wed, Apr 3, 2024 at 2:44 PM Kaxil Naik <ka...@gmail.com> wrote:

> Yup, I had added a link to the Scarf docs on opting out in the original
> email, and we should also add an Airflow config so that all the settings
> live in a single place. Without it we can’t be compliant with all the
> policies, even if we collectively ignore or are unaware of its importance.
>
> Regarding the data, as I mentioned in the email (and I am glad others,
> including you, are on the same page), the data will be shared with all
> PMC members. The point about sharing it via the website and newsletter was
> for the community, i.e. Airflow users. I don’t think anyone in the community
> (apart from the PMC members) would need raw data. And even if they need it,
> I’d say they should put in the effort, contribute to the Airflow project and
> become PMC members.
>
> To be clear: this telemetry data should help us, as the Airflow PMC, to
> steer some of the decision making, similar to how only the PMC has a
> binding vote on releases [1]. This is similar to how Apache Superset does
> it too.
>
> [1]
> https://www.apache.org/dev/pmc.html#what-is-a-pmc
>
>
>
> On Wed, 3 Apr 2024 at 00:05, Hussein Awala <hu...@awala.fr> wrote:
>
> > I mentioned opting out just to confirm its importance, and after checking
> > the Scarf documentation it appears to be supported natively by Scarf. For
> > data accessibility, my point was more about raw data, not just aggregated
> > information/insights shared via monthly newsletters, as we do for Airflow
> > annual Survey for example: https://airflow.apache.org/survey vs
> > https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics
> >
> > On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <ka...@gmail.com> wrote:
> >
> > > Agreed to both your points Hussein but both the points are already covered
> > > in my original discussion post - both about opting out and providing data
> > > to all the PMC members and providing visibility via Monthly newsletters. Is
> > > there anything else you propose to discuss that isn’t covered?
> > >
> > >
> > >
> > > On Mon, 1 Apr 2024 at 13:21, Hussein Awala <hu...@awala.fr> wrote:
> > >
> > > > +1 for the idea in general, but there are two main points to discuss before
> > > > voting on this:
> > > >
> > > > 1. We should provide an option to disable Scarf:
> > > > As Airflow is not a paid product, we cannot force companies to report their
> > > > use of this project. Otherwise, some may choose to create their own fork
> > > > just to disable Scarf.
> > > >
> > > > 2. Concerning the exclusivity of access to data:
> > > > The data collected must either be completely proprietary for use by PMC and
> > > > ASF, or completely open. Since many companies offer Airflow as a product,
> > > > it is imperative not to give one company more privileges than others. I
> > > > raise this point for the principle of equality of opportunity.
> > > >
> > > > On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <sunank200@gmail.com>
> > > > wrote:
> > > >
> > > > > Big +1 for Scarf.
> > > > >
> > > > > Transparency is key, so it's important to be super clear about opting
> > > > > out and what's tracked to avoid spooking anyone about IP stuff.
> > > > >
> > > > > Regards
> > > > > Ankit Chaurasia
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <amoghdesai.oss@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > +1 looks like a good tool which could be super helpful.
> > > > > >
> > > > > > * We should have some transparency into the data that is collected or sent
> > > > > > * We should have an option to optionally opt-out
> > > > > >
> > > > > > Thanks & Regards,
> > > > > > Amogh Desai
> > > > > >
> > > > > >
> > > > > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <we...@gmail.com> wrote:
> > > > > >
> > > > > > > +1 to this. It would be really useful. As long as we can opt out, I think
> > > > > > > we’re good.
> > > > > > >
> > > > > > > Best,
> > > > > > > Wei
> > > > > > >
> > > > > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Grammar Correction:
> > > > > > > >
> > > > > > > > > We should assume that those who deploy and upgrade Airflow - actually read
> > > > > > > > >> and take into account what is written in the release notes - especially if
> > > > > > > > >> they have security guys breathing their necks, similarly as we have to
> > > > > > > > >> assume they follow CVE announcements about security issues fixed. If we
> > > > > > > > >> are very straightforward and out-going about the change, inform very
> > > > > > > > >> clearly how to opt-out, I don't see a big problem with opt-out.
> > > > > > > >
> > > > > > > >
> > > > > > > > > I couldn't agree more; even though we shouldn't collect any data that
> > > > > > > > > hamper security (and we should aim to do the same), most security concerned
> > > > > > > > > folks don't just upgrade, and we can rely on them regarding release notes
> > > > > > > > > or announcements and we can make it very clear in our announcements too;
> > > > > > > > > and in our installation guides.
> > > > > > > >
> > > > > > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > >> Grammar correction:
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > > > > > >>
> > > > > > > > >>> Have this at the end of the email too: but if folks don't read until the
> > > > > > > > >>> end and quoting Maxime from the use-case blog[1]:
> > > > > > > >>>
> > > > > > > > >>> "I think people often ask ‘how do I contribute to open source?’, ‘I've
> > > > > > > > >>> got to get into the code’, or ‘I’ve got to be an engineer.’ Actually, the
> > > > > > > > >>> very simplest thing that you can do is just say, ‘my organization gets real
> > > > > > > > >>> value from this piece of software.’ There are a bunch of ways to let the
> > > > > > > > >>> people know about it – and now Scarf is there. If your organization is
> > > > > > > > >>> getting a lot of value from a piece of open source software, make sure the
> > > > > > > > >>> devs know about it."
> > > > > > > >>>
> > > > > > > > >>> What kind of edge cases are you thinking about? I don't think it makes
> > > > > > > > >>> sense to have "opt-in" at all. As the goal is to collect data for most
> > > > > > > > >>> Airflow installations except for those that don't want to give data, then
> > > > > > > > >>> "opt-out" is the only way to maximize it. As long as we don't collect any
> > > > > > > > >>> PII data, this is in-compliance as well.
> > > > > > > >>>
> > > > > > > > >>> Imagine someone learning Airflow, if they have to opt-in via a config,
> > > > > > > > >>> they wouldn't even know or care about it, hence us losing most of the data.
> > > > > > > > >>> I understand why some orgs & individuals may want to opt-out.
> > > > > > > >>>
> > > > > > > > >>> Scarf provides tracking pixels (essentially an HTML image tag) that you
> > > > > > > > >>> can place in your website or product to track visitors to that URL. If
> > > > > > > > >>> there were any concerns about privacy, ASF wouldn't have approved it at all.
> > > > > > > >>>
> > > > > > > >>> A few key details to note about the pixel:
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > > >>>   - No PII is tracked… Scarf does not capture/retain IP information…
> > > > > > > > >>>   this information is discarded by the platform upon processing/aggregating
> > > > > > > > >>>   - Scarf pixels respect the Do Not Track (DNT) settings of browsers -
> > > > > > > > >>>   these users will not be tracked whatsoever.
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > > >>> All the ASF projects I had listed (whether they use Scarf gateway or
> > > > > > > > >>> Scarf pixel in product) are using opt-out.
> > > > > > > >>>
> > > > > > > > >>> 1. Short opt-in period before opt-out. Test this feature with users who
> > > > > > > > >>>> trust and if it works great - make it public. I think it's wise to handle
> > > > > > > > >>>> edge cases and configure collected data more accurately.
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > > >>> It would be a pixel in the webserver, so it should affect nothing at all,
> > > > > > > > >>> even in an air-gapped environment.
> > > > > > > >>>
> > > > > > > > >>>> 2. It should not affect anything if access to the internet is restricted
> > > > > > > > >>>> which is default for many companies.
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> 100% agreed on the below:
> > > > > > > >>>
> > > > > > > > >>>> I think we have a very good blueprint to follow including at least 5 other
> > > > > > > > >>>> ASF projects that also passed the review of the privacy@asf. And while I
> > > > > > > > >>>> understand (and concur) the urge for opt-in by default coming from consumer
> > > > > > > > >>>> market (where it makes perfect sense) Airflow is not a consumer
> > > > > > > > >>>> software and is used in "corporate environment" which has a little
> > > > > > > > >>>> different expectations and broad assumption that the company can make
> > > > > > > > >>>> decisions on such telemetry on behalf of the employees using it.
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > > >>> Couldn't agree more; even though we shouldn't collect any data that
> > > > > > > > >>> hamper security (and we should aim to do the same), most security concerned
> > > > > > > > >>> folks don't just upgrade, and we can rely on them regarding release notes or
> > > > > > > > >>> announcements and we can make it very clear in our announcements too; and in
> > > > > > > > >>> our installation guides.
> > > > > > > >>>
> > > > > > > > >>> We should assume that those who deploy and upgrade Airflow - actually read
> > > > > > > > >>>> and take into account what is written in the release notes - especially if
> > > > > > > > >>>> they have security guys breathing their necks, similarly as we have to
> > > > > > > > >>>> assume they follow CVE announcements about security issues fixed. If we
> > > > > > > > >>>> are very straightforward and out-going about the change, inform very
> > > > > > > > >>>> clearly how to opt-out, I don't see a big problem with opt-out.
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > > >>> To be clear, the collection of data, or at least the data we should
> > > > > > > > >>> gather here should help all the consumers without violating any
> > > > > > > > >>> regulations. I will quote Maxime's quote in the use-case doc [1]
> > > > > > > >>>
> > > > > > > > >>> "*Another Form of Contributing*
> > > > > > > > >>> “I think people often ask ‘how do I contribute to open source?’, ‘I've
> > > > > > > > >>> got to get into the code’, or ‘I’ve got to be an engineer.’ Actually, the
> > > > > > > > >>> very simplest thing that you can do is just say, ‘my organization gets real
> > > > > > > > >>> value from this piece of software.’ There are a bunch of ways to let the
> > > > > > > > >>> people know about it – and now Scarf is there. If your organization is
> > > > > > > > >>> getting a lot of value from a piece of open source software, make sure the
> > > > > > > > >>> devs know about it.”"
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> [1]
> > > https://about.scarf.sh/post/scarf-case-study-apache-superset
> > > > > > > >>>
> > > > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <
> > > > kxepal@apache.org>
> > > > > > > wrote:
> > > > > > > >>>
> > > > > > > >>>> Hi Jarek!
> > > > > > > >>>>
> > > > > > > > >>>> I understand the reasons for opt-out from a project view. I just suddenly
> > > > > > > > >>>> imagined the situation when an upgrade happens and here comes the data to
> > > > > > > > >>>> some third party service - that's a view from a user side of some big
> > > > > > > > >>>> company.
> > > > > > > >>>>
> > > > > > > > >>>> There could be good alternatives to handle this:
> > > > > > > > >>>> 1. Short opt-in period before opt-out. Test this feature with users who
> > > > > > > > >>>> trust and if it works great - make it public. I think it's wise to handle
> > > > > > > > >>>> edge cases and configure collected data more accurately.
> > > > > > > > >>>> 2. Explicitly warn about this feature somehow so that it does not go
> > > > > > > > >>>> unnoticed. Just to reduce possible frustration.
> > > > > > > > >>>>
> > > > > > > > >>>> Just some personal thoughts for discussion (:
> > > > > > > >>>>
> > > > > > > >>>> --
> > > > > > > >>>> ,,,^..^,,,
> > > > > > > >>>>
> > > > > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <jarek@potiuk.com> wrote:
> > > > > > > >>>>
> > > > > > > >>>>> Hello everyone,
> > > > > > > >>>>>
> > > > > > > >>>>> it has to be:
> > > > > > > >>>>>
> > > > > > > > >>>>> 1. Opt-in by default to not trigger security guys about new unplanned
> > > > > > > > >>>>>> activity after regular upgrade.
> > > > > > > >>>>>>
> > > > > > > >>>>>
> > > > > > > > >>>>> That's a very good point about security triggering Alexander, but I am not
> > > > > > > > >>>>> so sure it means that we "have to" do opt-in. There are other ways of
> > > > > > > > >>>>> communicating with the "deployment managers" who install and upgrade
> > > > > > > > >>>>> airflow - i.e. release notes, blogs, social media of ours, slack
> > > > > > > > >>>>> announcements etc. We have plenty of channels we can use to communicate the
> > > > > > > > >>>>> change.
> > > > > > > >>>>>
> > > > > > > > >>>>> I think we have a very good blueprint to follow including at least 5 other
> > > > > > > > >>>>> ASF projects that also passed the review of the privacy@asf. And while I
> > > > > > > > >>>>> understand (and concur) the urge for opt-in by default coming from consumer
> > > > > > > > >>>>> market (where it makes perfect sense) Airflow is not a consumer
> > > > > > > > >>>>> software and is used in "corporate environment" which has a little
> > > > > > > > >>>>> different expectations and broad assumption that the company can make
> > > > > > > > >>>>> decisions on such telemetry on behalf of the employees using it.
> > > > > > > >>>>>
> > > > > > > > >>>>> We should assume that those who deploy and upgrade Airflow - actually read
> > > > > > > > >>>>> and take into account what is written in the release notes - especially if
> > > > > > > > >>>>> they have security guys breathing their necks, similarly as we have to
> > > > > > > > >>>>> assume they follow CVE announcements about security issues fixed. If we
> > > > > > > > >>>>> are very straightforward and out-going about the change, inform very
> > > > > > > > >>>>> clearly how to opt-out, I don't see a big problem with opt-out.
> > > > > > > >>>>>
> > > > > > > > >>>>> We should of course check with privacy@a.o (but I've spent a good deal of
> > > > > > > > >>>>> time reading the Superset and other use case and explanation in detail to
> > > > > > > > >>>>> make a better informed decision) - and it looks like they also went opt-out
> > > > > > > > >>>>> way and got cleared by privacy@a.o. And if we cannot reach consensus, we
> > > > > > > > >>>>> should - as usual - make a voting decision on it (because yes, it is an
> > > > > > > > >>>>> important decision), but - after reading and understanding why others also
> > > > > > > > >>>>> did it - for me personally, opt-out is a good path.
> > > > > > > >>>>>
> > > > > > > > >>>>> Also because it will rather increase the amount of data to gather, and in
> > > > > > > > >>>>> our case - counter intuitively - it will be even better for privacy and
> > > > > > > > >>>>> corporate anonymity, because the more data we get, the more difficult it
> > > > > > > > >>>>> will be to get any non-statistical/non-aggregated insight from it. Imagine
> > > > > > > > >>>>> if only a few corporate users will enable it consciously - then we will be
> > > > > > > > >>>>> able to draw much more conclusions if we find out who they are, than if
> > > > > > > > >>>>> everyone has it enabled by default.
> > > > > > > >>>>>
> > > > > > > > >>>>> That's my take on it - but again, it's up to us to vote, for me opt-in is
> > > > > > > > >>>>> not "has to", and I am rather for opt-out.
> > > > > > > >>>>>
> > > > > > > >>>>> J.
> > > > > > > >>>>>
> > > > > > > >>>>>> Hi all,
> > > > > > > >>>>>>
> > > > > > > >>>>>>
> > > > > > > > >>>>>>> I want to propose gathering telemetry for Airflow installations. As the
> > > > > > > > >>>>>>> Airflow community, we have been relying heavily on the yearly Airflow
> > > > > > > > >>>>>>> Survey and anecdotes to answer a few key questions about Airflow usage.
> > > > > > > > >>>>>>> Questions like the following:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>
> > > > > > > > >>>>>>>   - Which versions of Airflow are people installing/using now (i.e.
> > > > > > > > >>>>>>>   whether people have primarily made the jump from version X to version Y)
> > > > > > > > >>>>>>>   - Which DB is used as the Metadata DB and which version e.g Pg 14?
> > > > > > > > >>>>>>>   - What Python version is being used?
> > > > > > > > >>>>>>>   - Which Executor is being used?
> > > > > > > > >>>>>>>   - Approximately how many people out there in the world are installing
> > > > > > > > >>>>>>>   Airflow
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>
> > > > > > > > >>>>>>> There is a solution that should help answer these questions: Scarf [1]. The
> > > > > > > > >>>>>>> ASF already approves Scarf [2][3] and is already used by other ASF
> > > > > > > > >>>>>>> projects: Superset [4], Dolphin Scheduler [5], Dubbo Kubernetes, DevLake,
> > > > > > > > >>>>>>> Skywalking as it follows GDPR and other regulations.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Similar to Superset, we probably can use it as follows:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>
> > > > > > > > >>>>>>>   1. Install the `scarf js` npm package and bundle it in the Webserver.
> > > > > > > > >>>>>>>   When the package is downloaded & Airflow webserver is opened, metadata is
> > > > > > > > >>>>>>>   recorded to the Scarf dashboard.
> > > > > > > > >>>>>>>   2. Utilize the Scarf Gateway [6], which we can use in front of docker
> > > > > > > > >>>>>>>   containers. While it’s possible people go around this gateway, we can
> > > > > > > > >>>>>>>   probably configure and encourage most traffic to go through these
> > > > > > > > >>>>>>>   gateways.
> > > > > > > >>>>>>>
> > > > > > > > >>>>>>> While Scarf does not store any personally identifying information from SDK
> > > > > > > > >>>>>>> telemetry data, it does send various bits of IP-derived information as
> > > > > > > > >>>>>>> outlined here [7]. This data should be made as transparent as possible by
> > > > > > > > >>>>>>> granting dashboard access to the Airflow PMC and any other relevant means
> > > > > > > > >>>>>>> of sharing/surfacing it that we encounter (Town Hall, Slack, Newsletter
> > > > > > > > >>>>>>> etc).
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> The following case studies are worth reading:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>   1.
> > > > > https://about.scarf.sh/post/scarf-case-study-apache-superset
> > > > > > > >>>>> (From
> > > > > > > >>>>>>>   Maxime)
> > > > > > > >>>>>>>   2.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>>
> > > > > > > >>>>
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> > > > > > > >>>>>>>
> > > > > > > > >>>>>>> Similar to them, this could help in various ways that come with using data
> > > > > > > > >>>>>>> for decision-making. With clear guidelines on "how to opt-out" [8][9][10] &
> > > > > > > > >>>>>>> "what data is being collected" on the Airflow website, this can be
> > > > > > > > >>>>>>> beneficial to the entire community as we would be making more informed
> > > > > > > > >>>>>>> decisions.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Regards,
> > > > > > > >>>>>>> Kaxil
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> [1] https://about.scarf.sh/
> > > > > > > >>>>>>> [2]
> > > > > https://privacy.apache.org/policies/privacy-policy-public.html
> > > > > > > >>>>>>> [3] https://privacy.apache.org/faq/committers.html
> > > > > > > >>>>>>> [4] https://github.com/apache/superset/issues/25639
> > > > > > > >>>>>>> [5]
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>>
> > > > > > > >>>>
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> > > > > > > >>>>>>> [6] https://about.scarf.sh/scarf-gateway
> > > > > > > >>>>>>> [7] https://about.scarf.sh/privacy-policy
> > > > > > > >>>>>>> [8]
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>>
> > > > > > > >>>>
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> > > > > > > >>>>>>> [9]
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>>
> > > > > > > >>>>
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> > > > > > > >>>>>>> [10]
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>>
> > > > > > > >>>>
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> > > > > > > >>>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>>
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > >
> > > > > > >
> > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> > > > > > > For additional commands, e-mail: dev-help@airflow.apache.org
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> > > > > For additional commands, e-mail: dev-help@airflow.apache.org
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Proposal for adding Telemetry via Scarf

Posted by Kaxil Naik <ka...@gmail.com>.
Yup, I had added a link to the Scarf docs in the original email that covers
opting out, and we should also add an Airflow config option so that all of the
telemetry settings live in a single place. Without an opt-out we can’t be
compliant with all the policies, even if we collectively ignore or are unaware
of its importance.
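
A minimal sketch of what such a single switch could look like. The
`[telemetry] enabled` option and the `telemetry_enabled` function are
hypothetical names used only for illustration, not existing Airflow settings.
The `SCARF_ANALYTICS=false` environment variable is taken from the Scarf
opt-out docs linked in the original email; how Airflow would actually honor it
is an assumption here.

```python
import configparser
import os

# Hypothetical section/option names, used here only for illustration.
SECTION = "telemetry"
OPTION = "enabled"


def telemetry_enabled(config: configparser.ConfigParser, environ=None) -> bool:
    """Return False if the user has opted out via env var or config."""
    environ = os.environ if environ is None else environ
    # The environment variable wins, so a deployment can opt out
    # without editing the config file.
    if environ.get("SCARF_ANALYTICS", "").strip().lower() == "false":
        return False
    # Opt-out model: telemetry stays on unless explicitly disabled.
    return config.getboolean(SECTION, OPTION, fallback=True)


cfg = configparser.ConfigParser()
cfg.read_string("[telemetry]\nenabled = False\n")
print(telemetry_enabled(cfg, environ={}))  # False: opted out in the config
```

Defaulting to `True` when the option is absent is what makes this opt-out
rather than opt-in; flipping the `fallback` value would flip the model.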

Regarding the data: as I mentioned in the original email, and I am glad that
others, including you, are on the same page, the data will be shared with all
PMC members. The point about sharing it via the website and newsletter was for
the community, i.e. Airflow users. I don’t think anyone in the community (apart
from the PMC members) would need raw data. And even if they need it, I’d
say they should put in the effort, contribute to the Airflow project, and
become PMC members.

To be clear: this telemetry data should help us, as the Airflow PMC, steer
some of our decision making, similar to how only the PMC has a binding vote on
releases [1]. This is similar to how Apache Superset does it too.

[1]
https://www.apache.org/dev/pmc.html#what-is-a-pmc



On Wed, 3 Apr 2024 at 00:05, Hussein Awala <hu...@awala.fr> wrote:

> I mentioned opting out just to confirm its importance, and after checking
> the Scarf documentation it appears to be supported natively by Scarf. For
> data accessibility, my point was more about raw data, not just aggregated
> information/insights shared via monthly newsletters, as we do for Airflow
> annual Survey for example:
> https://airflow.apache.org/survey vs
>
> https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics
> .
>
> On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <ka...@gmail.com> wrote:
>
> > Agreed to both your points Hussein but both the points are already
> covered
> > in my original discussion post - both about opting out and providing data
> > to all the PMC members and providing visibility via Monthly newsletters.
> Is
> > there anything else you propose to discuss that isn’t covered?
> >
> >
> >
> > On Mon, 1 Apr 2024 at 13:21, Hussein Awala <hu...@awala.fr> wrote:
> >
> > > +1 for the idea in general, but there are two main points to discuss
> > before
> > > voting on this:
> > >
> > > 1. We should provide an option to disable Scarf:
> > > As Airflow is not a paid product, we cannot force companies to report
> > their
> > > use of this project. Otherwise, some may choose to create their own
> fork
> > > just to disable Scarf.
> > >
> > > 2. Concerning the exclusivity of access to data:
> > > The data collected must either be completely proprietary for use by PMC
> > and
> > > ASF, or completely open. Since many companies offer Airflow as a
> product,
> > > it is imperative not to give one company more privileges than others. I
> > > raise this point for the principle of equality of opportunity.
> > >
> > > On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <su...@gmail.com>
> > > wrote:
> > >
> > > > Big +1 for Scarf.
> > > >
> > > > Transparency is key, so it's important to be super clear about opting
> > > > out and what's tracked to avoid spooking anyone about IP stuff.
> > > >
> > > > Regards
> > > > Ankit Chaurasia
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <
> amoghdesai.oss@gmail.com>
> > > > wrote:
> > > > >
> > > > > +1 looks like a good tool which could be super helpful.
> > > > >
> > > > > * We should have some transparency into the data that is collected
> or
> > > > sent
> > > > > * We should have an option to optionally opt-out
> > > > >
> > > > > Thanks & Regards,
> > > > > Amogh Desai
> > > > >
> > > > >
> > > > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <we...@gmail.com>
> wrote:
> > > > >
> > > > > > +1 to this. It would be really useful. As long as we can opt
> out, I
> > > > think
> > > > > > we’re good.
> > > > > >
> > > > > > Best,
> > > > > > Wei
> > > > > >
> > > > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <ka...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > Grammar Correction:
> > > > > > >
> > > > > > > We should assume that those who deploy and upgrade Airflow -
> > > actually
> > > > > > read
> > > > > > >> and take into account what is written in the release notes -
> > > > especially
> > > > > > if
> > > > > > >> they have security guys breathing their necks, similarly as we
> > > have
> > > > to
> > > > > > >> assume they follow CVE announcements about security issues
> > fixed.
> > > > If we
> > > > > > >> are very straightforward and out-going about the change,
> inform
> > > very
> > > > > > >> clearly how to opt-out, I don't see a big problem with
> opt-out.
> > > > > > >
> > > > > > >
> > > > > > > I couldn't agree more; even though we shouldn't collect any
> data
> > > that
> > > > > > > hamper security (and we should aim to do the same), most
> security
> > > > > > concerned
> > > > > > > folks don't just upgrade, and we can rely on them regarding
> > release
> > > > notes
> > > > > > > or announcements and we can make it very clear in our
> > announcements
> > > > too;
> > > > > > > and in our installation guides.
> > > > > > >
> > > > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <ka...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > >> Grammar crrection:
> > > > > > >>
> > > > > > >>
> > > > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <kaxilnaik@gmail.com
> >
> > > > wrote:
> > > > > > >>
> > > > > > >>> Have this at the end of the email too: but if folks don't
> read
> > > > until
> > > > > > the
> > > > > > >>> end and quoting Maxime from the use-case blog[1]:
> > > > > > >>>
> > > > > > >>> "I think people often ask ‘how do I contribute to open
> > source?’,
> > > > ‘I've
> > > > > > >>> got to get into the code’, or ‘ I’ve got to be an engineer.’
> > > > Actually,
> > > > > > the
> > > > > > >>> very simplest thing that you can do is just say, ‘my
> > organization
> > > > gets
> > > > > > real
> > > > > > >>> value from this piece of software.’ There are a bunch of ways
> > to
> > > > let
> > > > > > the
> > > > > > >>> people know about it – and now Scarf is there. If your
> > > > organization is
> > > > > > >>> getting a lot of value from a piece of open source software,
> > make
> > > > sure
> > > > > > the
> > > > > > >>> devs know about it."
> > > > > > >>>
> > > > > > >>> What kind of edge cases are you thinking about? I don't think
> > it
> > > > makes
> > > > > > >>> sense to have "opt-in" at all. As the goal is to collect data
> > for
> > > > most
> > > > > > >>> Airflow installations except for those that don't want to
> give
> > > > data,
> > > > > > then
> > > > > > >>> "opt-out" is the only way to maximize it. As long as we don't
> > > > collect
> > > > > > any
> > > > > > >>> PII data, this is in-compliance as well.
> > > > > > >>>
> > > > > > >>> Imagine someone learning Airflow, if they have to opt-in via
> a
> > > > config,
> > > > > > >>> they wouldn't even know or care about it, hence us losing
> most
> > of
> > > > the
> > > > > > data.
> > > > > > >>> I understand why some orgs & individuals may want to opt-out.
> > > > > > >>>
> > > > > > >>> Scarf Provides tracking pixels (essentially an HTML image
> tag)
> > > > that you
> > > > > > >>> can place in your website or product to track visitors to
> that
> > > > URL. If
> > > > > > >>> there were any concerns about Privacy, ASF wouldn't have
> > approved
> > > > it
> > > > > > at all.
> > > > > > >>>
> > > > > > >>> A few key details to note about the pixel:
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>   - No PII is tracked… Scarf does not capture/retain IP
> > > > information…
> > > > > > >>>   this information is discarded by the platform upon
> > > > > > processing/aggregating
> > > > > > >>>   - Scarf pixels respect the Do Not Track (DNT) settings of
> > > > browsers -
> > > > > > >>>   these users will not be tracked whatsoever.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> All the ASF projects I had listed (whether they use Scarf
> > gateway
> > > > or
> > > > > > >>> Scarf pixel in product) are using opt-out.
> > > > > > >>>
> > > > > > >>> 1. Short opt-in period before opt-out. Test this feature with
> > > > users who
> > > > > > >>>> trust and if it works great - make it public. I think it's
> > wise
> > > to
> > > > > > handle
> > > > > > >>>> edge cases and configure collected data more accurately.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> It would be a pixel in the webserver, should affect nothing
> at
> > > all
> > > > even
> > > > > > >>> in an air-gapped environment.
> > > > > > >>>
> > > > > > >>>> 2. It should not affect anything if access to the internet
> is
> > > > > > restricted
> > > > > > >>>> which is default for many companies.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> 100% agreed on the below:
> > > > > > >>>
> > > > > > >>>> I think we have a very good blueprint to follow including at
> > > > least 5
> > > > > > >>>> other
> > > > > > >>>> ASF projects that also passed the review of the privacy@asf.
> > > And
> > > > > > while I
> > > > > > >>>> understand (and concur) the urge for opt-in by default
> coming
> > > from
> > > > > > >>>> consumer
> > > > > > >>>> market (where it makes perfect sense) Airflow is not a
> > consumer
> > > > > > >>>> software and is used in "corporate environment" which has a
> > > little
> > > > > > >>>> different expectations and broad assumption that the company
> > can
> > > > make
> > > > > > >>>> decisions on such telemetry on behalf of the employees using
> > it.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> Couldn't agree more; even though there shouldn't we collect
> > > hamper
> > > > > > >>> security (and we should aim to do the same), most security
> > > > concerned
> > > > > > folks
> > > > > > >>> don't just
> > > > > > >>> upgrade, and we can rely on them regarding release notes or
> > > > > > announcements
> > > > > > >>> and we can make it very clear in our announcements too; and
> in
> > > our
> > > > > > >>> installation guides.
> > > > > > >>>
> > > > > > >>> We should assume that those who deploy and upgrade Airflow -
> > > > actually
> > > > > > read
> > > > > > >>>> and take into account what is written in the release notes -
> > > > > > especially
> > > > > > >>>> if
> > > > > > >>>> they have security guys breathing their necks, similarly as
> we
> > > > have to
> > > > > > >>>> assume they follow CVE announcements about security issues
> > > fixed.
> > > > If
> > > > > > we
> > > > > > >>>> are very straightforward and out-going about the change,
> > inform
> > > > very
> > > > > > >>>> clearly how to opt-out, I don't see a big problem with
> > opt-out.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> To be clear, the collection of data, or at least the data we
> > > should
> > > > > > >>> gather here should help all the consumers without violating
> > > > anything
> > > > > > >>> regulations. I will quote Maxime's quote in the use-case doc
> > [1]
> > > > > > >>>
> > > > > > >>> "*Another Form of Contributing*
> > > > > > >>> “I think people often ask ‘how do I contribute to open
> > source?’,
> > > > ‘I've
> > > > > > >>> got to get into the code’, or ‘ I’ve got to be an engineer.’
> > > > Actually,
> > > > > > the
> > > > > > >>> very simplest thing that you can do is just say, ‘my
> > organization
> > > > gets
> > > > > > real
> > > > > > >>> value from this piece of software.’ There are a bunch of ways
> > to
> > > > let
> > > > > > the
> > > > > > >>> people know about it – and now Scarf is there. If your
> > > > organization is
> > > > > > >>> getting a lot of value from a piece of open source software,
> > make
> > > > sure
> > > > > > the
> > > > > > >>> devs know about it.”"
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> [1]
> > https://about.scarf.sh/post/scarf-case-study-apache-superset
> > > > > > >>>
> > > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <
> > > kxepal@apache.org>
> > > > > > wrote:
> > > > > > >>>
> > > > > > >>>> Hi Jarek!
> > > > > > >>>>
> > > > > > >>>> I understand the reasons for opt-out from a project view. I
> > just
> > > > > > suddenly
> > > > > > >>>> imagined the situation when an upgrade happens and here
> comes
> > > the
> > > > > > data to
> > > > > > >>>> some third party service - that's a view from a user side of
> > > some
> > > > big
> > > > > > >>>> company.
> > > > > > >>>>
> > > > > > >>>> There could be good alternatives to handle this:
> > > > > > >>>> 1. Short opt-in period before opt-out. Test this feature
> with
> > > > users
> > > > > > who
> > > > > > >>>> trust and if it works great - make it public. I think it's
> > wise
> > > to
> > > > > > handle
> > > > > > >>>> edge cases and configure collected data more accurately.
> > > > > > >>>> 2. Explicitly somehow warn about this feature to make this
> > > > feature not
> > > > > > >>>> get
> > > > > > >>>> unnoticed. Just to reduce possible frustration.
> > > > > > >>>>
> > > > > > >>>> Just a personal thoughts for discussion (:
> > > > > > >>>>
> > > > > > >>>> --
> > > > > > >>>> ,,,^..^,,,
> > > > > > >>>>
> > > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <
> > jarek@potiuk.com>
> > > > > > wrote:
> > > > > > >>>>
> > > > > > >>>>> Hello everyone,
> > > > > > >>>>>
> > > > > > >>>>> it has to be:
> > > > > > >>>>>
> > > > > > >>>>> 1. Opt-in by default to not trigger security guys about new
> > > > unplanned
> > > > > > >>>>>> activity after regular upgrade.
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>>> That's a very good point about security triggering
> Alexander,
> > > > but I
> > > > > > am
> > > > > > >>>> not
> > > > > > >>>>> so sure it means that we "have to" do opt-in. There are
> other
> > > > ways of
> > > > > > >>>>> communicating with the "deployment managers" who install
> and
> > > > upgrade
> > > > > > >>>>> airflow - i.e. release notes. blogs, social media of ours,
> > > slack
> > > > > > >>>>> announcements etc. We have plenty of channels we can use to
> > > > > > >>>> communicate the
> > > > > > >>>>> change.
> > > > > > >>>>>
> > > > > > >>>>> I think we have a very good blueprint to follow including
> at
> > > > least 5
> > > > > > >>>> other
> > > > > > >>>>> ASF projects that also passed the review of the
> privacy@asf.
> > > And
> > > > > > >>>> while I
> > > > > > >>>>> understand (and concur) the urge for opt-in by default
> coming
> > > > from
> > > > > > >>>> consumer
> > > > > > >>>>> market (where it makes perfect sense) Airflow is not a
> > consumer
> > > > > > >>>>> software and is used in "corporate environment" which has a
> > > > little
> > > > > > >>>>> different expectations and broad assumption that the
> company
> > > can
> > > > make
> > > > > > >>>>> decisions on such telemetry on behalf of the employees
> using
> > > it.
> > > > > > >>>>>
> > > > > > >>>>> We should assume that those who deploy and upgrade Airflow
> -
> > > > actually
> > > > > > >>>> read
> > > > > > >>>>> and take into account what is written in the release notes
> -
> > > > > > >>>> especially if
> > > > > > >>>>> they have security guys breathing their necks, similarly as
> > we
> > > > have
> > > > > > to
> > > > > > >>>>> assume they follow CVE announcements about security issues
> > > > fixed. If
> > > > > > we
> > > > > > >>>>> are very straightforward and out-going about the change,
> > inform
> > > > very
> > > > > > >>>>> clearly how to opt-out, I don't see a big problem with
> > opt-out.
> > > > > > >>>>>
> > > > > > >>>>> We should of course check with privacy@a.o (but I'v spend
> a
> > > good
> > > > > > deal
> > > > > > >>>> of
> > > > > > >>>>> time reading the Superset  and other use case and
> explanation
> > > in
> > > > > > >>>> detail to
> > > > > > >>>>> make a better informed decision) - and it looks like they
> > also
> > > > went
> > > > > > >>>> opt-out
> > > > > > >>>>> way and got cleared by privacy@a.o.  And if we cannot
> reach
> > > > > > >>>> consensus, we
> > > > > > >>>>> should - as usual - make a voting decision on it (because
> > yes,
> > > > it is
> > > > > > an
> > > > > > >>>>> important decision), but - after reading and understanding
> > why
> > > > others
> > > > > > >>>> also
> > > > > > >>>>> did it - for me personally, opt-out is a good path.
> > > > > > >>>>>
> > > > > > >>>>> Also because it will rather increase the amount of data to
> > > > gather,
> > > > > > and
> > > > > > >>>> in
> > > > > > >>>>> our case - counter intuitively - it will be even better for
> > > > privacy
> > > > > > and
> > > > > > >>>>> corporate anonymity, because the more data we get, the more
> > > > difficult
> > > > > > >>>> it
> > > > > > >>>>> will be to get any non-statistical/non-aggregated insight
> > from
> > > > it.
> > > > > > >>>> Imagine
> > > > > > >>>>> if only a few corporate users will enable it consciously -
> > then
> > > > we
> > > > > > >>>> will be
> > > > > > >>>>> able to draw much more conclusions if we find out who they
> > are,
> > > > than
> > > > > > if
> > > > > > >>>>> everyone has it enabled by default.
> > > > > > >>>>>
> > > > > > >>>>> That's my take on it - but again, it's up to us to vote,
> for
> > me
> > > > > > opt-in
> > > > > > >>>> is
> > > > > > >>>>> not "has to", and I am rather for opt-out.
> > > > > > >>>>>
> > > > > > >>>>> J.
> > > > > > >>>>>
> > > > > > >>>>>> Hi all,
> > > > > > >>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>>> I want to propose gathering telemetry for Airflow
> > > > installations.
> > > > > > >>>> As the
> > > > > > >>>>>>> Airflow community, we have been relying heavily on the
> > yearly
> > > > > > >>>> Airflow
> > > > > > >>>>>>> Survey and anecdotes to answer a few key questions about
> > > > Airflow
> > > > > > >>>> usage.
> > > > > > >>>>>>> Questions like the following:
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>>   - Which versions of Airflow are people installing/using
> > now
> > > > > > >>>> (i.e.
> > > > > > >>>>>>>   whether people have primarily made the jump from
> version
> > X
> > > to
> > > > > > >>>>> version
> > > > > > >>>>>> Y)
> > > > > > >>>>>>>   - Which DB is used as the Metadata DB and which version
> > e.g
> > > > Pg
> > > > > > >>>> 14?
> > > > > > >>>>>>>   - What Python version is being used?
> > > > > > >>>>>>>   - Which Executor is being used?
> > > > > > >>>>>>>   - Approximately how many people out there in the world
> > are
> > > > > > >>>>> installing
> > > > > > >>>>>>>   Airflow
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>> There is a solution that should help answer these
> > questions:
> > > > Scarf
> > > > > > >>>> [1].
> > > > > > >>>>>> The
> > > > > > >>>>>>> ASF already approves Scarf [2][3] and is already used by
> > > other
> > > > ASF
> > > > > > >>>>>>> projects: Superset [4], Dolphin Scheduler [5], Dubbo
> > > > Kubernetes,
> > > > > > >>>>> DevLake,
> > > > > > >>>>>>> Skywalking as it follows GDPR and other regulations.
> > > > > > >>>>>>>
> > > > > > >>>>>>> Similar to Superset, we probably can use it as follows:
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>>   1. Install the `scarf js` npm package and bundle it in
> > the
> > > > > > >>>>> Webserver.
> > > > > > >>>>>>>   When the package is downloaded & Airflow webserver is
> > > opened,
> > > > > > >>>>> metadata
> > > > > > >>>>>>> is
> > > > > > >>>>>>>   recorded to the Scarf dashboard.
> > > > > > >>>>>>>   2. Utilize the Scarf Gateway [6], which we can use in
> > front
> > > > of
> > > > > > >>>>> docker
> > > > > > >>>>>>>   containers. While it’s possible people go around this
> > > > gateway,
> > > > > > >>>> we
> > > > > > >>>>> can
> > > > > > >>>>>>>   probably configure and encourage most traffic to go
> > through
> > > > > > >>>> these
> > > > > > >>>>>>> gateways.
> > > > > > >>>>>>>
> > > > > > >>>>>>> While Scarf does not store any personally identifying
> > > > information
> > > > > > >>>> from
> > > > > > >>>>>> SDK
> > > > > > >>>>>>> telemetry data, it does send various bits of IP-derived
> > > > > > >>>> information as
> > > > > > >>>>>>> outlined here [7]. This data should be made as
> transparent
> > as
> > > > > > >>>> possible
> > > > > > >>>>> by
> > > > > > >>>>>>> granting dashboard access to the Airflow PMC and any
> other
> > > > relevant
> > > > > > >>>>> means
> > > > > > >>>>>>> of sharing/surfacing it that we encounter (Town Hall,
> > Slack,
> > > > > > >>>> Newsletter
> > > > > > >>>>>>> etc).
> > > > > > >>>>>>>
> > > > > > >>>>>>> The following case studies are worth reading:
> > > > > > >>>>>>>
> > > > > > >>>>>>>   1.
> > > > https://about.scarf.sh/post/scarf-case-study-apache-superset
> > > > > > >>>>> (From
> > > > > > >>>>>>>   Maxime)
> > > > > > >>>>>>>   2.
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>>
> > > > > >
> > > >
> > >
> >
> https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> > > > > > >>>>>>>
> > > > > > >>>>>>> Similar to them, this could help in various ways that
> come
> > > with
> > > > > > >>>> using
> > > > > > >>>>>> data
> > > > > > >>>>>>> for decision-making. With clear guidelines on "how to
> > > opt-out"
> > > > > > >>>>>> [8][9][10] &
> > > > > > >>>>>>> "what data is being collected" on the Airflow website,
> this
> > > > can be
> > > > > > >>>>>>> beneficial to the entire community as we would be making
> > more
> > > > > > >>>> informed
> > > > > > >>>>>>> decisions.
> > > > > > >>>>>>>
> > > > > > >>>>>>> Regards,
> > > > > > >>>>>>> Kaxil
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>> [1] https://about.scarf.sh/
> > > > > > >>>>>>> [2]
> > > > https://privacy.apache.org/policies/privacy-policy-public.html
> > > > > > >>>>>>> [3] https://privacy.apache.org/faq/committers.html
> > > > > > >>>>>>> [4] https://github.com/apache/superset/issues/25639
> > > > > > >>>>>>> [5]
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>>
> > > > > >
> > > >
> > >
> >
> https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> > > > > > >>>>>>> [6] https://about.scarf.sh/scarf-gateway
> > > > > > >>>>>>> [7] https://about.scarf.sh/privacy-policy
> > > > > > >>>>>>> [8]
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>>
> > > > > >
> > > >
> > >
> >
> https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> > > > > > >>>>>>> [9]
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>>
> > > > > >
> > > >
> > >
> >
> https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> > > > > > >>>>>>> [10]
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>>
> > > > > >
> > > >
> > >
> >
> https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>>
> > > > > > >>>
> > > > > >
> > > > > >
> > > > > >
> > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> > > > > > For additional commands, e-mail: dev-help@airflow.apache.org
> > > > > >
> > > > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> > > > For additional commands, e-mail: dev-help@airflow.apache.org
> > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Proposal for adding Telemetry via Scarf

Posted by Hussein Awala <hu...@awala.fr>.
I mentioned opting out just to confirm its importance, and after checking
the Scarf documentation, it appears to be supported natively by Scarf. On
data accessibility, my point was more about raw data, not just the aggregated
information/insights shared via monthly newsletters, as we do for the annual
Airflow Survey, for example:
https://airflow.apache.org/survey vs
https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics
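
To make the raw vs aggregated distinction concrete, here is a small sketch.
The records are made up and the field names are hypothetical (this is not
Scarf's actual schema); it only illustrates that a newsletter-style summary
keeps the percentages and drops the per-installation rows.

```python
from collections import Counter

# Made-up raw events, one per installation; field names are assumptions.
raw_events = [
    {"airflow_version": "2.8.4", "executor": "CeleryExecutor"},
    {"airflow_version": "2.8.4", "executor": "LocalExecutor"},
    {"airflow_version": "2.9.0", "executor": "KubernetesExecutor"},
    {"airflow_version": "2.7.3", "executor": "CeleryExecutor"},
]


def version_share(events):
    """Aggregate raw events into a percentage share per Airflow version."""
    counts = Counter(event["airflow_version"] for event in events)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {version: round(100 * n / total, 1) for version, n in counts.items()}


# The aggregated view: only shares survive, not individual rows.
print(version_share(raw_events))
```

Access to `raw_events` allows questions the summary cannot answer (for
example, which executor goes with which version), which is the gap between
the two kinds of access.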

On Tue, Apr 2, 2024 at 2:43 PM Kaxil Naik <ka...@gmail.com> wrote:

> Agreed to both your points Hussein but both the points are already covered
> in my original discussion post - both about opting out and providing data
> to all the PMC members and providing visibility via Monthly newsletters. Is
> there anything else you propose to discuss that isn’t covered?
>
>
>
> On Mon, 1 Apr 2024 at 13:21, Hussein Awala <hu...@awala.fr> wrote:
>
> > +1 for the idea in general, but there are two main points to discuss before
> > voting on this:
> >
> > 1. We should provide an option to disable Scarf:
> > As Airflow is not a paid product, we cannot force companies to report their
> > use of this project. Otherwise, some may choose to create their own fork
> > just to disable Scarf.
> >
> > 2. Concerning the exclusivity of access to data:
> > The data collected must either be completely proprietary for use by PMC and
> > ASF, or completely open. Since many companies offer Airflow as a product,
> > it is imperative not to give one company more privileges than others. I
> > raise this point for the principle of equality of opportunity.
> >
> > On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <su...@gmail.com>
> > wrote:
> >
> > > Big +1 for Scarf.
> > >
> > > Transparency is key, so it's important to be super clear about opting
> > > out and what's tracked to avoid spooking anyone about IP stuff.
> > >
> > > Regards
> > > Ankit Chaurasia
> > >
> > >
> > >
> > >
> > > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <am...@gmail.com>
> > > wrote:
> > > >
> > > > +1 looks like a good tool which could be super helpful.
> > > >
> > > > * We should have some transparency into the data that is collected or
> > > sent
> > > > * We should have an option to optionally opt-out
> > > >
> > > > Thanks & Regards,
> > > > Amogh Desai
> > > >
> > > >
> > > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <we...@gmail.com> wrote:
> > > >
> > > > > +1 to this. It would be really useful. As long as we can opt out, I
> > > think
> > > > > we’re good.
> > > > >
> > > > > Best,
> > > > > Wei
> > > > >
> > > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <ka...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > Grammar Correction:
> > > > > >
> > > > > > We should assume that those who deploy and upgrade Airflow -
> > actually
> > > > > read
> > > > > >> and take into account what is written in the release notes -
> > > especially
> > > > > if
> > > > > >> they have security guys breathing their necks, similarly as we
> > have
> > > to
> > > > > >> assume they follow CVE announcements about security issues
> fixed.
> > > If we
> > > > > >> are very straightforward and out-going about the change, inform
> > very
> > > > > >> clearly how to opt-out, I don't see a big problem with opt-out.
> > > > > >
> > > > > >
> > > > > > I couldn't agree more; even though we shouldn't collect any data
> > that
> > > > > > hamper security (and we should aim to do the same), most security
> > > > > concerned
> > > > > > folks don't just upgrade, and we can rely on them regarding
> release
> > > notes
> > > > > > or announcements and we can make it very clear in our
> announcements
> > > too;
> > > > > > and in our installation guides.
> > > > > >
> > > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <ka...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > >> Grammar correction:
> > > > > >>
> > > > > >>
> > > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <ka...@gmail.com>
> > > wrote:
> > > > > >>
> > > > > >>> Have this at the end of the email too: but if folks don't read
> > > until
> > > > > the
> > > > > >>> end and quoting Maxime from the use-case blog[1]:
> > > > > >>>
> > > > > >>> "I think people often ask ‘how do I contribute to open
> source?’,
> > > ‘I've
> > > > > >>> got to get into the code’, or ‘ I’ve got to be an engineer.’
> > > Actually,
> > > > > the
> > > > > >>> very simplest thing that you can do is just say, ‘my
> organization
> > > gets
> > > > > real
> > > > > >>> value from this piece of software.’ There are a bunch of ways
> to
> > > let
> > > > > the
> > > > > >>> people know about it – and now Scarf is there. If your
> > > organization is
> > > > > >>> getting a lot of value from a piece of open source software,
> make
> > > sure
> > > > > the
> > > > > >>> devs know about it."
> > > > > >>>
> > > > > >>> What kind of edge cases are you thinking about? I don't think
> it
> > > makes
> > > > > >>> sense to have "opt-in" at all. As the goal is to collect data
> for
> > > most
> > > > > >>> Airflow installations except for those that don't want to give
> > > data,
> > > > > then
> > > > > >>> "opt-out" is the only way to maximize it. As long as we don't
> > > collect
> > > > > any
> > > > > >>> PII data, this is in-compliance as well.
> > > > > >>>
> > > > > >>> Imagine someone learning Airflow, if they have to opt-in via a
> > > config,
> > > > > >>> they wouldn't even know or care about it, hence us losing most
> of
> > > the
> > > > > data.
> > > > > >>> I understand why some orgs & individuals may want to opt-out.
> > > > > >>>
> > > > > >>> Scarf Provides tracking pixels (essentially an HTML image tag)
> > > that you
> > > > > >>> can place in your website or product to track visitors to that
> > > URL. If
> > > > > >>> there were any concerns about Privacy, ASF wouldn't have
> approved
> > > it
> > > > > at all.
> > > > > >>>
> > > > > >>> A few key details to note about the pixel:
> > > > > >>>
> > > > > >>>
> > > > > >>>   - No PII is tracked… Scarf does not capture/retain IP
> > > information…
> > > > > >>>   this information is discarded by the platform upon
> > > > > processing/aggregating
> > > > > >>>   - Scarf pixels respect the Do Not Track (DNT) settings of
> > > browsers -
> > > > > >>>   these users will not be tracked whatsoever.
> > > > > >>>
> > > > > >>>
> > > > > >>> All the ASF projects I had listed (whether they use Scarf
> gateway
> > > or
> > > > > >>> Scarf pixel in product) are using opt-out.
> > > > > >>>
> > > > > >>> 1. Short opt-in period before opt-out. Test this feature with
> > > users who
> > > > > >>>> trust and if it works great - make it public. I think it's
> wise
> > to
> > > > > handle
> > > > > >>>> edge cases and configure collected data more accurately.
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> It would be a pixel in the webserver, should affect nothing at
> > all
> > > even
> > > > > >>> in an air-gapped environment.
> > > > > >>>
> > > > > >>>> 2. It should not affect anything if access to the internet is
> > > > > restricted
> > > > > >>>> which is default for many companies.
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> 100% agreed on the below:
> > > > > >>>
> > > > > >>>> I think we have a very good blueprint to follow including at
> > > least 5
> > > > > >>>> other
> > > > > >>>> ASF projects that also passed the review of the privacy@asf.
> > And
> > > > > while I
> > > > > >>>> understand (and concur) the urge for opt-in by default coming
> > from
> > > > > >>>> consumer
> > > > > >>>> market (where it makes perfect sense) Airflow is not a
> consumer
> > > > > >>>> software and is used in "corporate environment" which has a
> > little
> > > > > >>>> different expectations and broad assumption that the company
> can
> > > make
> > > > > >>>> decisions on such telemetry on behalf of the employees using
> it.
> > > > > >>>
> > > > > >>>
> > > > > >>> Couldn't agree more; even though there shouldn't we collect
> > hamper
> > > > > >>> security (and we should aim to do the same), most security
> > > concerned
> > > > > folks
> > > > > >>> don't just
> > > > > >>> upgrade, and we can rely on them regarding release notes or
> > > > > announcements
> > > > > >>> and we can make it very clear in our announcements too; and in
> > our
> > > > > >>> installation guides.
> > > > > >>>
> > > > > >>> We should assume that those who deploy and upgrade Airflow -
> > > actually
> > > > > read
> > > > > >>>> and take into account what is written in the release notes -
> > > > > especially
> > > > > >>>> if
> > > > > >>>> they have security guys breathing their necks, similarly as we
> > > have to
> > > > > >>>> assume they follow CVE announcements about security issues
> > fixed.
> > > If
> > > > > we
> > > > > >>>> are very straightforward and out-going about the change,
> inform
> > > very
> > > > > >>>> clearly how to opt-out, I don't see a big problem with
> opt-out.
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> To be clear, the collection of data, or at least the data we
> > should
> > > > > >>> gather here should help all the consumers without violating
> > > anything
> > > > > >>> regulations. I will quote Maxime's quote in the use-case doc
> [1]
> > > > > >>>
> > > > > >>> "*Another Form of Contributing*
> > > > > >>> “I think people often ask ‘how do I contribute to open
> source?’,
> > > ‘I've
> > > > > >>> got to get into the code’, or ‘ I’ve got to be an engineer.’
> > > Actually,
> > > > > the
> > > > > >>> very simplest thing that you can do is just say, ‘my
> organization
> > > gets
> > > > > real
> > > > > >>> value from this piece of software.’ There are a bunch of ways
> to
> > > let
> > > > > the
> > > > > >>> people know about it – and now Scarf is there. If your
> > > organization is
> > > > > >>> getting a lot of value from a piece of open source software,
> make
> > > sure
> > > > > the
> > > > > >>> devs know about it.”"
> > > > > >>>
> > > > > >>>
> > > > > >>> [1]
> https://about.scarf.sh/post/scarf-case-study-apache-superset
> > > > > >>>
> > > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <
> > kxepal@apache.org>
> > > > > wrote:
> > > > > >>>
> > > > > >>>> Hi Jarek!
> > > > > >>>>
> > > > > >>>> I understand the reasons for opt-out from a project view. I
> just
> > > > > suddenly
> > > > > >>>> imagined the situation when an upgrade happens and here comes
> > the
> > > > > data to
> > > > > >>>> some third party service - that's a view from a user side of
> > some
> > > big
> > > > > >>>> company.
> > > > > >>>>
> > > > > >>>> There could be good alternatives to handle this:
> > > > > >>>> 1. Short opt-in period before opt-out. Test this feature with
> > > users
> > > > > who
> > > > > >>>> trust and if it works great - make it public. I think it's
> wise
> > to
> > > > > handle
> > > > > >>>> edge cases and configure collected data more accurately.
> > > > > >>>> 2. Explicitly somehow warn about this feature to make this
> > > feature not
> > > > > >>>> get
> > > > > >>>> unnoticed. Just to reduce possible frustration.
> > > > > >>>>
> > > > > >>>> Just a personal thoughts for discussion (:
> > > > > >>>>
> > > > > >>>> --
> > > > > >>>> ,,,^..^,,,
> > > > > >>>>
> > > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <
> jarek@potiuk.com>
> > > > > wrote:
> > > > > >>>>
> > > > > >>>>> Hello everyone,
> > > > > >>>>>
> > > > > >>>>> it has to be:
> > > > > >>>>>
> > > > > >>>>> 1. Opt-in by default to not trigger security guys about new
> > > unplanned
> > > > > >>>>>> activity after regular upgrade.
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>> That's a very good point about security triggering Alexander,
> > > but I
> > > > > am
> > > > > >>>> not
> > > > > >>>>> so sure it means that we "have to" do opt-in. There are other
> > > ways of
> > > > > >>>>> communicating with the "deployment managers" who install and
> > > upgrade
> > > > > >>>>> airflow - i.e. release notes. blogs, social media of ours,
> > slack
> > > > > >>>>> announcements etc. We have plenty of channels we can use to
> > > > > >>>> communicate the
> > > > > >>>>> change.
> > > > > >>>>>
> > > > > >>>>> I think we have a very good blueprint to follow including at
> > > least 5
> > > > > >>>> other
> > > > > >>>>> ASF projects that also passed the review of the privacy@asf.
> > And
> > > > > >>>> while I
> > > > > >>>>> understand (and concur) the urge for opt-in by default coming
> > > from
> > > > > >>>> consumer
> > > > > >>>>> market (where it makes perfect sense) Airflow is not a
> consumer
> > > > > >>>>> software and is used in "corporate environment" which has a
> > > little
> > > > > >>>>> different expectations and broad assumption that the company
> > can
> > > make
> > > > > >>>>> decisions on such telemetry on behalf of the employees using
> > it.
> > > > > >>>>>
> > > > > >>>>> We should assume that those who deploy and upgrade Airflow -
> > > actually
> > > > > >>>> read
> > > > > >>>>> and take into account what is written in the release notes -
> > > > > >>>> especially if
> > > > > >>>>> they have security guys breathing their necks, similarly as
> we
> > > have
> > > > > to
> > > > > >>>>> assume they follow CVE announcements about security issues
> > > fixed. If
> > > > > we
> > > > > >>>>> are very straightforward and out-going about the change,
> inform
> > > very
> > > > > >>>>> clearly how to opt-out, I don't see a big problem with
> opt-out.
> > > > > >>>>>
> > > > > > >>>>> We should of course check with privacy@a.o (but I've spent a good
> > > > > > >>>>> deal of time reading the Superset and other use cases and
> > > > > > >>>>> explanations in detail to make a better informed decision) - and
> > > > > > >>>>> it looks like they also went the opt-out way and got cleared by
> > > > > > >>>>> privacy@a.o. And if we cannot reach consensus, we should - as
> > > > > > >>>>> usual - make a voting decision on it (because yes, it is an
> > > > > > >>>>> important decision), but - after reading and understanding why
> > > > > > >>>>> others also did it - for me personally, opt-out is a good path.
> > > > > >>>>>
> > > > > >>>>> Also because it will rather increase the amount of data to
> > > gather,
> > > > > and
> > > > > >>>> in
> > > > > >>>>> our case - counter intuitively - it will be even better for
> > > privacy
> > > > > and
> > > > > >>>>> corporate anonymity, because the more data we get, the more
> > > difficult
> > > > > >>>> it
> > > > > >>>>> will be to get any non-statistical/non-aggregated insight
> from
> > > it.
> > > > > >>>> Imagine
> > > > > >>>>> if only a few corporate users will enable it consciously -
> then
> > > we
> > > > > >>>> will be
> > > > > >>>>> able to draw much more conclusions if we find out who they
> are,
> > > than
> > > > > if
> > > > > >>>>> everyone has it enabled by default.
> > > > > >>>>>
> > > > > >>>>> That's my take on it - but again, it's up to us to vote, for
> me
> > > > > opt-in
> > > > > >>>> is
> > > > > >>>>> not "has to", and I am rather for opt-out.
> > > > > >>>>>
> > > > > >>>>> J.
> > > > > >>>>>
> > > > > >>>>>> Hi all,
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > > >>>>>>> I want to propose gathering telemetry for Airflow installations.
> > > > > > >>>>>>> As the Airflow community, we have been relying heavily on the
> > > > > > >>>>>>> yearly Airflow Survey and anecdotes to answer a few key questions
> > > > > > >>>>>>> about Airflow usage. Questions like the following:
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>>   - Which versions of Airflow are people installing/using now
> > > > > > >>>>>>>   (i.e. whether people have primarily made the jump from version
> > > > > > >>>>>>>   X to version Y)
> > > > > > >>>>>>>   - Which DB is used as the Metadata DB and which version e.g Pg
> > > > > > >>>>>>>   14?
> > > > > > >>>>>>>   - What Python version is being used?
> > > > > > >>>>>>>   - Which Executor is being used?
> > > > > > >>>>>>>   - Approximately how many people out there in the world are
> > > > > > >>>>>>>   installing Airflow
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>> There is a solution that should help answer these questions:
> > > > > > >>>>>>> Scarf [1]. The ASF already approves Scarf [2][3] and is already
> > > > > > >>>>>>> used by other ASF projects: Superset [4], Dolphin Scheduler [5],
> > > > > > >>>>>>> Dubbo Kubernetes, DevLake, Skywalking as it follows GDPR and
> > > > > > >>>>>>> other regulations.
> > > > > > >>>>>>>
> > > > > > >>>>>>> Similar to Superset, we probably can use it as follows:
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>>   1. Install the `scarf js` npm package and bundle it in the
> > > > > > >>>>>>>   Webserver. When the package is downloaded & Airflow webserver
> > > > > > >>>>>>>   is opened, metadata is recorded to the Scarf dashboard.
> > > > > > >>>>>>>   2. Utilize the Scarf Gateway [6], which we can use in front of
> > > > > > >>>>>>>   docker containers. While it’s possible people go around this
> > > > > > >>>>>>>   gateway, we can probably configure and encourage most traffic
> > > > > > >>>>>>>   to go through these gateways.
> > > > > > >>>>>>>
> > > > > > >>>>>>> While Scarf does not store any personally identifying information
> > > > > > >>>>>>> from SDK telemetry data, it does send various bits of IP-derived
> > > > > > >>>>>>> information as outlined here [7]. This data should be made as
> > > > > > >>>>>>> transparent as possible by granting dashboard access to the
> > > > > > >>>>>>> Airflow PMC and any other relevant means of sharing/surfacing it
> > > > > > >>>>>>> that we encounter (Town Hall, Slack, Newsletter etc).
> > > > > >>>>>>>
> > > > > >>>>>>> The following case studies are worth reading:
> > > > > >>>>>>>
> > > > > >>>>>>>   1.
> > > https://about.scarf.sh/post/scarf-case-study-apache-superset
> > > > > >>>>> (From
> > > > > >>>>>>>   Maxime)
> > > > > >>>>>>>   2.
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > >
> > >
> >
> https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> > > > > >>>>>>>
> > > > > > >>>>>>> Similar to them, this could help in various ways that come with
> > > > > > >>>>>>> using data for decision-making. With clear guidelines on "how to
> > > > > > >>>>>>> opt-out" [8][9][10] & "what data is being collected" on the
> > > > > > >>>>>>> Airflow website, this can be beneficial to the entire community
> > > > > > >>>>>>> as we would be making more informed decisions.
> > > > > >>>>>>>
> > > > > >>>>>>> Regards,
> > > > > >>>>>>> Kaxil
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > > >>>>>>> [1] https://about.scarf.sh/
> > > > > > >>>>>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
> > > > > > >>>>>>> [3] https://privacy.apache.org/faq/committers.html
> > > > > > >>>>>>> [4] https://github.com/apache/superset/issues/25639
> > > > > > >>>>>>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> > > > > > >>>>>>> [6] https://about.scarf.sh/scarf-gateway
> > > > > > >>>>>>> [7] https://about.scarf.sh/privacy-policy
> > > > > > >>>>>>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> > > > > > >>>>>>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> > > > > > >>>>>>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > >
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> > > > > For additional commands, e-mail: dev-help@airflow.apache.org
> > > > >
> > > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> > > For additional commands, e-mail: dev-help@airflow.apache.org
> > >
> > >
> >
>

Re: [DISCUSS] Proposal for adding Telemetry via Scarf

Posted by Kaxil Naik <ka...@gmail.com>.
I agree with both of your points, Hussein, but both are already covered in
my original discussion post: opting out, providing data to all the PMC
members, and providing visibility via monthly newsletters. Is there
anything else you propose to discuss that isn’t covered?



On Mon, 1 Apr 2024 at 13:21, Hussein Awala <hu...@awala.fr> wrote:

> +1 for the idea in general, but there are two main points to discuss before
> voting on this:
>
> 1. We should provide an option to disable Scarf:
> As Airflow is not a paid product, we cannot force companies to report their
> use of this project. Otherwise, some may choose to create their own fork
> just to disable Scarf.
>
> 2. Concerning the exclusivity of access to data:
> The data collected must either be completely proprietary for use by PMC and
> ASF, or completely open. Since many companies offer Airflow as a product,
> it is imperative not to give one company more privileges than others. I
> raise this point for the principle of equality of opportunity.
>
> On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <su...@gmail.com>
> wrote:
>
> > Big +1 for Scarf.
> >
> > Transparency is key, so it's important to be super clear about opting
> > out and what's tracked to avoid spooking anyone about IP stuff.
> >
> > Regards
> > Ankit Chaurasia
> >
> >
> >
> >
> > On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <am...@gmail.com>
> > wrote:
> > >
> > > +1 looks like a good tool which could be super helpful.
> > >
> > > * We should have some transparency into the data that is collected or
> > sent
> > > * We should have an option to optionally opt-out
> > >
> > > Thanks & Regards,
> > > Amogh Desai
> > >
> > >
> > > On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <we...@gmail.com> wrote:
> > >
> > > > +1 to this. It would be really useful. As long as we can opt out, I
> > think
> > > > we’re good.
> > > >
> > > > Best,
> > > > Wei
> > > >
> > > > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <ka...@gmail.com>
> > wrote:
> > > > >
> > > > > Grammar Correction:
> > > > >
> > > > > We should assume that those who deploy and upgrade Airflow -
> actually
> > > > read
> > > > >> and take into account what is written in the release notes -
> > especially
> > > > if
> > > > >> they have security guys breathing their necks, similarly as we
> have
> > to
> > > > >> assume they follow CVE announcements about security issues fixed.
> > If we
> > > > >> are very straightforward and out-going about the change, inform
> very
> > > > >> clearly how to opt-out, I don't see a big problem with opt-out.
> > > > >
> > > > >
> > > > > I couldn't agree more; even though we shouldn't collect any data
> that
> > > > > hamper security (and we should aim to do the same), most security
> > > > concerned
> > > > > folks don't just upgrade, and we can rely on them regarding release
> > notes
> > > > > or announcements and we can make it very clear in our announcements
> > too;
> > > > > and in our installation guides.
> > > > >
> > > > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <ka...@gmail.com>
> > wrote:
> > > > >
> > > > > >> Grammar correction:
> > > > >>
> > > > >>
> > > > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <ka...@gmail.com>
> > wrote:
> > > > >>
> > > > >>> Have this at the end of the email too: but if folks don't read
> > until
> > > > the
> > > > >>> end and quoting Maxime from the use-case blog[1]:
> > > > >>>
> > > > >>> "I think people often ask ‘how do I contribute to open source?’,
> > ‘I've
> > > > >>> got to get into the code’, or ‘ I’ve got to be an engineer.’
> > Actually,
> > > > the
> > > > >>> very simplest thing that you can do is just say, ‘my organization
> > gets
> > > > real
> > > > >>> value from this piece of software.’ There are a bunch of ways to
> > let
> > > > the
> > > > >>> people know about it – and now Scarf is there. If your
> > organization is
> > > > >>> getting a lot of value from a piece of open source software, make
> > sure
> > > > the
> > > > >>> devs know about it."
> > > > >>>
> > > > >>> What kind of edge cases are you thinking about? I don't think it
> > makes
> > > > >>> sense to have "opt-in" at all. As the goal is to collect data for
> > most
> > > > >>> Airflow installations except for those that don't want to give
> > data,
> > > > then
> > > > >>> "opt-out" is the only way to maximize it. As long as we don't
> > collect
> > > > any
> > > > >>> PII data, this is in-compliance as well.
> > > > >>>
> > > > >>> Imagine someone learning Airflow, if they have to opt-in via a
> > config,
> > > > >>> they wouldn't even know or care about it, hence us losing most of
> > the
> > > > data.
> > > > >>> I understand why some orgs & individuals may want to opt-out.
> > > > >>>
> > > > >>> Scarf Provides tracking pixels (essentially an HTML image tag)
> > that you
> > > > >>> can place in your website or product to track visitors to that
> > URL. If
> > > > >>> there were any concerns about Privacy, ASF wouldn't have approved
> > it
> > > > at all.
> > > > >>>
> > > > >>> A few key details to note about the pixel:
> > > > >>>
> > > > >>>
> > > > >>>   - No PII is tracked… Scarf does not capture/retain IP
> > information…
> > > > >>>   this information is discarded by the platform upon
> > > > processing/aggregating
> > > > >>>   - Scarf pixels respect the Do Not Track (DNT) settings of
> > browsers -
> > > > >>>   these users will not be tracked whatsoever.
> > > > >>>
> > > > >>>
> > > > >>> All the ASF projects I had listed (whether they use Scarf gateway
> > or
> > > > >>> Scarf pixel in product) are using opt-out.
> > > > >>>
> > > > >>> 1. Short opt-in period before opt-out. Test this feature with
> > users who
> > > > >>>> trust and if it works great - make it public. I think it's wise
> to
> > > > handle
> > > > >>>> edge cases and configure collected data more accurately.
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> It would be a pixel in the webserver, should affect nothing at
> all
> > even
> > > > >>> in an air-gapped environment.
> > > > >>>
> > > > >>>> 2. It should not affect anything if access to the internet is
> > > > restricted
> > > > >>>> which is default for many companies.
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> 100% agreed on the below:
> > > > >>>
> > > > >>>> I think we have a very good blueprint to follow including at
> > least 5
> > > > >>>> other
> > > > >>>> ASF projects that also passed the review of the privacy@asf.
> And
> > > > while I
> > > > >>>> understand (and concur) the urge for opt-in by default coming
> from
> > > > >>>> consumer
> > > > >>>> market (where it makes perfect sense) Airflow is not a consumer
> > > > >>>> software and is used in "corporate environment" which has a
> little
> > > > >>>> different expectations and broad assumption that the company can
> > make
> > > > >>>> decisions on such telemetry on behalf of the employees using it.
> > > > >>>
> > > > >>>
> > > > >>> Couldn't agree more; even though there shouldn't we collect
> hamper
> > > > >>> security (and we should aim to do the same), most security
> > concerned
> > > > folks
> > > > >>> don't just
> > > > >>> upgrade, and we can rely on them regarding release notes or
> > > > announcements
> > > > >>> and we can make it very clear in our announcements too; and in
> our
> > > > >>> installation guides.
> > > > >>>
> > > > >>> We should assume that those who deploy and upgrade Airflow -
> > actually
> > > > read
> > > > >>>> and take into account what is written in the release notes -
> > > > especially
> > > > >>>> if
> > > > >>>> they have security guys breathing their necks, similarly as we
> > have to
> > > > >>>> assume they follow CVE announcements about security issues
> fixed.
> > If
> > > > we
> > > > >>>> are very straightforward and out-going about the change, inform
> > very
> > > > >>>> clearly how to opt-out, I don't see a big problem with opt-out.
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> To be clear, the collection of data, or at least the data we
> should
> > > > >>> gather here should help all the consumers without violating
> > anything
> > > > >>> regulations. I will quote Maxime's quote in the use-case doc [1]
> > > > >>>
> > > > >>> "*Another Form of Contributing*
> > > > >>> “I think people often ask ‘how do I contribute to open source?’,
> > ‘I've
> > > > >>> got to get into the code’, or ‘ I’ve got to be an engineer.’
> > Actually,
> > > > the
> > > > >>> very simplest thing that you can do is just say, ‘my organization
> > gets
> > > > real
> > > > >>> value from this piece of software.’ There are a bunch of ways to
> > let
> > > > the
> > > > >>> people know about it – and now Scarf is there. If your
> > organization is
> > > > >>> getting a lot of value from a piece of open source software, make
> > sure
> > > > the
> > > > >>> devs know about it.”"
> > > > >>>
> > > > >>>
> > > > >>> [1] https://about.scarf.sh/post/scarf-case-study-apache-superset
> > > > >>>
> > > > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <
> kxepal@apache.org>
> > > > wrote:
> > > > >>>
> > > > >>>> Hi Jarek!
> > > > >>>>
> > > > >>>> I understand the reasons for opt-out from a project view. I just
> > > > suddenly
> > > > >>>> imagined the situation when an upgrade happens and here comes
> the
> > > > data to
> > > > >>>> some third party service - that's a view from a user side of
> some
> > big
> > > > >>>> company.
> > > > >>>>
> > > > >>>> There could be good alternatives to handle this:
> > > > >>>> 1. Short opt-in period before opt-out. Test this feature with
> > users
> > > > who
> > > > >>>> trust and if it works great - make it public. I think it's wise
> to
> > > > handle
> > > > >>>> edge cases and configure collected data more accurately.
> > > > >>>> 2. Explicitly somehow warn about this feature to make this
> > feature not
> > > > >>>> get
> > > > >>>> unnoticed. Just to reduce possible frustration.
> > > > >>>>
> > > > >>>> Just a personal thoughts for discussion (:
> > > > >>>>
> > > > >>>> --
> > > > >>>> ,,,^..^,,,
> > > > >>>>
> > > > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <ja...@potiuk.com>
> > > > wrote:

Re: [DISCUSS] Proposal for adding Telemetry via Scarf

Posted by Hussein Awala <hu...@awala.fr>.
+1 for the idea in general, but there are two main points to discuss before
voting on this:

1. We should provide an option to disable Scarf:
As Airflow is not a paid product, we cannot force companies to report their
use of this project. Otherwise, some may choose to create their own fork
just to disable Scarf.

2. Concerning the exclusivity of access to data:
The data collected must either be available exclusively to the PMC and the
ASF, or be completely open. Since many companies offer Airflow as a product,
it is imperative not to give one company more privileges than others. I
raise this point on the principle of equal opportunity.
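
For point 1, a minimal sketch of what a disable switch could look like,
assuming an environment-variable opt-out. The name AIRFLOW_TELEMETRY_DISABLED
is hypothetical and is not a confirmed Airflow or Scarf setting; honoring the
informal DO_NOT_TRACK convention is likewise an assumption made for
illustration.

```python
import os

# Hypothetical opt-out variables, for illustration only.
OPT_OUT_ENV_VARS = ("AIRFLOW_TELEMETRY_DISABLED", "DO_NOT_TRACK")
_TRUTHY = {"1", "true", "yes", "on"}


def telemetry_enabled(environ=None):
    """Return False if any opt-out variable is set to a truthy value."""
    env = os.environ if environ is None else environ
    for name in OPT_OUT_ENV_VARS:
        if env.get(name, "").strip().lower() in _TRUTHY:
            return False
    # Opt-out model: collection stays on unless explicitly disabled.
    return True
```

With a check like this, a deployment could turn collection off by exporting
one variable before starting the webserver, without needing a fork.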

On Mon, Apr 1, 2024 at 12:35 PM Ankit Chaurasia <su...@gmail.com> wrote:

> Big +1 for Scarf.
>
> Transparency is key, so it's important to be super clear about opting
> out and what's tracked to avoid spooking anyone about IP stuff.
>
> Regards
> Ankit Chaurasia
>

Re: [DISCUSS] Proposal for adding Telemetry via Scarf

Posted by Ankit Chaurasia <su...@gmail.com>.
Big +1 for Scarf.

Transparency is key, so it's important to be super clear about opting
out and what's tracked to avoid spooking anyone about IP stuff.

Regards
Ankit Chaurasia




On Mon, Apr 1, 2024 at 10:18 AM Amogh Desai <am...@gmail.com> wrote:
>
> +1 looks like a good tool which could be super helpful.
>
> * We should have some transparency into the data that is collected or sent
> * We should have an option to optionally opt-out
>
> Thanks & Regards,
> Amogh Desai
>
>
> On Sun, Mar 31, 2024 at 7:53 AM Wei Lee <we...@gmail.com> wrote:
>
> > +1 to this. It would be really useful. As long as we can opt out, I think
> > we’re good.
> >
> > Best,
> > Wei
> >
> > > On Mar 31, 2024, at 12:47 AM, Kaxil Naik <ka...@gmail.com> wrote:
> > >
> > > Grammar Correction:
> > >
> > > We should assume that those who deploy and upgrade Airflow - actually
> > read
> > >> and take into account what is written in the release notes - especially
> > if
> > >> they have security guys breathing their necks, similarly as we have to
> > >> assume they follow CVE announcements about security issues fixed. If we
> > >> are very straightforward and out-going about the change, inform very
> > >> clearly how to opt-out, I don't see a big problem with opt-out.
> > >
> > >
> > > I couldn't agree more; even though we shouldn't collect any data that
> > > hamper security (and we should aim to do the same), most security
> > concerned
> > > folks don't just upgrade, and we can rely on them regarding release notes
> > > or announcements and we can make it very clear in our announcements too;
> > > and in our installation guides.
> > >
> > > On Sat, 30 Mar 2024 at 16:47, Kaxil Naik <ka...@gmail.com> wrote:
> > >
> > >> Grammar crrection:
> > >>
> > >>
> > >> On Sat, 30 Mar 2024 at 16:43, Kaxil Naik <ka...@gmail.com> wrote:
> > >>
> > >>> Have this at the end of the email too: but if folks don't read until
> > the
> > >>> end and quoting Maxime from the use-case blog[1]:
> > >>>
> > >>> "I think people often ask ‘how do I contribute to open source?’, ‘I've
> > >>> got to get into the code’, or ‘ I’ve got to be an engineer.’ Actually,
> > the
> > >>> very simplest thing that you can do is just say, ‘my organization gets
> > real
> > >>> value from this piece of software.’ There are a bunch of ways to let
> > the
> > >>> people know about it – and now Scarf is there. If your organization is
> > >>> getting a lot of value from a piece of open source software, make sure
> > the
> > >>> devs know about it."
> > >>>
> > >>> What kind of edge cases are you thinking about? I don't think it makes
> > >>> sense to have "opt-in" at all. As the goal is to collect data for most
> > >>> Airflow installations except for those that don't want to give data,
> > then
> > >>> "opt-out" is the only way to maximize it. As long as we don't collect
> > any
> > >>> PII data, this is in-compliance as well.
> > >>>
> > >>> Imagine someone learning Airflow, if they have to opt-in via a config,
> > >>> they wouldn't even know or care about it, hence us losing most of the
> > data.
> > >>> I understand why some orgs & individuals may want to opt-out.
> > >>>
> > >>> Scarf Provides tracking pixels (essentially an HTML image tag) that you
> > >>> can place in your website or product to track visitors to that URL. If
> > >>> there were any concerns about Privacy, ASF wouldn't have approved it
> > at all.
> > >>>
> > >>> A few key details to note about the pixel:
> > >>>
> > >>>
> > >>>   - No PII is tracked… Scarf does not capture/retain IP information…
> > >>>   this information is discarded by the platform upon
> > processing/aggregating
> > >>>   - Scarf pixels respect the Do Not Track (DNT) settings of browsers -
> > >>>   these users will not be tracked whatsoever.
> > >>>
> > >>>
> > >>> All the ASF projects I had listed (whether they use Scarf gateway or
> > >>> Scarf pixel in product) are using opt-out.
> > >>>
> > >>> 1. Short opt-in period before opt-out. Test this feature with users who
> > >>>> trust and if it works great - make it public. I think it's wise to
> > handle
> > >>>> edge cases and configure collected data more accurately.
> > >>>
> > >>>
> > >>>
> > >>> It would be a pixel in the webserver, should affect nothing at all even
> > >>> in an air-gapped environment.
> > >>>
> > >>>> 2. It should not affect anything if access to the internet is
> > restricted
> > >>>> which is default for many companies.
> > >>>
> > >>>
> > >>>
> > >>> 100% agreed on the below:
> > >>>
> > >>>> I think we have a very good blueprint to follow including at least 5
> > >>>> other
> > >>>> ASF projects that also passed the review of the privacy@asf. And
> > while I
> > >>>> understand (and concur) the urge for opt-in by default coming from
> > >>>> consumer
> > >>>> market (where it makes perfect sense) Airflow is not a consumer
> > >>>> software and is used in "corporate environment" which has a little
> > >>>> different expectations and broad assumption that the company can make
> > >>>> decisions on such telemetry on behalf of the employees using it.
> > >>>
> > >>>
> > >>> Couldn't agree more; even though there shouldn't we collect hamper
> > >>> security (and we should aim to do the same), most security concerned
> > folks
> > >>> don't just
> > >>> upgrade, and we can rely on them regarding release notes or
> > announcements
> > >>> and we can make it very clear in our announcements too; and in our
> > >>> installation guides.
> > >>>
> > >>> We should assume that those who deploy and upgrade Airflow - actually
> > read
> > >>>> and take into account what is written in the release notes -
> > especially
> > >>>> if
> > >>>> they have security guys breathing their necks, similarly as we have to
> > >>>> assume they follow CVE announcements about security issues fixed. If
> > we
> > >>>> are very straightforward and out-going about the change, inform very
> > >>>> clearly how to opt-out, I don't see a big problem with opt-out.
> > >>>
> > >>>
> > >>>
> > >>> To be clear, the collection of data, or at least the data we should
> > >>> gather here should help all the consumers without violating anything
> > >>> regulations. I will quote Maxime's quote in the use-case doc [1]
> > >>>
> > >>> "*Another Form of Contributing*
> > >>> “I think people often ask ‘how do I contribute to open source?’, ‘I've
> > >>> got to get into the code’, or ‘ I’ve got to be an engineer.’ Actually,
> > the
> > >>> very simplest thing that you can do is just say, ‘my organization gets
> > real
> > >>> value from this piece of software.’ There are a bunch of ways to let
> > the
> > >>> people know about it – and now Scarf is there. If your organization is
> > >>> getting a lot of value from a piece of open source software, make sure
> > the
> > >>> devs know about it.”"
> > >>>
> > >>>
> > >>> [1] https://about.scarf.sh/post/scarf-case-study-apache-superset
> > >>>
> > >>> On Sat, 30 Mar 2024 at 14:02, Alexander Shorin <kx...@apache.org> wrote:
> > >>>
> > >>>> Hi Jarek!
> > >>>>
> > >>>> I understand the reasons for opt-out from the project's point of view.
> > >>>> I just imagined the situation where an upgrade happens and data
> > >>>> suddenly starts flowing to some third-party service - that's the view
> > >>>> from the user side at some big company.
> > >>>>
> > >>>> There could be good alternatives to handle this:
> > >>>> 1. A short opt-in period before opt-out. Test this feature with users
> > >>>> who trust it and, if it works well, make it public. I think it's wise
> > >>>> to handle edge cases and configure the collected data more accurately.
> > >>>> 2. Explicitly warn about this feature so that it does not go
> > >>>> unnoticed, just to reduce possible frustration.
> > >>>>
> > >>>> Just some personal thoughts for discussion (:
> > >>>>
> > >>>> --
> > >>>> ,,,^..^,,,
> > >>>>
> > >>>> On Sat, Mar 30, 2024 at 4:36 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > >>>>
> > >>>>> Hello everyone,
> > >>>>>
> > >>>>> it has to be:
> > >>>>>
> > >>>>> 1. Opt-in by default to not trigger security guys about new unplanned
> > >>>>>> activity after regular upgrade.
> > >>>>>>
> > >>>>>
> > >>>>> That's a very good point about triggering security, Alexander, but I
> > >>>>> am not so sure it means that we "have to" do opt-in. There are other
> > >>>>> ways of communicating with the "deployment managers" who install and
> > >>>>> upgrade Airflow - e.g. release notes, blogs, our social media, Slack
> > >>>>> announcements etc. We have plenty of channels we can use to
> > >>>>> communicate the change.
> > >>>>>
> > >>>>> I think we have a very good blueprint to follow, including at least 5
> > >>>>> other ASF projects that also passed the review of privacy@asf. And
> > >>>>> while I understand (and concur with) the urge for opt-in by default
> > >>>>> coming from the consumer market (where it makes perfect sense),
> > >>>>> Airflow is not consumer software. It is used in a "corporate
> > >>>>> environment", which has slightly different expectations and a broad
> > >>>>> assumption that the company can make decisions on such telemetry on
> > >>>>> behalf of the employees using it.
> > >>>>>
> > >>>>> We should assume that those who deploy and upgrade Airflow actually
> > >>>>> read and take into account what is written in the release notes -
> > >>>>> especially if they have security guys breathing down their necks -
> > >>>>> just as we have to assume they follow CVE announcements about fixed
> > >>>>> security issues. If we are very straightforward and upfront about the
> > >>>>> change and explain very clearly how to opt out, I don't see a big
> > >>>>> problem with opt-out.
> > >>>>>
> > >>>>> We should of course check with privacy@a.o (but I've spent a good deal
> > >>>>> of time reading the Superset and other use cases and explanations in
> > >>>>> detail to make a better-informed decision) - and it looks like they
> > >>>>> also went the opt-out way and got cleared by privacy@a.o. And if we
> > >>>>> cannot reach consensus, we should - as usual - vote on it (because
> > >>>>> yes, it is an important decision), but - after reading and
> > >>>>> understanding why others also did it - for me personally, opt-out is
> > >>>>> a good path.
> > >>>>>
> > >>>>> Also because it will increase the amount of data we gather, and in
> > >>>>> our case - counterintuitively - it will be even better for privacy
> > >>>>> and corporate anonymity, because the more data we get, the more
> > >>>>> difficult it will be to get any non-statistical/non-aggregated
> > >>>>> insight from it. Imagine if only a few corporate users consciously
> > >>>>> enable it - then we would be able to draw many more conclusions, if
> > >>>>> we found out who they are, than if everyone has it enabled by
> > >>>>> default.
> > >>>>>
> > >>>>> That's my take on it - but again, it's up to us to vote. For me,
> > >>>>> opt-in is not a "has to", and I am rather for opt-out.
> > >>>>>
> > >>>>> J.
> > >>>>>
> > >>>>>> Hi all,
> > >>>>>>
> > >>>>>>
> > >>>>>>> I want to propose gathering telemetry for Airflow installations. As
> > >>>>>>> the Airflow community, we have been relying heavily on the yearly
> > >>>>>>> Airflow Survey and anecdotes to answer a few key questions about
> > >>>>>>> Airflow usage. Questions like the following:
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>   - Which versions of Airflow are people installing/using now (i.e.
> > >>>>>>>   whether people have primarily made the jump from version X to
> > >>>>>>>   version Y)?
> > >>>>>>>   - Which DB is used as the Metadata DB and which version, e.g. Pg 14?
> > >>>>>>>   - What Python version is being used?
> > >>>>>>>   - Which Executor is being used?
> > >>>>>>>   - Approximately how many people out there in the world are
> > >>>>>>>   installing Airflow?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> There is a solution that should help answer these questions: Scarf
> > >>>>>>> [1]. Scarf is already approved by the ASF [2][3] and is already used
> > >>>>>>> by other ASF projects: Superset [4], Dolphin Scheduler [5], Dubbo
> > >>>>>>> Kubernetes, DevLake, Skywalking, as it follows GDPR and other
> > >>>>>>> regulations.
> > >>>>>>>
> > >>>>>>> Similar to Superset, we probably can use it as follows:
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>   1. Install the `scarf js` npm package and bundle it in the
> > >>>>>>>   Webserver. When the package is downloaded & Airflow webserver is
> > >>>>>>>   opened, metadata is recorded to the Scarf dashboard.
> > >>>>>>>   2. Utilize the Scarf Gateway [6], which we can use in front of
> > >>>>>>>   docker containers. While it’s possible people go around this
> > >>>>>>>   gateway, we can probably configure and encourage most traffic to
> > >>>>>>>   go through these gateways.
> > >>>>>>>
> > >>>>>>> While Scarf does not store any personally identifying information
> > >>>>>>> from SDK telemetry data, it does send various bits of IP-derived
> > >>>>>>> information as outlined here [7]. This data should be made as
> > >>>>>>> transparent as possible by granting dashboard access to the Airflow
> > >>>>>>> PMC and any other relevant means of sharing/surfacing it that we
> > >>>>>>> encounter (Town Hall, Slack, Newsletter etc).
> > >>>>>>>
> > >>>>>>> The following case studies are worth reading:
> > >>>>>>>
> > >>>>>>>   1. https://about.scarf.sh/post/scarf-case-study-apache-superset
> > >>>>>>>   (From Maxime)
> > >>>>>>>   2. https://about.scarf.sh/post/haskell-org-bridging-the-gap-between-language-innovation-and-community-understanding
> > >>>>>>>
> > >>>>>>> Similar to them, this could help in various ways that come with
> > >>>>>>> using data for decision-making. With clear guidelines on "how to
> > >>>>>>> opt-out" [8][9][10] & "what data is being collected" on the Airflow
> > >>>>>>> website, this can be beneficial to the entire community as we would
> > >>>>>>> be making more informed decisions.
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>> Kaxil
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> [1] https://about.scarf.sh/
> > >>>>>>> [2] https://privacy.apache.org/policies/privacy-policy-public.html
> > >>>>>>> [3] https://privacy.apache.org/faq/committers.html
> > >>>>>>> [4] https://github.com/apache/superset/issues/25639
> > >>>>>>> [5] https://github.com/search?q=repo%3Aapache%2Fdolphinscheduler%20scarf.sh&type=code
> > >>>>>>> [6] https://about.scarf.sh/scarf-gateway
> > >>>>>>> [7] https://about.scarf.sh/privacy-policy
> > >>>>>>> [8] https://superset.apache.org/docs/frequently-asked-questions/#does-superset-collect-any-telemetry-data
> > >>>>>>> [9] https://superset.apache.org/docs/installation/installing-superset-using-docker-compose
> > >>>>>>> [10] https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@airflow.apache.org
> > For additional commands, e-mail: dev-help@airflow.apache.org
> >
> >
