Posted to dev@openoffice.apache.org by drew <dr...@baseanswers.com> on 2011/08/12 17:30:00 UTC

[WWW] Web analytics

Hi,

Well, I thought I'd start this list - unless folks think it is too soon
of course.

Currently the OO.o web sites, all of them, utilize a third party
analytics firm (not Google). This is done by requiring that each site
inject a small bit of JS in the page footer. (this was another
requirement for the forums as an example and is used also at the
extension/template repository)

Going forward, does the PPMC want to use a third-party analytics firm,
e.g. Google,

or

do we prefer to do it all in-house? (Just guessing that there may be
some knowledge on this subject in the overall Apache project :)

Thanks.

//drew


Re: [WWW] Web analytics

Posted by Andrea Pescetti <pe...@openoffice.org>.
drew wrote:
> Currently the OO.o web sites, all of them, utilize a third party
> analytics firm (not Google).

Actually, as I wrote a few weeks ago, http://www.openoffice.org is using 
Google Analytics; I just rechecked and I can confirm that 
http://www.openoffice.org loads elements from
http://www.google-analytics.com

> Going forward does the PPMC desire to use a third party analytic firm -
> i.e. Google.

Even though it is not ubiquitous, it's already there. Questions 
contained in your e-mail are of course still valid; I was citing this 
because it might be interesting to have access to the Google Analytics 
reports for the http://www.openoffice.org site.

Regards,
   Andrea.

Re: [WWW] Web analytics

Posted by Eike Rathke <oo...@erack.de>.
Hi Marcus,

On Friday, 2011-08-12 17:49:59 +0200, Marcus (OOo) wrote:

> >Going forward does the PPMC desire to use a third party analytic firm -
> >i.e. Google.
> >
> >or
> >
> >Do we prefer to do it all in-house. (just guessing that their may be
> >some knowledge on this subject in the overall Apache project :)
> 
> In-house needs a bit more knowledge and willingness to keep the
> maintain stuff up. Maybe a bit too much overload because only a few
> websites and numbers are really important to track. So, using like
> Google Analytics is maybe already sufficient.

There's Piwik, http://piwik.org/

  Eike

-- 
 PGP/OpenPGP/GnuPG encrypted mail preferred in all private communication.
 Key ID: 0x293C05FD - 997A 4C60 CE41 0149 0DB3  9E96 2F1A D073 293C 05FD

Re: [WWW] Web analytics

Posted by "Marcus (OOo)" <ma...@wtnet.de>.
Am 08/12/2011 05:30 PM, schrieb drew:
> Hi,
>
> Well, I thought I'd start this list - unless folks think it is too soon
> of course.
>
> Currently the OO.o web sites, all of them, utilize a third party
> analytics firm (not Google). This is done by requiring that each site
> inject a small bit of JS in the page footer. (this was another
> requirement for the forums as an example and is used also at the
> extension/template repository)
>
> Going forward does the PPMC desire to use a third party analytic firm -
> i.e. Google.
>
> or
>
> Do we prefer to do it all in-house. (just guessing that their may be
> some knowledge on this subject in the overall Apache project :)
>
> Thanks.
>
> //drew

In-house needs a bit more knowledge and willingness to keep the stuff
maintained. Maybe a bit too much overhead, because only a few websites and
numbers are really important to track. So using something like Google
Analytics is maybe already sufficient.

However, if someone has the knowledge and is willing to build a solution,
just speak up. ;-)

Marcus

Re: [WWW] Web analytics

Posted by Rob Weir <ap...@robweir.com>.
On Sun, Aug 14, 2011 at 11:35 AM, Kay Schenk <ka...@gmail.com> wrote:
>
>
> On 08/13/2011 07:23 PM, Rob Weir wrote:
>>
>> On Sat, Aug 13, 2011 at 11:28 AM, Kay Schenk<ka...@gmail.com>  wrote:
>>>
>>>
>>> On 08/13/2011 05:55 AM, Rob Weir wrote:
>>> <--snip-->
>>>
>>>>>
>>>>
>>>> My understanding is that there were two issues raised by regulators:
>>>>
>>>> 1) Google stores IP addresses of visitors.  It does not make the IP
>>>> addresses available to users of Google Analytics, but stores it
>>>> themselves.  This has been interpreted by one regulator as violating a
>>>> ban on storing personally identifying information beyond the duration
>>>> of a session.  The interpretation is that an IP address is personally
>>>> identifying information.
>>>>
>>>> The odd thing here is that it appears to be ignoring the state of the
>>>> art, which is that other information, excluding IP address, is
>>>> actually more accurate in tracking users, e.g., "fingerprinting" them
>>>> via their browser settings, fonts, etc.  See:
>>>> https://panopticlick.eff.org/  In other words, it is the correlation
>>>> of basic common facts that makes the user identifiable.  It doesn't
>>>> require a single unique piece of data.
>>>>
>>>> 2) Google has an opt-out browser plugin, but it is not available for
>>>> Opera or Safari.
>>>>
>>>>>> Storing the data ourselves is a double-edged sword.  If we store it,
>>>>>> then we are responsible for any problems with that data.
>>>>>
>>>>> Yes. And configuring Piwik the way described there it does not store
>>>>> personally identifiable data.
>>>>>
>>>>
>>>> If we think Piwik addresses the IP address and the opt-out issues,
>>>> then that sounds like a good solution.  If we think Piwik is well
>>>> maintained, etc. I have no objections to Piwik.
>>>>
>>> <--snip-->
>>>
>>> OK, a couple of short comments on this -- esp Google analytics.
>>>
>>> G. analytics requires code inserted into pages you want to track. Not a
>>> biggie since we have templates, but...if the analytics server is down
>>> (rarely but it DOES happen), this prevents page loading. Analytics is
>>> great
>>> but really maybe overkill for just simplistic info like browser
>>> identification. I have no knowledge of Piwik.
>>>
>>
>> That was first generation.  Google Analytics now has an asynchronous
>> option, which allows the page to render while the tracking code does
>> its stuff in the background.  No idea if Piwik allows that as well.
>
> oops! OK -- my bad. Haven't kept up with this in a while.
>
> Still I can't help but think that Analytics, with its individual
> registration (i.e. by a designated individual) might be more of an
> administrative headache than we really need for simple tracking stats.
> It's a great service but would it serve our administrative setup needs?
>

Yes, but it is the same problem we'll run into with any other online
service we want to use for the project, where the service has a single
login.  So if we want an AOOo Twitter account, or Facebook fan page,
or whatever, then we'll need some way to manage that registration and
access to that account.

So we need to figure that out eventually.

>>
>>> and 2) I'm surprised Apache doesn't have some internal log analysis
>>> program
>>> --like Awstats -- installed for the whole domain. It's really quite
>>> simple
>>> to deal with but, yes, does require some caretaking.
>>>
>>>
>>>
>>> --
>>> ------------------------------------------------------------------------
>>> MzK
>>>
>>> "Those who love deeply never grow old;
>>>  they may die of old age, but they die young."
>>>                        -- Sir Arthur Pinero
>>>
>
> --
> ------------------------------------------------------------------------
> MzK
>
> "Those who love deeply never grow old;
>  they may die of old age, but they die young."
>                        -- Sir Arthur Pinero
>

Re: [WWW] Web analytics

Posted by Kay Schenk <ka...@gmail.com>.

On 08/13/2011 07:23 PM, Rob Weir wrote:
> On Sat, Aug 13, 2011 at 11:28 AM, Kay Schenk<ka...@gmail.com>  wrote:
>>
>>
>> On 08/13/2011 05:55 AM, Rob Weir wrote:
>> <--snip-->
>>
>>>>
>>>
>>> My understanding is that there were two issues raised by regulators:
>>>
>>> 1) Google stores IP addresses of visitors.  It does not make the IP
>>> addresses available to users of Google Analytics, but stores it
>>> themselves.  This has been interpreted by one regulator as violating a
>>> ban on storing personally identifying information beyond the duration
>>> of a session.  The interpretation is that an IP address is personally
>>> identifying information.
>>>
>>> The odd thing here is that it appears to be ignoring the state of the
>>> art, which is that other information, excluding IP address, is
>>> actually more accurate in tracking users, e.g., "fingerprinting" them
>>> via their browser settings, fonts, etc.  See:
>>> https://panopticlick.eff.org/  In other words, it is the correlation
>>> of basic common facts that makes the user identifiable.  It doesn't
>>> require a single unique piece of data.
>>>
>>> 2) Google has an opt-out browser plugin, but it is not available for
>>> Opera or Safari.
>>>
>>>>> Storing the data ourselves is a double-edged sword.  If we store it,
>>>>> then we are responsible for any problems with that data.
>>>>
>>>> Yes. And configuring Piwik the way described there it does not store
>>>> personally identifiable data.
>>>>
>>>
>>> If we think Piwik addresses the IP address and the opt-out issues,
>>> then that sounds like a good solution.  If we think Piwik is well
>>> maintained, etc. I have no objections to Piwik.
>>>
>> <--snip-->
>>
>> OK, a couple of short comments on this -- esp Google analytics.
>>
>> G. analytics requires code inserted into pages you want to track. Not a
>> biggie since we have templates, but...if the analytics server is down
>> (rarely but it DOES happen), this prevents page loading. Analytics is great
>> but really maybe overkill for just simplistic info like browser
>> identification. I have no knowledge of Piwik.
>>
>
> That was first generation.  Google Analytics now has an asynchronous
> option, which allows the page to render while the tracking code does
> its stuff in the background.  No idea if Piwik allows that as well.

oops! OK -- my bad. Haven't kept up with this in a while.

Still I can't help but think that Analytics, with its individual 
registration (i.e. by a designated individual) might be more of an 
administrative headache than we really need for simple tracking stats.
It's a great service but would it serve our administrative setup needs?

>
>> and 2) I'm surprised Apache doesn't have some internal log analysis program
>> --like Awstats -- installed for the whole domain. It's really quite simple
>> to deal with but, yes, does require some caretaking.
>>
>>
>>
>> --
>> ------------------------------------------------------------------------
>> MzK
>>
>> "Those who love deeply never grow old;
>>   they may die of old age, but they die young."
>>                         -- Sir Arthur Pinero
>>

-- 
------------------------------------------------------------------------
MzK

"Those who love deeply never grow old;
  they may die of old age, but they die young."
                         -- Sir Arthur Pinero

Re: [WWW] Web analytics

Posted by Rob Weir <ap...@robweir.com>.
On Sat, Aug 13, 2011 at 11:28 AM, Kay Schenk <ka...@gmail.com> wrote:
>
>
> On 08/13/2011 05:55 AM, Rob Weir wrote:
> <--snip-->
>
>>>
>>
>> My understanding is that there were two issues raised by regulators:
>>
>> 1) Google stores IP addresses of visitors.  It does not make the IP
>> addresses available to users of Google Analytics, but stores it
>> themselves.  This has been interpreted by one regulator as violating a
>> ban on storing personally identifying information beyond the duration
>> of a session.  The interpretation is that an IP address is personally
>> identifying information.
>>
>> The odd thing here is that it appears to be ignoring the state of the
>> art, which is that other information, excluding IP address, is
>> actually more accurate in tracking users, e.g., "fingerprinting" them
>> via their browser settings, fonts, etc.  See:
>> https://panopticlick.eff.org/  In other words, it is the correlation
>> of basic common facts that makes the user identifiable.  It doesn't
>> require a single unique piece of data.
>>
>> 2) Google has an opt-out browser plugin, but it is not available for
>> Opera or Safari.
>>
>>>> Storing the data ourselves is a double-edged sword.  If we store it,
>>>> then we are responsible for any problems with that data.
>>>
>>> Yes. And configuring Piwik the way described there it does not store
>>> personally identifiable data.
>>>
>>
>> If we think Piwik addresses the IP address and the opt-out issues,
>> then that sounds like a good solution.  If we think Piwik is well
>> maintained, etc. I have no objections to Piwik.
>>
> <--snip-->
>
> OK, a couple of short comments on this -- esp Google analytics.
>
> G. analytics requires code inserted into pages you want to track. Not a
> biggie since we have templates, but...if the analytics server is down
> (rarely but it DOES happen), this prevents page loading. Analytics is great
> but really maybe overkill for just simplistic info like browser
> identification. I have no knowledge of Piwik.
>

That was first generation.  Google Analytics now has an asynchronous
option, which allows the page to render while the tracking code does
its stuff in the background.  No idea if Piwik allows that as well.

> and 2) I'm surprised Apache doesn't have some internal log analysis program
> --like Awstats -- installed for the whole domain. It's really quite simple
> to deal with but, yes, does require some caretaking.
>
>
>
> --
> ------------------------------------------------------------------------
> MzK
>
> "Those who love deeply never grow old;
>  they may die of old age, but they die young."
>                        -- Sir Arthur Pinero
>

Re: [WWW] Web analytics

Posted by Kay Schenk <ka...@gmail.com>.

On 08/13/2011 05:55 AM, Rob Weir wrote:
<--snip-->

>>
>
> My understanding is that there were two issues raised by regulators:
>
> 1) Google stores IP addresses of visitors.  It does not make the IP
> addresses available to users of Google Analytics, but stores it
> themselves.  This has been interpreted by one regulator as violating a
> ban on storing personally identifying information beyond the duration
> of a session.  The interpretation is that an IP address is personally
> identifying information.
>
> The odd thing here is that it appears to be ignoring the state of the
> art, which is that other information, excluding IP address, is
> actually more accurate in tracking users, e.g., "fingerprinting" them
> via their browser settings, fonts, etc.  See:
> https://panopticlick.eff.org/  In other words, it is the correlation
> of basic common facts that makes the user identifiable.  It doesn't
> require a single unique piece of data.
>
> 2) Google has an opt-out browser plugin, but it is not available for
> Opera or Safari.
>
>>> Storing the data ourselves is a double-edged sword.  If we store it,
>>> then we are responsible for any problems with that data.
>>
>> Yes. And configuring Piwik the way described there it does not store
>> personally identifiable data.
>>
>
> If we think Piwik addresses the IP address and the opt-out issues,
> then that sounds like a good solution.  If we think Piwik is well
> maintained, etc. I have no objections to Piwik.
>
<--snip-->

OK, a couple of short comments on this -- esp Google analytics.

G. analytics requires code inserted into pages you want to track. Not a 
biggie since we have templates, but...if the analytics server is down 
(rarely but it DOES happen), this prevents page loading. Analytics is 
great but really maybe overkill for just simplistic info like browser 
identification. I have no knowledge of Piwik.

and 2) I'm surprised Apache doesn't have some internal log analysis 
program --like Awstats -- installed for the whole domain. It's really 
quite simple to deal with but, yes, does require some caretaking.
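
To give an idea of how little is needed for that kind of basic log analysis, here is a minimal sketch in Python. It is not how Awstats works internally; it just assumes the access log uses the Apache "combined" format, and the names LINE_RE and count_user_agents are made up for this example.

```python
import re
from collections import Counter

# Assumed layout (Apache "combined" log format):
# host ident user [time] "request" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def count_user_agents(lines):
    """Count requests per User-Agent string; lines that do not parse are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("agent")] += 1
    return counts
```

Feeding it an open log file (any iterable of lines works) and calling most_common() on the result gives a rough browser breakdown. Note that this reads only the server's own logs, so no script has to be injected into the pages.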



-- 
------------------------------------------------------------------------
MzK

"Those who love deeply never grow old;
  they may die of old age, but they die young."
                         -- Sir Arthur Pinero

Re: [WWW] Web analytics

Posted by Rob Weir <ap...@robweir.com>.
On Sat, Aug 13, 2011 at 6:21 AM, Eike Rathke <oo...@erack.de> wrote:
> Hi Rob,
>
> On Friday, 2011-08-12 16:42:20 -0400, Rob Weir wrote:
>
>> > The big difference is that with Piwik the data collected stays inhouse
>> > at Apache, whereas with Google it goes to Google that does whatever you
>> > don't know. This again implies that at Apache measures must be taken to
>> > protect the privacy of collected data. The German "Landeszentrum für
>> > Datenschutz Schleswig-Holstein" (center of data protection) has a few
>> > documents about tracking [1], unfortunately only in German, why Google
>> > Analytics doesn't comply with the German data protection law [2] and how
>> > Piwik can be configured to be used in compliance with the law [3].
>> >
>>
>> Does this law matter if the servers are hosted in the US, not in
>> Germany?  (I'm assuming that the Apache servers are in the US).
>
> No, but given that German data protection law is probably one of the
> more strict, setting up an environment that fulfills those requirements
> seemst to be a good approach to me.
>

My understanding is that there were two issues raised by regulators:

1) Google stores IP addresses of visitors.  It does not make the IP
addresses available to users of Google Analytics, but stores it
themselves.  This has been interpreted by one regulator as violating a
ban on storing personally identifying information beyond the duration
of a session.  The interpretation is that an IP address is personally
identifying information.

The odd thing here is that it appears to be ignoring the state of the
art, which is that other information, excluding IP address, is
actually more accurate in tracking users, e.g., "fingerprinting" them
via their browser settings, fonts, etc.  See:
https://panopticlick.eff.org/  In other words, it is the correlation
of basic common facts that makes the user identifiable.  It doesn't
require a single unique piece of data.

2) Google has an opt-out browser plugin, but it is not available for
Opera or Safari.
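
On the first issue, one common way to avoid storing a full IP address is to truncate it before it is written anywhere. The following is only a sketch of that idea in Python, not Piwik's or Google's actual implementation; the function name anonymize_ip and the prefix lengths (24 bits for IPv4, 48 bits for IPv6) are assumptions picked for illustration.

```python
import ipaddress


def anonymize_ip(addr, v4_prefix=24, v6_prefix=48):
    """Zero the host part of an address so that only the network prefix remains."""
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and have the host bits masked off
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)
```

With the defaults, 192.0.2.123 is stored as 192.0.2.0, which should still be enough for a rough country lookup. As noted above, this does nothing against fingerprinting via other attributes.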

>> Storing the data ourselves is a double-edged sword.  If we store it,
>> then we are responsible for any problems with that data.
>
> Yes. And configuring Piwik the way described there it does not store
> personally identifiable data.
>

If we think Piwik addresses the IP address and the opt-out issues,
then that sounds like a good solution.  If we think Piwik is well
maintained, etc. I have no objections to Piwik.

>> Google states what they can do with the data, but it is rather broad,
>> as you know.
>
> Yes, but we shouldn't discuss that here. All I said was there is an
> alternative that doesn't store personally identifiable data and also
> doesn't give it away to someone else to process.
>
>  Eike
>
> --
>  PGP/OpenPGP/GnuPG encrypted mail preferred in all private communication.
>  Key ID: 0x293C05FD - 997A 4C60 CE41 0149 0DB3  9E96 2F1A D073 293C 05FD
>

Re: [WWW] Web analytics

Posted by Eike Rathke <oo...@erack.de>.
Hi Rob,

On Friday, 2011-08-12 16:42:20 -0400, Rob Weir wrote:

> > The big difference is that with Piwik the data collected stays inhouse
> > at Apache, whereas with Google it goes to Google that does whatever you
> > don't know. This again implies that at Apache measures must be taken to
> > protect the privacy of collected data. The German "Landeszentrum für
> > Datenschutz Schleswig-Holstein" (center of data protection) has a few
> > documents about tracking [1], unfortunately only in German, why Google
> > Analytics doesn't comply with the German data protection law [2] and how
> > Piwik can be configured to be used in compliance with the law [3].
> >
> 
> Does this law matter if the servers are hosted in the US, not in
> Germany?  (I'm assuming that the Apache servers are in the US).

No, but given that German data protection law is probably one of the
more strict, setting up an environment that fulfills those requirements
seems to be a good approach to me.

> Storing the data ourselves is a double-edged sword.  If we store it,
> then we are responsible for any problems with that data.

Yes. And when Piwik is configured the way described there, it does not
store personally identifiable data.

> Google states what they can do with the data, but it is rather broad,
> as you know.

Yes, but we shouldn't discuss that here. All I said was there is an
alternative that doesn't store personally identifiable data and also
doesn't give it away to someone else to process.

  Eike

-- 
 PGP/OpenPGP/GnuPG encrypted mail preferred in all private communication.
 Key ID: 0x293C05FD - 997A 4C60 CE41 0149 0DB3  9E96 2F1A D073 293C 05FD

RE: [WWW] Web analytics

Posted by Gavin McDonald <ga...@16degrees.com.au>.

> -----Original Message-----
> From: Dave Fisher [mailto:dave2wave@comcast.net]
> Sent: Saturday, 13 August 2011 11:39 AM
> To: ooo-dev@incubator.apache.org
> Subject: Re: [WWW] Web analytics
> 
> 
> On Aug 12, 2011, at 5:52 PM, Gavin McDonald wrote:
> 
> >
> >
> >> -----Original Message-----
> >> From: Rob Weir [mailto:apache@robweir.com]
> >> Sent: Saturday, 13 August 2011 8:31 AM
> >> To: ooo-dev@incubator.apache.org
> >> Subject: Re: [WWW] Web analytics
> >>
> > <snip>
> >>
> >> Any web analytics package is going to track IP address and store a
> >> cookie. That is how it knows what country you are from and whether
> >> you are a new or a returning user.
> >>
> >
> > These days cookies are not reliable enough for analytics to produce
> > accurate data.
> >
> > Lots of users these days have Anti-Virus programs running, Windows has
> > it build in these days. These programs are trained to consider cookies
> > as low risk but at the same time allow the user to delete cookies.
> >
> > Personally, I'll be treated as a new visitor to the site every week as
> > far as analytics is concerned, as that is how often I clean out my
> > cookies.
> 
> Glad you are following the thread.
> 
> Are the access logs from main apache hosted sites available for the
> project to analyze?

There is some raw data available at:

http://apache.org/server-status

though probably not useful enough.


> 
> If so, do you know of any analysis tools currently hosted by Apache
> Infrastructure?

The infra team as such has access to logs to analyse for security and
load reasons etc., and does not collect data or use tools for the purpose
we are talking about here.

However, luckily for some Vadim has a stats site for apache projects:

http://people.apache.org/~vgritsenko/index.html

which contains some useful data, not quite as in-depth as other tools like
awstats, but still some good stuff in there. Unfortunately it doesn't go
deeper than Incubator level, so at this time Incubator projects cannot get
individual stats. Some projects are also missing, so I think some of the
site config is manual and needs updating. (So when OOo graduates to TLP we
should prod Vadim to get it added.)

Other projects, incubator or TLP, are currently using other site stats
methods to gather their own data, including the use of Google Analytics.

Gav...

> 
> Regards,
> Dave


Re: [WWW] Web analytics

Posted by Dave Fisher <da...@comcast.net>.
On Aug 12, 2011, at 5:52 PM, Gavin McDonald wrote:

> 
> 
>> -----Original Message-----
>> From: Rob Weir [mailto:apache@robweir.com]
>> Sent: Saturday, 13 August 2011 8:31 AM
>> To: ooo-dev@incubator.apache.org
>> Subject: Re: [WWW] Web analytics
>> 
> <snip>
>> 
>> Any web analytics package is going to track IP address and store a cookie.
>> That is how it knows what country you are from and whether you are a new
>> or a returning user.
>> 
> 
> These days cookies are not reliable enough for analytics to produce accurate data.
> 
> Lots of users these days have Anti-Virus programs running, Windows has it build in
> these days. These programs are trained to consider cookies as low risk but at the
> same time allow the user to delete cookies.
> 
> Personally, I'll be treated as a new visitor to the site every week as far as analytics
> is concerned, as that is how often I clean out my cookies.

Glad you are following the thread.

Are the access logs from main apache hosted sites available for the project to analyze?

If so, do you know of any analysis tools currently hosted by Apache Infrastructure?

Regards,
Dave

RE: [WWW] Web analytics

Posted by Gavin McDonald <ga...@16degrees.com.au>.

> -----Original Message-----
> From: Rob Weir [mailto:apache@robweir.com]
> Sent: Saturday, 13 August 2011 11:42 AM
> To: ooo-dev@incubator.apache.org
> Subject: Re: [WWW] Web analytics
> 
> On Fri, Aug 12, 2011 at 8:52 PM, Gavin McDonald <ga...@16degrees.com.au>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Rob Weir [mailto:apache@robweir.com]
> >> Sent: Saturday, 13 August 2011 8:31 AM
> >> To: ooo-dev@incubator.apache.org
> >> Subject: Re: [WWW] Web analytics
> >>
> > <snip>
> >>
> >> Any web analytics package is going to track IP address and store a cookie.
> >> That is how it knows what country you are from and whether you are a
> >> new or a returning user.
> >>
> >
> > These days cookies are not reliable enough for analytics to produce
> > accurate data.
> >
> > Lots of users these days have Anti-Virus programs running, Windows has
> > it build in these days. These programs are trained to consider cookies
> > as low risk but at the same time allow the user to delete cookies.
> >
> > Personally, I'll be treated as a new visitor to the site every week as
> > far as analytics is concerned, as that is how often I clean out my cookies.
> >
> 
> And that's fine.  There will always be a level of background noise in the data.
> Other factors include users who share machines, or users that have multiple
> machines.  Because of that we shouldn't put much credence in absolute
> numbers.  The interesting thing is the change in numbers, the variation from
> the baseline.
> 
> For example, imagine we see a sudden spike in new visitors (or what the
> analytic thinks are new users).  When that happens, is certainly possible that
> this was just caused by a large number of repeat visitors at the same time
> suddenly installing anti-virus that cleans out their cookies on a weekly basis.
> That is not impossible.  But the more likely explanation is that we actually did
> have a spike in new visitors.
> 
> From marketing perspective we can use this kind of info to gauge the
> effectiveness of different outreach techniques.
> 

I agree

Gav...

> 
> > Gav...
> >
> >
> >


Re: [WWW] Web analytics

Posted by Rob Weir <ap...@robweir.com>.
On Fri, Aug 12, 2011 at 8:52 PM, Gavin McDonald <ga...@16degrees.com.au> wrote:
>
>
>> -----Original Message-----
>> From: Rob Weir [mailto:apache@robweir.com]
>> Sent: Saturday, 13 August 2011 8:31 AM
>> To: ooo-dev@incubator.apache.org
>> Subject: Re: [WWW] Web analytics
>>
> <snip>
>>
>> Any web analytics package is going to track IP address and store a cookie.
>> That is how it knows what country you are from and whether you are a new
>> or a returning user.
>>
>
> These days cookies are not reliable enough for analytics to produce accurate data.
>
> Lots of users these days have Anti-Virus programs running, Windows has it build in
> these days. These programs are trained to consider cookies as low risk but at the
> same time allow the user to delete cookies.
>
> Personally, I'll be treated as a new visitor to the site every week as far as analytics
> is concerned, as that is how often I clean out my cookies.
>

And that's fine.  There will always be a level of background noise in
the data.  Other factors include users who share machines, or users
that have multiple machines.  Because of that we shouldn't put much
credence in absolute numbers.  The interesting thing is the change in
numbers, the variation from the baseline.

For example, imagine we see a sudden spike in new visitors (or what
the analytics tool thinks are new visitors).  When that happens, it is
certainly possible that this was just caused by a large number of repeat
visitors at the same time suddenly installing anti-virus that cleans
out their cookies on a weekly basis.  That is not impossible.  But the
more likely explanation is that we actually did have a spike in new
visitors.

From a marketing perspective we can use this kind of info to gauge the
effectiveness of different outreach techniques.
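
To make the "variation from the baseline" idea concrete, here is a small sketch in Python. It is just one simple way to do it, not what any particular analytics package does: the function name find_spikes, the 7-day window and the threshold of 3 standard deviations are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def find_spikes(daily_counts, window=7, threshold=3.0):
    """Return indexes of days whose count is far above the trailing baseline.

    The baseline for each day is the mean of the previous `window` days. A day
    is flagged when it exceeds that mean by more than `threshold` standard
    deviations of the same window.
    """
    spikes = []
    for i in range(window, len(daily_counts)):
        baseline = daily_counts[i - window:i]
        deviation = daily_counts[i] - mean(baseline)
        if deviation > 0 and deviation > threshold * pstdev(baseline):
            spikes.append(i)
    return spikes
```

The absolute counts never enter the decision, only how far a day sits from its own recent history. Steady noise such as weekly cookie cleaning ends up in the baseline and does not trigger anything by itself.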


> Gav...
>
>
>

RE: [WWW] Web analytics

Posted by Gavin McDonald <ga...@16degrees.com.au>.

> -----Original Message-----
> From: Rob Weir [mailto:apache@robweir.com]
> Sent: Saturday, 13 August 2011 8:31 AM
> To: ooo-dev@incubator.apache.org
> Subject: Re: [WWW] Web analytics
> 
<snip>
> 
> Any web analytics package is going to track IP address and store a cookie.
> That is how it knows what country you are from and whether you are a new
> or a returning user.
> 

These days cookies are not reliable enough for analytics to produce accurate data.

Lots of users these days have anti-virus programs running; Windows even has it
built in. These programs are trained to consider cookies as low risk but at the
same time allow the user to delete cookies.

Personally, I'll be treated as a new visitor to the site every week as far as analytics
is concerned, as that is how often I clean out my cookies.

Gav...



Re: [WWW] Web analytics

Posted by "Marcus (OOo)" <ma...@wtnet.de>.
Am 08/13/2011 03:49 AM, schrieb Rob Weir:
> On Fri, Aug 12, 2011 at 7:01 PM, Marcus (OOo)<ma...@wtnet.de>  wrote:
>> Am 08/13/2011 12:30 AM, schrieb Rob Weir:
> <snip>
>>> Remember, even if we used Piwik, the data would be in the US.  All
>>> user accounts for Apache, all wiki accounts, all mailing lists
>>> subscription data, etc., is in the US.  We have a jurisdiction.
>>
>> So, it's in our hands to protect them and don't have to trust others
>> (companies).
>>
>
> That's an interesting question.  Trust Google or trust ourselves?
> Where do you keep your retirement savings?  In a bank?  Or under your
> bed?
>
> I think Google has the incentive and the investment to ensure the
> security of the data.  If they screw it up, they lose billions.  If we
> screw it up, we say "whoops".  Which gives the visitors the best
> assurances?

Losing billions? Hm, when I think about the latest news about data
theft (like Sony with their PSN problem, Citibank losing data of their
customers, Apple having problems with the iPhone collecting a "little
bit" too much data): so what? These companies are still alive.

> Of course, you could argue that Google is the larger target, and more
> people are trying to hack Google than are trying to hack Apache.  But
> that also means that Google has more resources deployed to harden
> their services against hacking.
>
> There are not perfect choices here.  But I still keep my money in the bank.

Me too. But I don't choose just any bank. I look deep into the market, and
only banks with clean behavior are candidates for storing my millions. I
really try to think twice.

I don't carry something to someone who is making noise about his 
business. I wouldn't really trust them. You know, don't talk about 
things you are doing, just do it. ;-)

BTW:
I would suggest ending this discussion here, as it is going in the wrong
direction: too much politics. ;-)

Marcus

Re: [WWW] Web analytics

Posted by Rob Weir <ap...@robweir.com>.
On Fri, Aug 12, 2011 at 7:01 PM, Marcus (OOo) <ma...@wtnet.de> wrote:
> Am 08/13/2011 12:30 AM, schrieb Rob Weir:
<snip>
>> Remember, even if we used Piwik, the data would be in the US.  All
>> user accounts for Apache, all wiki accounts, all mailing lists
>> subscription data, etc., is in the US.  We have a jurisdiction.
>
> So, it's in our hands to protect them and don't have to trust others
> (companies).
>

That's an interesting question.  Trust Google or trust ourselves?
Where do you keep your retirement savings?  In a bank?  Or under your
bed?

I think Google has the incentive and the investment to ensure the
security of the data.  If they screw it up, they lose billions.  If we
screw it up, we say "whoops".  Which gives the visitors the best
assurances?

Of course, you could argue that Google is the larger target, and more
people are trying to hack Google than are trying to hack Apache.  But
that also means that Google has more resources deployed to harden
their services against hacking.

There are not perfect choices here.  But I still keep my money in the bank.

Re: [WWW] Web analytics

Posted by "Marcus (OOo)" <ma...@wtnet.de>.
On 08/13/2011 12:30 AM, Rob Weir wrote:
> On Fri, Aug 12, 2011 at 5:48 PM, Marcus (OOo)<ma...@wtnet.de>  wrote:
>> On 08/12/2011 10:42 PM, Rob Weir wrote:
>>>
>>> On Fri, Aug 12, 2011 at 4:14 PM, Eike Rathke<oo...@erack.de>    wrote:
>>>>
>>>> Hi Rob,
>>>>
>>>> On Friday, 2011-08-12 13:29:00 -0400, Rob Weir wrote:
>>>>
>>>>>> Before taking that step, it's worth asking if the project actually
>>>>>> has a need for web analytics yet. They were included on OO.o site
>>>>>> mainly because Sun was using the data as part of its business
>>>>>> metrics. It's not obvious that the same need exists in AOOo.
>>>>>
>>>>> I think it is an essential tool to optimizing the web experience for
>>>>> our visitors.  It is part of a feedback loop where we look at the
>>>>> traffic stats, how our website is actually being used, the
>>>>> demographics of the visitors, etc., and then iteratively improve the
>>>>> website to make it more useful.
>>>>
>>>> So first question is: analytics yes or no, which affects also the
>>>> Privacy Policy.
>>>>
>>>>> On the question of Piwik (open source, used, for example by
>>>>> LibreOffice) versus Google Analytics,  I'm very familiar with Google,
>>>>> so I could help more there.  But I don't have an informed opinion on
>>>>> the virtues of each.  I've never heard of Piwik until today.
>>>>
>>>> The big difference is that with Piwik the data collected stays inhouse
>>>> at Apache, whereas with Google it goes to Google that does whatever you
>>>> don't know. This again implies that at Apache measures must be taken to
>>>> protect the privacy of collected data. The German "Landeszentrum für
>>>> Datenschutz Schleswig-Holstein" (center of data protection) has a few
>>>> documents about tracking [1], unfortunately only in German, why Google
>>>> Analytics doesn't comply with the German data protection law [2] and how
>>>> Piwik can be configured to be used in compliance with the law [3].
>>>>
>>>
>>> Does this law matter if the servers are hosted in the US, not in
>>> Germany?  (I'm assuming that the Apache servers are in the US).
>>
>> No, but it is not a secret that the protection of private data is, hm, not the
>> best in the US compared with other countries. So, why stick with this?
>>
>
> Remember, even if we used Piwik, the data would be in the US.  All
> user accounts for Apache, all wiki accounts, all mailing lists
> subscription data, etc., is in the US.  We have a jurisdiction.

So it's in our hands to protect it, and we don't have to trust others 
(companies).

> As you know trying to comply with the laws of every country is nearly
> impossible.  If we try to do that, then we'll immediately run into

That's not the point. Of course we cannot follow every law, just as you 
cannot satisfy everybody's favorite feature. But we could follow a law 
that offers strong protection.

> problems, like the status of Taiwan (Chinese Formosa), which has come
> up previously:
>
> http://openoffice.org/projects/www/lists/discuss/archive/2003-06/message/38

That could easily be solved by using the term "Chinese (Taiwan)" or, 
better, "Taiwanese". But it doesn't matter here.

>>> Storing the data ourselves is a double-edged sword.  If we store it,
>>> then we are responsible for any problems with that data.
>>
>> I don't think that would be more difficult than what Apache is storing
>> anyway (mail addresses, user names, passwords). I don't think that we would
>> be interested in IP addresses, postal addresses, etc.
>>
>
> Any web analytics package is going to track IP address and store a
> cookie.  That is how it knows what country you are from and whether
> you are a new or a returning user.

But there is a difference between analyzing the IP address (e.g., via a 
GeoIP library) and storing only the country, and storing the whole IP 
address. ;-)
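
To make the distinction concrete, here is a minimal Python sketch of one way to avoid keeping the whole address: truncate it before anything is stored. The 24-bit and 48-bit cut-offs and the function name anonymize_ip are assumptions for illustration, not something proposed in this thread, and the GeoIP lookup itself is left out.

```python
import ipaddress


def anonymize_ip(address: str) -> str:
    """Zero the host part of an address so it no longer points at one visitor."""
    ip = ipaddress.ip_address(address)
    # Assumed cut-offs: keep the first 24 bits of IPv4, the first 48 bits of IPv6.
    prefix = 24 if ip.version == 4 else 48
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    # Addresses from the documentation ranges, not real visitors.
    print(anonymize_ip("192.0.2.123"))
    print(anonymize_ip("2001:db8:1234:5678::1"))
```

A country lookup could still be run on the full address in memory before truncation, so that only the country and the shortened address ever reach disk.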

> I agree that it is not much more difficult.  If we use Google, then we
> need to secure and control access to the login for Google Analytics.
> If we use Piwik then we need to control access there.  And if we just
> use web logs and run reports on those, then we need to control access
> to the raw http logs.

Yes, so we should discuss this in more detail and really decide by 
consensus.

> For any of these options, we'll have some information that we need to
> keep secure.   The PPMC has the ability to do this, via a private area
> in SVN.
>
>> The main part would be to know the user's browser data (OS, language,
>> browser app and version). For me no special data that should get special
>> treated.
>>
>>> Google states what they can do with the data, but it is rather broad,
>>> as you know.
>>
>> When you are really concerned about protection of private data, then you
>> wouldn't use Google Analytics. ;-)
>>
>
> Or you would disable cookies and Javascript from your browser, right?

Maybe. But I doubt that it would give you real protection against 
analytics methods.

> Actually, that is a great goal for this project:  We should try to
> make sure that our website, downloads, etc., all work, even if
> Javascript and cookies are disabled.  This is a good thing for
> accessibility as well.

That's what we've already done on the old project.

>>>> [1] https://www.datenschutzzentrum.de/tracking/
>>>> [2]
>>>> https://www.datenschutzzentrum.de/tracking/20090123_GA_stellungnahme.pdf
>>>> [3] https://www.datenschutzzentrum.de/tracking/piwik/
>>>>
>>>>   Eike
>>
>> Marcus

Marcus

Re: [WWW] Web analytics

Posted by Rob Weir <ap...@robweir.com>.
On Fri, Aug 12, 2011 at 5:48 PM, Marcus (OOo) <ma...@wtnet.de> wrote:
> On 08/12/2011 10:42 PM, Rob Weir wrote:
>>
>> On Fri, Aug 12, 2011 at 4:14 PM, Eike Rathke<oo...@erack.de>  wrote:
>>>
>>> Hi Rob,
>>>
>>> On Friday, 2011-08-12 13:29:00 -0400, Rob Weir wrote:
>>>
>>>>> Before taking that step, it's worth asking if the project actually
>>>>> has a need for web analytics yet. They were included on OO.o site
>>>>> mainly because Sun was using the data as part of its business
>>>>> metrics. It's not obvious that the same need exists in AOOo.
>>>>
>>>> I think it is an essential tool to optimizing the web experience for
>>>> our visitors.  It is part of a feedback loop where we look at the
>>>> traffic stats, how our website is actually being used, the
>>>> demographics of the visitors, etc., and then iteratively improve the
>>>> website to make it more useful.
>>>
>>> So first question is: analytics yes or no, which affects also the
>>> Privacy Policy.
>>>
>>>> On the question of Piwik (open source, used, for example by
>>>> LibreOffice) versus Google Analytics,  I'm very familiar with Google,
>>>> so I could help more there.  But I don't have an informed opinion on
>>>> the virtues of each.  I've never heard of Piwik until today.
>>>
>>> The big difference is that with Piwik the data collected stays inhouse
>>> at Apache, whereas with Google it goes to Google that does whatever you
>>> don't know. This again implies that at Apache measures must be taken to
>>> protect the privacy of collected data. The German "Landeszentrum für
>>> Datenschutz Schleswig-Holstein" (center of data protection) has a few
>>> documents about tracking [1], unfortunately only in German, why Google
>>> Analytics doesn't comply with the German data protection law [2] and how
>>> Piwik can be configured to be used in compliance with the law [3].
>>>
>>
>> Does this law matter if the servers are hosted in the US, not in
>> Germany?  (I'm assuming that the Apache servers are in the US).
>
> No, but it is not a secret that the protection of private data is, hm, not the
> best in the US compared with other countries. So, why stick with this?
>

Remember, even if we used Piwik, the data would be in the US.  All
user accounts for Apache, all wiki accounts, all mailing lists
subscription data, etc., is in the US.  We have a jurisdiction.

As you know, trying to comply with the laws of every country is nearly
impossible.  If we try to do that, then we'll immediately run into
problems, like the status of Taiwan (Chinese Formosa), which has come
up previously:

http://openoffice.org/projects/www/lists/discuss/archive/2003-06/message/38

>> Storing the data ourselves is a double-edged sword.  If we store it,
>> then we are responsible for any problems with that data.
>
> I don't think that would be more difficult than what Apache is storing
> anyway (mail addresses, user names, passwords). I don't think that we would
> be interested in IP addresses, postal addresses, etc.
>

Any web analytics package is going to track IP address and store a
cookie.  That is how it knows what country you are from and whether
you are a new or a returning user.

I agree that it is not much more difficult.  If we use Google, then we
need to secure and control access to the login for Google Analytics.
If we use Piwik then we need to control access there.  And if we just
use web logs and run reports on those, then we need to control access
to the raw http logs.

For any of these options, we'll have some information that we need to
keep secure.   The PPMC has the ability to do this, via a private area
in SVN.

> The main part would be to know the user's browser data (OS, language,
> browser app and version). For me no special data that should get special
> treated.
>
>> Google states what they can do with the data, but it is rather broad,
>> as you know.
>
> When you are really concerned about protection of private data, then you
> wouldn't use Google Analytics. ;-)
>

Or you would disable cookies and Javascript from your browser, right?

Actually, that is a great goal for this project:  We should try to
make sure that our website, downloads, etc., all work, even if
Javascript and cookies are disabled.  This is a good thing for
accessibility as well.

>>> [1] https://www.datenschutzzentrum.de/tracking/
>>> [2]
>>> https://www.datenschutzzentrum.de/tracking/20090123_GA_stellungnahme.pdf
>>> [3] https://www.datenschutzzentrum.de/tracking/piwik/
>>>
>>>  Eike
>
> Marcus
>

Re: [WWW] Web analytics

Posted by Simon Phipps <si...@webmink.com>.
On 13 Aug 2011, at 01:05, drew wrote:

> On Fri, 2011-08-12 at 15:50 -0700, Kay Schenk wrote:
>> 
>> On 08/12/2011 02:48 PM, Marcus (OOo) wrote:
>> <<--snipped-->
>>> 
>>> The main part would be to know the user's browser data (OS, language,
>>> browser app and version). For me no special data that should get special
>>> treated.
>> 
>> If all we want is something like what is stated above -- there are many 
>> packages we could install/use. AwStats comes to mind. Data would be 
>> stored locally.
> 
> Right - that just needs the access logs.

Personally I'd consider the server logs to be a reasonable level of default data-gathering in the absence of concrete requirements. Are they gathered by default on sites here or do we have to ask for the feature to be enabled? 

S.


Re: [WWW] Web analytics

Posted by drew <dr...@baseanswers.com>.
On Fri, 2011-08-12 at 15:50 -0700, Kay Schenk wrote:
> 
> On 08/12/2011 02:48 PM, Marcus (OOo) wrote:
> <<--snipped-->
> >
> > The main part would be to know the user's browser data (OS, language,
> > browser app and version). For me no special data that should get special
> > treated.
> 
> If all we want is something like what is stated above -- there are many 
> packages we could install/use. AwStats comes to mind. Data would be 
> stored locally.

Right - that just needs the access logs.

OK - let me pour a little concrete onto the abstract discussion.

First - by now you know I'm late getting the email to the legal list for
TOU guidance, but I have started it now and it should go out tonight.

So to the 'ready-mix' - if I understand correctly, and I hope I'm not
speaking out of turn here, legally we currently have the right to
re-brand the extension/template site.

Since that site is hosted by a third party (OSU) all it would take is a
change to the footer (Header ?) of the Drupal template - shouldn't be
that big a thing. (especially when someone else is doing it, yes) I
doubt there is access to access logs from here, well I don't know that
for a fact, but I'd be surprised.

**To the mentors, I'm not suggesting the admin run and do that this
minute. :-)

Short answer for me, I would gather basic demographics on any such site.

- for the long answer, will catch up in-line with other emails on the
list.

Thanks,

//drew



Re: [WWW] Web analytics

Posted by Kay Schenk <ka...@gmail.com>.

On 08/12/2011 02:48 PM, Marcus (OOo) wrote:
<<--snipped-->
>
> The main part would be to know the user's browser data (OS, language,
> browser app and version). For me no special data that should get special
> treated.

If all we want is something like what is stated above -- there are many 
packages we could install/use. AwStats comes to mind. Data would be 
stored locally.
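
As a rough illustration of how much the access logs alone can provide, the following Python sketch counts browser user-agent strings from lines in Apache's "combined" log format. This is not how AWStats is implemented; the sample lines, the addresses and the function name count_user_agents are made up for the example.

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def count_user_agents(lines):
    """Count user-agent strings; the client address is parsed but never kept."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line.strip())
        if match:  # lines that do not fit the format are skipped
            counts[match.group("agent")] += 1
    return counts


# Invented sample lines using documentation-range addresses.
SAMPLE = [
    '192.0.2.1 - - [12/Aug/2011:17:30:00 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/5.0"',
    '192.0.2.2 - - [12/Aug/2011:17:31:00 +0000] "GET /download.html HTTP/1.1" '
    '200 2048 "http://example.org/" "Mozilla/5.0 (X11; Linux x86_64) Firefox/5.0"',
    '192.0.2.3 - - [12/Aug/2011:17:32:00 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Opera/9.80 (Windows NT 6.1)"',
]

if __name__ == "__main__":
    for agent, hits in count_user_agents(SAMPLE).most_common():
        print(hits, agent)
```

The same pattern extends to the request path or the referer field; only the aggregated counts need to be stored.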

>
>> Google states what they can do with the data, but it is rather broad,
>> as you know.
>
> When you are really concerned about protection of private data, then you
> wouldn't use Google Analytics. ;-)
>

<--snipped -->
>
> Marcus

-- 
------------------------------------------------------------------------
MzK

"Those who love deeply never grow old;
  they may die of old age, but they die young."
                         -- Sir Arthur Pinero

Re: [WWW] Web analytics

Posted by "Marcus (OOo)" <ma...@wtnet.de>.
On 08/12/2011 10:42 PM, Rob Weir wrote:
> On Fri, Aug 12, 2011 at 4:14 PM, Eike Rathke<oo...@erack.de>  wrote:
>> Hi Rob,
>>
>> On Friday, 2011-08-12 13:29:00 -0400, Rob Weir wrote:
>>
>>>> Before taking that step, it's worth asking if the project actually
>>>> has a need for web analytics yet. They were included on OO.o site
>>>> mainly because Sun was using the data as part of its business
>>>> metrics. It's not obvious that the same need exists in AOOo.
>>>
>>> I think it is an essential tool to optimizing the web experience for
>>> our visitors.  It is part of a feedback loop where we look at the
>>> traffic stats, how our website is actually being used, the
>>> demographics of the visitors, etc., and then iteratively improve the
>>> website to make it more useful.
>>
>> So first question is: analytics yes or no, which affects also the
>> Privacy Policy.
>>
>>> On the question of Piwik (open source, used, for example by
>>> LibreOffice) versus Google Analytics,  I'm very familiar with Google,
>>> so I could help more there.  But I don't have an informed opinion on
>>> the virtues of each.  I've never heard of Piwik until today.
>>
>> The big difference is that with Piwik the data collected stays inhouse
>> at Apache, whereas with Google it goes to Google that does whatever you
>> don't know. This again implies that at Apache measures must be taken to
>> protect the privacy of collected data. The German "Landeszentrum für
>> Datenschutz Schleswig-Holstein" (center of data protection) has a few
>> documents about tracking [1], unfortunately only in German, why Google
>> Analytics doesn't comply with the German data protection law [2] and how
>> Piwik can be configured to be used in compliance with the law [3].
>>
>
> Does this law matter if the servers are hosted in the US, not in
> Germany?  (I'm assuming that the Apache servers are in the US).

No, but it is not a secret that the protection of private data is, hm, 
not the best in the US compared with other countries. So, why stick 
with this?

> Storing the data ourselves is a double-edged sword.  If we store it,
> then we are responsible for any problems with that data.

I don't think that would be more difficult than what Apache is storing 
anyway (mail addresses, user names, passwords). I don't think that we 
would be interested in IP addresses, postal addresses, etc.

The main part would be to know the user's browser data (OS, language, 
browser app and version). To me, that is not special data that needs 
special treatment.

> Google states what they can do with the data, but it is rather broad,
> as you know.

When you are really concerned about protection of private data, then you 
wouldn't use Google Analytics. ;-)

>> [1] https://www.datenschutzzentrum.de/tracking/
>> [2] https://www.datenschutzzentrum.de/tracking/20090123_GA_stellungnahme.pdf
>> [3] https://www.datenschutzzentrum.de/tracking/piwik/
>>
>>   Eike

Marcus

Re: [WWW] Web analytics

Posted by Rob Weir <ap...@robweir.com>.
On Fri, Aug 12, 2011 at 4:14 PM, Eike Rathke <oo...@erack.de> wrote:
> Hi Rob,
>
> On Friday, 2011-08-12 13:29:00 -0400, Rob Weir wrote:
>
>> > Before taking that step, it's worth asking if the project actually
>> > has a need for web analytics yet. They were included on OO.o site
>> > mainly because Sun was using the data as part of its business
>> > metrics. It's not obvious that the same need exists in AOOo.
>>
>> I think it is an essential tool to optimizing the web experience for
>> our visitors.  It is part of a feedback loop where we look at the
>> traffic stats, how our website is actually being used, the
>> demographics of the visitors, etc., and then iteratively improve the
>> website to make it more useful.
>
> So first question is: analytics yes or no, which affects also the
> Privacy Policy.
>
>> On the question of Piwik (open source, used, for example by
>> LibreOffice) versus Google Analytics,  I'm very familiar with Google,
>> so I could help more there.  But I don't have an informed opinion on
>> the virtues of each.  I've never heard of Piwik until today.
>
> The big difference is that with Piwik the data collected stays inhouse
> at Apache, whereas with Google it goes to Google that does whatever you
> don't know. This again implies that at Apache measures must be taken to
> protect the privacy of collected data. The German "Landeszentrum für
> Datenschutz Schleswig-Holstein" (center of data protection) has a few
> documents about tracking [1], unfortunately only in German, why Google
> Analytics doesn't comply with the German data protection law [2] and how
> Piwik can be configured to be used in compliance with the law [3].
>

Does this law matter if the servers are hosted in the US, not in
Germany?  (I'm assuming that the Apache servers are in the US).

Storing the data ourselves is a double-edged sword.  If we store it,
then we are responsible for any problems with that data.

Google states what they can do with the data, but it is rather broad,
as you know.

> [1] https://www.datenschutzzentrum.de/tracking/
> [2] https://www.datenschutzzentrum.de/tracking/20090123_GA_stellungnahme.pdf
> [3] https://www.datenschutzzentrum.de/tracking/piwik/
>
>  Eike
>
> --
>  PGP/OpenPGP/GnuPG encrypted mail preferred in all private communication.
>  Key ID: 0x293C05FD - 997A 4C60 CE41 0149 0DB3  9E96 2F1A D073 293C 05FD
>

Re: [WWW] Web analytics

Posted by Rob Weir <ap...@robweir.com>.
On Fri, Aug 12, 2011 at 5:47 PM, Marcus (OOo) <ma...@wtnet.de> wrote:
> On 08/12/2011 11:01 PM, Rob Weir wrote:
>>
>> On Fri, Aug 12, 2011 at 4:30 PM, Simon Phipps<si...@webmink.com>  wrote:
>>>
>>> On 12 Aug 2011, at 21:14, Eike Rathke wrote:
>>>
>>>> So first question is: analytics yes or no, which affects also the
>>>> Privacy Policy.
>>>
>>> I suggest the right question is "which project members need which data
>>> and why". The answer today may well be "none", since we don't actually have
>>> any resources to visit yet. This is also likely to change over time, and
>>> we'll need to add analytics as and when people request (and justify)
>>> according to their specific needs and remove them when they're no longer
>>> justified.
>>>
>>> I suggest we resist the idea of capturing bulk analytics "just because",
>>> and instead devise a lightweight process for justifying and requesting
>>> collection of data. I'd guess there is already a process to copy somewhere
>>> in Apache - any mentors with suggestions where to look?
>>>
>>
>> I see that tracking code is used with the websites of most of the
>> groups you are affiliated with:  LibreOffice (Piwik), ForgeRock
>> (Google Analytics, including in the community pages) and OSI  (Google
>> Analytics).  And as was mentioned before, OpenOffice.org uses Google
>> Analytics currently.
>
> Really, your argumentation is somewhat strange. Sometimes it's "we are now
> at a new home, so we have to do it different" and sometimes "in the old home
> it was done this way, so let's keep it this way". How do you decide what to
> prefer?
>

I try to make the best recommendation in each case.  I don't think we
should avoid past practice just for the sake of it, but neither should
we be tied to the past.  We should make the best choice given the
opportunities available to us today.

> Marcus
>
>
>
>> Have you given them similar advice?  Or is there something special
>> about OpenOffice at Apache that suggests that we should not be
>> optimizing our website based on visitor stats like others, including
>> LibreOffice, are?
>>
>>
>> -Rob
>>
>>> S.
>

Re: [WWW] Web analytics

Posted by "Marcus (OOo)" <ma...@wtnet.de>.
On 08/12/2011 11:01 PM, Rob Weir wrote:
> On Fri, Aug 12, 2011 at 4:30 PM, Simon Phipps<si...@webmink.com>  wrote:
>>
>> On 12 Aug 2011, at 21:14, Eike Rathke wrote:
>>
>>> So first question is: analytics yes or no, which affects also the
>>> Privacy Policy.
>>
>> I suggest the right question is "which project members need which data and why". The answer today may well be "none", since we don't actually have any resources to visit yet. This is also likely to change over time, and we'll need to add analytics as and when people request (and justify) according to their specific needs and remove them when they're no longer justified.
>>
>> I suggest we resist the idea of capturing bulk analytics "just because", and instead devise a lightweight process for justifying and requesting collection of data. I'd guess there is already a process to copy somewhere in Apache - any mentors with suggestions where to look?
>>
>
> I see that tracking code is used with the websites of most of the
> groups you are affiliated with:  LibreOffice (Piwik), ForgeRock
> (Google Analytics, including in the community pages) and OSI  (Google
> Analytics).  And as was mentioned before, OpenOffice.org uses Google
> Analytics currently.

Really, your argumentation is somewhat strange. Sometimes it's "we are 
now at a new home, so we have to do it differently" and sometimes "in 
the old home it was done this way, so let's keep it this way". How do 
you decide what to prefer?

Marcus



> Have you given them similar advice?  Or is there something special
> about OpenOffice at Apache that suggests that we should not be
> optimizing our website based on visitor stats like others, including
> LibreOffice, are?
>
>
> -Rob
>
>> S.

Re: [WWW] Web analytics

Posted by "Marcus (OOo)" <ma...@wtnet.de>.
+1

I also think we need some data to serve the users' needs better. And 
we should collect as little data as possible, and only at the specific 
time when we really need it.

Marcus



On 08/12/2011 11:35 PM, Dennis E. Hamilton wrote:
> +1
>
> on no data collected before we have a clear need for it and are prepared to deal with it responsibly
>
>   - Dennis
>
> -----Original Message-----
> From: Simon Phipps [mailto:simon@webmink.com]
> Sent: Friday, August 12, 2011 14:15
> To: ooo-dev@incubator.apache.org
> Subject: Re: [WWW] Web analytics
>
>
> On 12 Aug 2011, at 22:01, Rob Weir wrote:
>
>> On Fri, Aug 12, 2011 at 4:30 PM, Simon Phipps<si...@webmink.com>  wrote:
>>>
>>> I suggest the right question is "which project members need which data and why". The answer today may well be "none", since we don't actually have any resources to visit yet. This is also likely to change over time, and we'll need to add analytics as and when people request (and justify) according to their specific needs and remove them when they're no longer justified.
>>>
>>> I suggest we resist the idea of capturing bulk analytics "just because", and instead devise a lightweight process for justifying and requesting collection of data. I'd guess there is already a process to copy somewhere in Apache - any mentors with suggestions where to look?
>>>
>>
>> I see that tracking code is used with the websites of most of the
>> groups you are affiliated with:  LibreOffice (Piwik), ForgeRock
>> (Google Analytics, including in the community pages) and OSI  (Google
>> Analytics).  And as was mentioned before, OpenOffice.org uses Google
>> Analytics currently.
>
> It is indeed endemic. We have a unique opportunity to address the issue thoughtfully. And by the way I am delighted you're paying such close attention to my career.
>
>>
>> Have you given them similar advice?
>
> Where possible, yes. Abuse of personal data is something which concerns me greatly.
>
>> Or is there something special
>> about OpenOffice at Apache
>
> Yes. At the moment as far as I am aware AOOo has no significant resources of interest to non-project-members and no groups of members with active applications for the data from analytics. Both situations will certainly change, but on best YAGNI principles I suggest doing what's needed when it's needed, on the basis of actual documented requirements.
>
>> that suggests that we should not be
>> optimizing our website based on visitor stats like others, including
>> LibreOffice, are?
>
> When we have an end-user website capable of optimisation along with project members stepping forward to harvest the data, process it and act upon it, that will be a fine thing to do. What's needed, when it's needed.
>
> S.

Re: [WWW] Web analytics

Posted by Rob Weir <ap...@robweir.com>.
On Fri, Aug 12, 2011 at 5:35 PM, Dennis E. Hamilton
<de...@acm.org> wrote:
> +1
>
> on no data collected before we have a clear need for it and are prepared to deal with it responsibly
>

I agree, but I think we will want to enable analytics as soon as the
new wiki and website are live.  So we should get this into the privacy
policy from the start.

If you know anything about web analytics, you know that this is not
something where you ask a question today, turn it on, and have the
answer an hour later.  It requires that you collect the data over a
sustained period of time.  If you want to make informed decisions, and
have a statistically sound basis for using the data, you need to have
collected it in advance.  You are typically asking what the effect of
a change has been on access patterns.  So you need baseline data, as
well as post-change data.  And often you want retrospective data to
inform a decision today.
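
A minimal Python sketch of the baseline comparison described above, assuming daily visit counts are already available as a mapping from date to count. The numbers, the change date and the function name relative_change are invented for illustration; a real analysis would also have to account for weekly cycles and sample size.

```python
from datetime import date
from statistics import mean


def relative_change(daily_visits, change_day):
    """Compare mean daily visits after a site change with the baseline before it."""
    before = [n for day, n in daily_visits.items() if day < change_day]
    after = [n for day, n in daily_visits.items() if day >= change_day]
    if not before or not after:
        # Without data collected in advance there is nothing to compare against.
        raise ValueError("need data on both sides of the change")
    baseline = mean(before)
    return (mean(after) - baseline) / baseline


# Invented figures: three baseline days, then two days after a change.
VISITS = {
    date(2011, 8, 8): 100,
    date(2011, 8, 9): 110,
    date(2011, 8, 10): 90,
    date(2011, 8, 11): 120,
    date(2011, 8, 12): 130,
}

if __name__ == "__main__":
    change = relative_change(VISITS, date(2011, 8, 11))
    print(f"{change:+.0%}")
```

The ValueError branch is the point of the paragraph above: if collection only starts on the day of the change, there is no baseline to compare against.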

Remember, we're not collecting any personally identifying information
here.  It is aggregate information about what countries visitors are
coming from, at what times of the day, what pages are most frequently
visited, how long they are lingering on various pages, what websites
are referring the most visitors, what browsers most of them are using,
etc.

>  - Dennis
>
> -----Original Message-----
> From: Simon Phipps [mailto:simon@webmink.com]
> Sent: Friday, August 12, 2011 14:15
> To: ooo-dev@incubator.apache.org
> Subject: Re: [WWW] Web analytics
>
>
> On 12 Aug 2011, at 22:01, Rob Weir wrote:
>
>> On Fri, Aug 12, 2011 at 4:30 PM, Simon Phipps <si...@webmink.com> wrote:
>>>
>>> I suggest the right question is "which project members need which data and why". The answer today may well be "none", since we don't actually have any resources to visit yet. This is also likely to change over time, and we'll need to add analytics as and when people request (and justify) according to their specific needs and remove them when they're no longer justified.
>>>
>>> I suggest we resist the idea of capturing bulk analytics "just because", and instead devise a lightweight process for justifying and requesting collection of data. I'd guess there is already a process to copy somewhere in Apache - any mentors with suggestions where to look?
>>>
>>
>> I see that tracking code is used with the websites of most of the
>> groups you are affiliated with:  LibreOffice (Piwik), ForgeRock
>> (Google Analytics, including in the community pages) and OSI  (Google
>> Analytics).  And as was mentioned before, OpenOffice.org uses Google
>> Analytics currently.
>
> It is indeed endemic. We have a unique opportunity to address the issue thoughtfully. And by the way I am delighted you're paying such close attention to my career.
>
>>
>> Have you given them similar advice?
>
> Where possible, yes. Abuse of personal data is something which concerns me greatly.
>
>> Or is there something special
>> about OpenOffice at Apache
>
> Yes. At the moment as far as I am aware AOOo has no significant resources of interest to non-project-members and no groups of members with active applications for the data from analytics. Both situations will certainly change, but on best YAGNI principles I suggest doing what's needed when it's needed, on the basis of actual documented requirements.
>
>> that suggests that we should not be
>> optimizing our website based on visitor stats like others, including
>> LibreOffice, are?
>
> When we have an end-user website capable of optimisation along with project members stepping forward to harvest the data, process it and act upon it, that will be a fine thing to do. What's needed, when it's needed.
>
> S.
>
>
>
>

RE: [WWW] Web analytics

Posted by "Dennis E. Hamilton" <de...@acm.org>.
+1 

on no data collected before we have a clear need for it and are prepared to deal with it responsibly

 - Dennis

-----Original Message-----
From: Simon Phipps [mailto:simon@webmink.com] 
Sent: Friday, August 12, 2011 14:15
To: ooo-dev@incubator.apache.org
Subject: Re: [WWW] Web analytics


On 12 Aug 2011, at 22:01, Rob Weir wrote:

> On Fri, Aug 12, 2011 at 4:30 PM, Simon Phipps <si...@webmink.com> wrote:
>> 
>> I suggest the right question is "which project members need which data and why". The answer today may well be "none", since we don't actually have any resources to visit yet. This is also likely to change over time, and we'll need to add analytics as and when people request (and justify) according to their specific needs and remove them when they're no longer justified.
>> 
>> I suggest we resist the idea of capturing bulk analytics "just because", and instead devise a lightweight process for justifying and requesting collection of data. I'd guess there is already a process to copy somewhere in Apache - any mentors with suggestions where to look?
>> 
> 
> I see that tracking code is used with the websites of most of the
> groups you are affiliated with:  LibreOffice (Piwik), ForgeRock
> (Google Analytics, including in the community pages) and OSI  (Google
> Analytics).  And as was mentioned before, OpenOffice.org uses Google
> Analytics currently.

It is indeed endemic. We have a unique opportunity to address the issue thoughtfully. And by the way I am delighted you're paying such close attention to my career.

> 
> Have you given them similar advice?  

Where possible, yes. Abuse of personal data is something which concerns me greatly.

> Or is there something special
> about OpenOffice at Apache

Yes. At the moment as far as I am aware AOOo has no significant resources of interest to non-project-members and no groups of members with active applications for the data from analytics. Both situations will certainly change, but on best YAGNI principles I suggest doing what's needed when it's needed, on the basis of actual documented requirements. 

> that suggests that we should not be
> optimizing our website based on visitor stats like others, including
> LibreOffice, are?

When we have an end-user website capable of optimisation along with project members stepping forward to harvest the data, process it and act upon it, that will be a fine thing to do. What's needed, when it's needed.

S.




Re: [WWW] Web analytics

Posted by Rob Weir <ap...@robweir.com>.
On Fri, Aug 12, 2011 at 5:57 PM, TJ Frazier <tj...@cfl.rr.com> wrote:
> On 8/12/2011 17:15, Simon Phipps wrote:
>>
>> On 12 Aug 2011, at 22:01, Rob Weir wrote:
>>
>>> On Fri, Aug 12, 2011 at 4:30 PM, Simon Phipps<si...@webmink.com>  wrote:
>>>>
>>>> I suggest the right question is "which project members need which data
>>>> and why". The answer today may well be "none", since we don't actually have
>>>> any resources to visit yet. This is also likely to change over time, and
>>>> we'll need to add analytics as and when people request (and justify)
>>>> according to their specific needs and remove them when they're no longer
>>>> justified.
>>>>
>>>> I suggest we resist the idea of capturing bulk analytics "just because",
>>>> and instead devise a lightweight process for justifying and requesting
>>>> collection of data. I'd guess there is already a process to copy somewhere
>>>> in Apache - any mentors with suggestions where to look?
>>>>
>>>
>>> I see that tracking code is used with the websites of most of the
>>> groups you are affiliated with:  LibreOffice (Piwik), ForgeRock
>>> (Google Analytics, including in the community pages) and OSI  (Google
>>> Analytics).  And as was mentioned before, OpenOffice.org uses Google
>>> Analytics currently.
>>
>> It is indeed endemic. We have a unique opportunity to address the issue
>> thoughtfully. And by the way I am delighted you're paying such close
>> attention to my career.
>>
>>>
>>> Have you given them similar advice?
>>
>> Where possible, yes. Abuse of personal data is something which concerns me
>> greatly.
>>
>>> Or is there something special
>>> about OpenOffice at Apache
>>
>> Yes. At the moment as far as I am aware AOOo has no significant resources
>> of interest to non-project-members and no groups of members with active
>> applications for the data from analytics. Both situations will certainly
>> change, but on best YAGNI principles I suggest doing what's needed when it's
>> needed, on the basis of actual documented requirements.
>>
>>> that suggests that we should not be
>>> optimizing our website based on visitor stats like others, including
>>> LibreOffice, are?
>>
>> When we have an end-user website capable of optimisation along with
>> project members stepping forward to harvest the data, process it and act
>> upon it, that will be a fine thing to do. What's needed, when it's needed.
>>
>> S.
>>
> Two points, not definitive but worth considering:
>
> 1) Technically, is it easier to build in analytics now, or would it be just
> as easy to add them later?
>

Slight advantage to adding it now:

A) We're already updating page footers on the website as part of the
re-branding effort.  This is where the tracking code typically goes.

B) The plan is to put together new terms of use and privacy policy
pages, and send them, along with the new headers/footers, to review by
Apache Legal Affairs and Apache Branding.

But adding them later is not technically difficult.  What you do lose
is the value of the data you had not collected.  You can never get
that back.

> 2) Might we want to do something dramatic (<sarcasm> say, actually make a
> release... </sarcasm>), and measure the effect on the site? Suddenly, months
> of un-analyzed data become a valuable baseline.
>

I think baselines are very valuable.  Another way they work is by
Google storing a cookie on your machine.  So you can tell who is a
new, first-time visitor versus one who has been here 20 times before.
So when we do a new release, we can look at the different behaviors
for these two different groups.  Remember, the website is trying to
service many different kinds of users: first time, repeat visitors,
power users, project members, etc.  Each of them has different
patterns of use.
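
A minimal sketch of that kind of comparison, in Python, assuming the analytics tool can export one record per visit with a cookie-derived visit count. The record layout, the sample values and the `compare_segments` helper are all hypothetical and are not taken from Google Analytics or any other product:

```python
from statistics import mean

# Hypothetical visit records exported from an analytics tool:
# (visit count stored in the visitor's cookie, pages viewed, downloaded?).
visits = [
    (1, 4, True),
    (1, 2, False),
    (20, 1, True),
    (5, 1, False),
]


def compare_segments(visits):
    """Summarize behavior separately for first-time and repeat visitors."""
    segments = {"new": [], "returning": []}
    for visit_count, pages, downloaded in visits:
        key = "new" if visit_count == 1 else "returning"
        segments[key].append((pages, downloaded))

    summary = {}
    for key, rows in segments.items():
        if not rows:
            continue  # no visits in this segment, nothing to report
        summary[key] = {
            "visits": len(rows),
            "avg_pages": mean(pages for pages, _ in rows),
            "download_rate": sum(1 for _, d in rows if d) / len(rows),
        }
    return summary


summary = compare_segments(visits)
print(summary)
```

Running the same summary before and after a release would show whether the two groups reacted differently, which is the baseline idea discussed above.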

It is part of a data-driven approach.  Good web design and the
principles of User Centered Design will get you far.  You can make a
good website knowing nothing more than basic principles and an image
in your mind of how the users will interact with it.  But to make a
great website you need more than that.  You need to know how users
actually behave.

> $0.02
> --
> /tj/
>
>

Re: [WWW] Web analytics

Posted by TJ Frazier <tj...@cfl.rr.com>.
On 8/12/2011 17:15, Simon Phipps wrote:
>
> On 12 Aug 2011, at 22:01, Rob Weir wrote:
>
>> On Fri, Aug 12, 2011 at 4:30 PM, Simon Phipps<si...@webmink.com>  wrote:
>>>
>>> I suggest the right question is "which project members need which data and why". The answer today may well be "none", since we don't actually have any resources to visit yet. This is also likely to change over time, and we'll need to add analytics as and when people request (and justify) according to their specific needs and remove them when they're no longer justified.
>>>
>>> I suggest we resist the idea of capturing bulk analytics "just because", and instead devise a lightweight process for justifying and requesting collection of data. I'd guess there is already a process to copy somewhere in Apache - any mentors with suggestions where to look?
>>>
>>
>> I see that tracking code is used with the websites of most of the
>> groups you are affiliated with:  LibreOffice (Piwik), ForgeRock
>> (Google Analytics, including in the community pages) and OSI  (Google
>> Analytics).  And as was mentioned before, OpenOffice.org uses Google
>> Analytics currently.
>
> It is indeed endemic. We have a unique opportunity to address the issue thoughtfully. And by the way I am delighted you're paying such close attention to my career.
>
>>
>> Have you given them similar advice?
>
> Where possible, yes. Abuse of personal data is something which concerns me greatly.
>
>> Or is there something special
>> about OpenOffice at Apache
>
> Yes. At the moment as far as I am aware AOOo has no significant resources of interest to non-project-members and no groups of members with active applications for the data from analytics. Both situations will certainly change, but on best YAGNI principles I suggest doing what's needed when it's needed, on the basis of actual documented requirements.
>
>> that suggests that we should not be
>> optimizing our website based on visitor stats like others, including
>> LibreOffice, are?
>
> When we have an end-user website capable of optimisation along with project members stepping forward to harvest the data, process it and act upon it, that will be a fine thing to do. What's needed, when it's needed.
>
> S.
>
Two points, not definitive but worth considering:

1) Technically, is it easier to build in analytics now, or would it be 
just as easy to add them later?

2) Might we want to do something dramatic (<sarcasm> say, actually make 
a release... </sarcasm>), and measure the effect on the site? Suddenly, 
months of un-analyzed data become a valuable baseline.

$0.02
-- 
/tj/


Re: [WWW] Web analytics

Posted by Simon Phipps <si...@webmink.com>.
On Fri, Aug 12, 2011 at 11:13 PM, Rob Weir <ap...@robweir.com> wrote:

<snip />

Frankly Rob most of what you said about me is none of your business as well
as based on false assumptions and partial information. Since I'm the only
person you are attempting to discredit, I assume you have another reason for
using this tactic. This is more suited to adversarial debate in a standards
group than collaboration in an open source project and I'd ask you to stop.

> Could you explain how this could possibly benefit the project?

What, asking for use cases and requirements before implementing? Can you
explain why you oppose that?

S.

Re: [WWW] Web analytics

Posted by Rob Weir <ap...@robweir.com>.
On Fri, Aug 12, 2011 at 5:15 PM, Simon Phipps <si...@webmink.com> wrote:
>
> On 12 Aug 2011, at 22:01, Rob Weir wrote:
>
>> On Fri, Aug 12, 2011 at 4:30 PM, Simon Phipps <si...@webmink.com> wrote:
>>>
>>> I suggest the right question is "which project members need which data and why". The answer today may well be "none", since we don't actually have any resources to visit yet. This is also likely to change over time, and we'll need to add analytics as and when people request (and justify) according to their specific needs and remove them when they're no longer justified.
>>>
>>> I suggest we resist the idea of capturing bulk analytics "just because", and instead devise a lightweight process for justifying and requesting collection of data. I'd guess there is already a process to copy somewhere in Apache - any mentors with suggestions where to look?
>>>
>>
>> I see that tracking code is used with the websites of most of the
>> groups you are affiliated with:  LibreOffice (Piwik), ForgeRock
>> (Google Analytics, including in the community pages) and OSI  (Google
>> Analytics).  And as was mentioned before, OpenOffice.org uses Google
>> Analytics currently.
>
> It is indeed endemic. We have a unique opportunity to address the issue thoughtfully. And by the way I am delighted you're paying such close attention to my career.
>
>>
>> Have you given them similar advice?
>
> Where possible, yes. Abuse of personal data is something which concerns me greatly.
>

It concerns you greatly?  Well, what about your personal blog then,
http://www.webmink.net/?  I see you have tracking code for Google
Analytics there as well:

<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "UA-271327-1";
urchinTracker();
</script>


Have you talked to your web admin about that?

I'm very skeptical.  You appear to be expressing dread about a
practice that you seem never to have objected to previously, not in
your personal blog nor with other websites that you are associated
with, even the most closely analogous website, LibreOffice.org.

If we take your advice, we will be at a distinct disadvantage in our
ability to optimize our website based on usage patterns.

Could you explain how this could possibly benefit the project?


>> Or is there something special
>> about OpenOffice at Apache
>
> Yes. At the moment as far as I am aware AOOo has no significant resources of interest to non-project-members and no groups of members with active applications for the data from analytics. Both situations will certainly change, but on best YAGNI principles I suggest doing what's needed when it's needed, on the basis of actual documented requirements.
>
>> that suggests that we should not be
>> optimizing our website based on visitor stats like others, including
>> LibreOffice, are?
>
> When we have an end-user website capable of optimisation along with project members stepping forward to harvest the data, process it and act upon it, that will be a fine thing to do. What's needed, when it's needed.
>
> S.
>
>
>
>

Re: [WWW] Web analytics

Posted by Simon Phipps <si...@webmink.com>.
On 12 Aug 2011, at 22:01, Rob Weir wrote:

> On Fri, Aug 12, 2011 at 4:30 PM, Simon Phipps <si...@webmink.com> wrote:
>> 
>> I suggest the right question is "which project members need which data and why". The answer today may well be "none", since we don't actually have any resources to visit yet. This is also likely to change over time, and we'll need to add analytics as and when people request (and justify) according to their specific needs and remove them when they're no longer justified.
>> 
>> I suggest we resist the idea of capturing bulk analytics "just because", and instead devise a lightweight process for justifying and requesting collection of data. I'd guess there is already a process to copy somewhere in Apache - any mentors with suggestions where to look?
>> 
> 
> I see that tracking code is used with the websites of most of the
> groups you are affiliated with:  LibreOffice (Piwik), ForgeRock
> (Google Analytics, including in the community pages) and OSI  (Google
> Analytics).  And as was mentioned before, OpenOffice.org uses Google
> Analytics currently.

It is indeed endemic. We have a unique opportunity to address the issue thoughtfully. And by the way I am delighted you're paying such close attention to my career.

> 
> Have you given them similar advice?  

Where possible, yes. Abuse of personal data is something which concerns me greatly.

> Or is there something special
> about OpenOffice at Apache

Yes. At the moment as far as I am aware AOOo has no significant resources of interest to non-project-members and no groups of members with active applications for the data from analytics. Both situations will certainly change, but on best YAGNI principles I suggest doing what's needed when it's needed, on the basis of actual documented requirements. 

> that suggests that we should not be
> optimizing our website based on visitor stats like others, including
> LibreOffice, are?

When we have an end-user website capable of optimisation along with project members stepping forward to harvest the data, process it and act upon it, that will be a fine thing to do. What's needed, when it's needed.

S.




Re: [WWW] Web analytics

Posted by Rob Weir <ap...@robweir.com>.
On Fri, Aug 12, 2011 at 4:30 PM, Simon Phipps <si...@webmink.com> wrote:
>
> On 12 Aug 2011, at 21:14, Eike Rathke wrote:
>
>> So first question is: analytics yes or no, which affects also the
>> Privacy Policy.
>
> I suggest the right question is "which project members need which data and why". The answer today may well be "none", since we don't actually have any resources to visit yet. This is also likely to change over time, and we'll need to add analytics as and when people request (and justify) according to their specific needs and remove them when they're no longer justified.
>
> I suggest we resist the idea of capturing bulk analytics "just because", and instead devise a lightweight process for justifying and requesting collection of data. I'd guess there is already a process to copy somewhere in Apache - any mentors with suggestions where to look?
>

I see that tracking code is used with the websites of most of the
groups you are affiliated with:  LibreOffice (Piwik), ForgeRock
(Google Analytics, including in the community pages) and OSI  (Google
Analytics).  And as was mentioned before, OpenOffice.org uses Google
Analytics currently.

Have you given them similar advice?  Or is there something special
about OpenOffice at Apache that suggests that we should not be
optimizing our website based on visitor stats like others, including
LibreOffice, are?


-Rob

> S.
>
>

Re: [WWW] Web analytics

Posted by Simon Phipps <si...@webmink.com>.
On 12 Aug 2011, at 21:14, Eike Rathke wrote:

> So first question is: analytics yes or no, which affects also the
> Privacy Policy.

I suggest the right question is "which project members need which data and why". The answer today may well be "none", since we don't actually have any resources to visit yet. This is also likely to change over time, and we'll need to add analytics as and when people request (and justify) according to their specific needs and remove them when they're no longer justified. 

I suggest we resist the idea of capturing bulk analytics "just because", and instead devise a lightweight process for justifying and requesting collection of data. I'd guess there is already a process to copy somewhere in Apache - any mentors with suggestions where to look?

S.


Re: [WWW] Web analytics

Posted by Eike Rathke <oo...@erack.de>.
Hi Rob,

On Friday, 2011-08-12 13:29:00 -0400, Rob Weir wrote:

> > Before taking that step, it's worth asking if the project actually
> > has a need for web analytics yet. They were included on OO.o site
> > mainly because Sun was using the data as part of its business
> > metrics. It's not obvious that the same need exists in AOOo.
> 
> I think it is an essential tool to optimizing the web experience for
> our visitors.  It is part of a feedback loop where we look at the
> traffic stats, how our website is actually being used, the
> demographics of the visitors, etc., and then iteratively improve the
> website to make it more useful.

So first question is: analytics yes or no, which affects also the
Privacy Policy.

> On the question of Piwik (open source, used, for example by
> LibreOffice) versus Google Analytics,  I'm very familiar with Google,
> so I could help more there.  But I don't have an informed opinion on
> the virtues of each.  I've never heard of Piwik until today.

The big difference is that with Piwik the data collected stays in-house
at Apache, whereas with Google it goes to Google, and you don't know
what Google does with it. This again implies that at Apache measures must be taken to
protect the privacy of collected data. The German "Landeszentrum für
Datenschutz Schleswig-Holstein" (center of data protection) has a few
documents about tracking [1], unfortunately only in German, why Google
Analytics doesn't comply with the German data protection law [2] and how
Piwik can be configured to be used in compliance with the law [3].

[1] https://www.datenschutzzentrum.de/tracking/
[2] https://www.datenschutzzentrum.de/tracking/20090123_GA_stellungnahme.pdf
[3] https://www.datenschutzzentrum.de/tracking/piwik/

  Eike

-- 
 PGP/OpenPGP/GnuPG encrypted mail preferred in all private communication.
 Key ID: 0x293C05FD - 997A 4C60 CE41 0149 0DB3  9E96 2F1A D073 293C 05FD

Re: [WWW] Web analytics

Posted by Rob Weir <ap...@robweir.com>.
On Fri, Aug 12, 2011 at 12:35 PM, Simon Phipps <si...@webmink.com> wrote:
>
> On 12 Aug 2011, at 16:30, drew wrote:
>
>> Hi,
>>
>> Well, I thought I'd start this list - unless folks think it is too soon
>> of course.
>>
>> Currently the OO.o web sites, all of them, utilize a third party
>> analytics firm (not Google). This is done by requiring that each site
>> inject a small bit of JS in the page footer. (this was another
>> requirement for the forums as an example and is used also at the
>> extension/template repository)
>>
>> Going forward does the PPMC desire to use a third party analytic firm -
>> i.e. Google.
>>
>> or
>>
>> Do we prefer to do it all in-house. (just guessing that their may be
>> some knowledge on this subject in the overall Apache project :)
>
>
> Before taking that step, it's worth asking if the project actually has a need for web analytics yet. They were included on OO.o site mainly because Sun was using the data as part of its business metrics. It's not obvious that the same need exists in AOOo.
>

I think it is an essential tool for optimizing the web experience for
our visitors.  It is part of a feedback loop where we look at the
traffic stats, how our website is actually being used, the
demographics of the visitors, etc., and then iteratively improve the
website to make it more useful.

For example, if we noticed that a particular page was getting a large
number of visits from Italy, but was not translated into Italian, we
might prioritize getting that page translated.
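
A rough sketch of how such a check could look in Python, assuming page views can be exported with a visitor country and that we keep a list of which pages exist in which languages. All paths, data and the `translation_candidates` helper below are made up for illustration:

```python
from collections import Counter

# Hypothetical page-view records: (page path, visitor country code).
views = [
    ("/download.html", "IT"),
    ("/download.html", "IT"),
    ("/download.html", "US"),
    ("/why.html", "IT"),
    ("/why.html", "DE"),
]

# Hypothetical map of the languages each page is already available in.
translations = {
    "/download.html": {"en", "de"},
    "/why.html": {"en", "it"},
}

# Assumed mapping from a country to its main language.
country_language = {"IT": "it", "DE": "de", "US": "en"}


def translation_candidates(views, translations, country_language):
    """Count views per (page, language) where the page lacks that language."""
    missing = Counter()
    for page, country in views:
        language = country_language.get(country)
        if language and language not in translations.get(page, set()):
            missing[(page, language)] += 1
    # Most-viewed untranslated combinations come first.
    return missing.most_common()


candidates = translation_candidates(views, translations, country_language)
print(candidates)
```

With the sample data the Italian gap on the download page comes out on top, which is the sort of signal that could feed a translation priority list.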

Or, if we found that a given page was a top destination for Google
queries for "OpenOffice UNO" but we considered that another page had
more useful information, then we might optimize the keywords in the
other page to make it rate more highly in future searches.

Things like that.  We may not be a business at Apache, but we have
similar goals to increase visitor satisfaction and getting the most
value out of the website.   I think analytics is a no-brainer.

On the question of Piwik (open source, used, for example, by
LibreOffice) versus Google Analytics,  I'm very familiar with Google,
so I could help more there.  But I don't have an informed opinion on
the virtues of each.  I had never heard of Piwik until today.

> S.
>
>

Re: [WWW] Web analytics

Posted by drew <dr...@baseanswers.com>.
On Fri, 2011-08-12 at 17:35 +0100, Simon Phipps wrote:
> On 12 Aug 2011, at 16:30, drew wrote:
> 
> > Hi,
> > 
> > Well, I thought I'd start this list - unless folks think it is too soon
> > of course.
> > 
> > Currently the OO.o web sites, all of them, utilize a third party
> > analytics firm (not Google). This is done by requiring that each site
> > inject a small bit of JS in the page footer. (this was another
> > requirement for the forums as an example and is used also at the
> > extension/template repository)
> > 
> > Going forward does the PPMC desire to use a third party analytic firm -
> > i.e. Google.
> > 
> > or
> > 
> > Do we prefer to do it all in-house. (just guessing that their may be
> > some knowledge on this subject in the overall Apache project :)
> 
> 
> Before taking that step, it's worth asking if the project actually has a need for web analytics yet. They were included on OO.o site mainly because Sun was using the data as part of its business metrics. It's not obvious that the same need exists in AOOo.
> 

Howdy Simon,

I don't have a strong opinion one way or the other with regard to the
tool set - but I do think analytics are as important for non-commercial
as for commercial endeavors.

Anecdotally - personal experience on the forum (from a while back), and
using the logs not the Sun/Oracle gathered analytics.

We would track where people came from, how many pages they viewed,
where they went and, finally, whether they opened a new topic (asked a
question).

For a support site the best scenario is that upon arrival they find an
answer within a short number of page views and without having to ask a
question.
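
As a minimal sketch of that log-based measurement in Python, assuming the server logs have already been parsed into per-session entries. The entry layout, the URL pattern for opening a topic and both helper functions are assumptions for illustration, not the forum's actual tooling:

```python
from collections import defaultdict

# Hypothetical, already-parsed log entries in time order:
# (session id, referrer, page path).
entries = [
    ("s1", "search", "/viewtopic?t=10"),
    ("s2", "search", "/index"),
    ("s2", "", "/viewforum?f=3"),
    ("s2", "", "/posting?mode=newtopic"),
    ("s3", "direct", "/viewtopic?t=42"),
    ("s3", "", "/viewtopic?t=43"),
]

NEW_TOPIC_MARKER = "mode=newtopic"  # assumed URL pattern for asking a question


def summarize_sessions(entries):
    """Per session: entry referrer, page views, and whether a topic was opened."""
    sessions = defaultdict(lambda: {"referrer": None, "pages": 0, "asked": False})
    for session_id, referrer, page in entries:
        session = sessions[session_id]
        if session["referrer"] is None:
            session["referrer"] = referrer  # first hit shows where they came from
        session["pages"] += 1
        if NEW_TOPIC_MARKER in page:
            session["asked"] = True
    return dict(sessions)


def self_service_rate(sessions, max_pages=3):
    """Share of sessions that stayed within max_pages views and asked nothing."""
    if not sessions:
        return 0.0
    good = sum(
        1 for s in sessions.values() if not s["asked"] and s["pages"] <= max_pages
    )
    return good / len(sessions)


sessions = summarize_sessions(entries)
rate = self_service_rate(sessions)
print(sessions["s2"], round(rate, 2))
```

The `max_pages` threshold is an arbitrary stand-in for "a short number of page views"; the useful part is watching how the rate moves after a change to the site.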

At one point we were getting way fewer inbound visits from search engines
than expected - yup, yours truly had screwed up a setting and the bots
weren't getting to all the pages. Fixed it, and the site's listings on
the SEs went up the page.

The more detailed information has led to workflow changes at the site,
and we then tracked whether those changes affected the outcomes in a
positive way.

Do we care if more users of the site drive a Ford or a Chevy? Not at
all, but then again the overall project might care if more users are
coming in using Safari on an iPad vs FF on a netbook or IE on a
desktop... maybe.

Best wishes,

//drew



Re: [WWW] Web analytics

Posted by Dave Fisher <da...@comcast.net>.
On Aug 12, 2011, at 9:35 AM, Simon Phipps wrote:

> 
> On 12 Aug 2011, at 16:30, drew wrote:
> 
>> Hi,
>> 
>> Well, I thought I'd start this list - unless folks think it is too soon
>> of course.
>> 
>> Currently the OO.o web sites, all of them, utilize a third party
>> analytics firm (not Google). This is done by requiring that each site
>> inject a small bit of JS in the page footer. (this was another
>> requirement for the forums as an example and is used also at the
>> extension/template repository)
>> 
>> Going forward does the PPMC desire to use a third party analytic firm -
>> i.e. Google.
>> 
>> or
>> 
>> Do we prefer to do it all in-house. (just guessing that their may be
>> some knowledge on this subject in the overall Apache project :)
> 
> 
> Before taking that step, it's worth asking if the project actually has a need for web analytics yet. They were included on OO.o site mainly because Sun was using the data as part of its business metrics. It's not obvious that the same need exists in AOOo.

It is important to consider now because it does affect the Privacy Policy and Terms of Use. If we are doing analytics then we are collecting information, even if anonymized.

Regards,
Dave

> 
> S.
> 


Re: [WWW] Web analytics

Posted by Simon Phipps <si...@webmink.com>.
On 12 Aug 2011, at 16:30, drew wrote:

> Hi,
> 
> Well, I thought I'd start this list - unless folks think it is too soon
> of course.
> 
> Currently the OO.o web sites, all of them, utilize a third party
> analytics firm (not Google). This is done by requiring that each site
> inject a small bit of JS in the page footer. (this was another
> requirement for the forums as an example and is used also at the
> extension/template repository)
> 
> Going forward does the PPMC desire to use a third party analytic firm -
> i.e. Google.
> 
> or
> 
> Do we prefer to do it all in-house. (just guessing that their may be
> some knowledge on this subject in the overall Apache project :)


Before taking that step, it's worth asking if the project actually has a need for web analytics yet. They were included on OO.o site mainly because Sun was using the data as part of its business metrics. It's not obvious that the same need exists in AOOo.

S.