Posted to dev@devicemap.apache.org by "eberhard speer jr." <se...@ducis.net> on 2013/07/08 18:25:56 UTC

user-agents - urgent appeal

Folks,

Digging through the results from the tests with the new DeviceMapClient, I
noticed that the user-agent test-data is missing user-agent strings for
almost 10% of the devices in the 1.19 release!

(Full list here:
http://svn.apache.org/viewvc/incubator/devicemap/trunk/openddr/test-data/src/main/resources/test-data/missing_ua.txt?view=co
)

Obviously we need 'decent' test-data, i.e. at least one user-agent string
for each device in the resources, in order to conduct meaningful tests.

So, I was thinking:

- Stefano: in order to build the builder-data sources, OpenDDR must
surely have samples of the user-agents for the specified devices. Would it
be possible to ask whether they would be willing to share those?
Maybe include them in future releases?

- I'm sure that Adobe and The Weather Channel in particular, as well as
others, must see bazillions of user-agents in their web logs every
day. Would it be possible to ask your respective web-ops people to
make available a weekly or monthly list of just the user-agent strings?

- Bertrand: the ASF itself must also collect massive amounts of
user-agent strings in its logs. Is there someone, maybe in
Infrastructure, we can contact with our request?

- Maybe we can 'canvass' other Apache lists, asking for lists of
user-agent strings?

I have resources dedicated to sorting out the user-agent lists: weeding
out duplicates, matching to devices, etc., after which I will make them all
available, as before, in the test-data directory.
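
A minimal sketch of the duplicate-weeding step, assuming the contributed
lists are plain text with one user-agent string per line. The function
name is hypothetical and this is not the actual tooling used for the
test-data; matching strings to devices is not covered here.

```python
def dedupe_user_agents(lines):
    """Return the unique user-agent strings in first-seen order.

    Assumption: one user-agent string per input line. Surrounding
    whitespace is stripped and empty lines are skipped.
    """
    seen = set()
    unique = []
    for line in lines:
        ua = line.strip()
        if ua and ua not in seen:
            seen.add(ua)
            unique.append(ua)
    return unique
```

Several contributed lists could be merged by concatenating their lines
before calling the function.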

I will be updating the test-data with the user-agents I have harvested
since the last contribution, and I very much hope to get many more in
response to this urgent request.

Anyone who wants to contribute user-agent strings: just mail the lists
to esjr@apache.org and I will ensure they appear, matched to device
(if 'mobile'), in the test-data repository.

Looking forward to an overwhelming response,

esjr

Re: user-agents - urgent appeal

Posted by Bertrand Delacretaz <bd...@apache.org>.
Hi,

On Mon, Jul 8, 2013 at 6:25 PM, eberhard speer jr. <se...@ducis.net> wrote:
> ...- I'm sure, particularly Adobe and The Weather Channel, as well as
> others, must see bazillions of user-agents in their web-logs every
> day. Would it be possible to ask your respective web-ops people to
> make available a weekly or monthly list of just the user-agent strings?...

I can try for Adobe, but see my next comments, they might also apply
in this case.

> ...- Bertrand: the ASF itself must also collect massive amounts of
> user-agent strings in their logs. Is there someone, maybe in
> Infrastructure, we can contact with our request?...

I asked a while ago. It was just an informal conversation, but two
obstacles were mentioned:

a) In some countries (Germany IIRC), User-Agent is considered private
information, so publishing it without the owner's consent would be
problematic.

b) For some machine interactions (svn clients IIRC), the User-Agent
contains data that would disclose more information than desired if it
were published openly.

So, blindly grabbing all user-agent values from apache.org websites is
probably not possible unless we can come up with a process that allows
us to filter for "public" user-agents and ignore others. I imagine
this project's PMC members could be trusted with the full apache.org
logs, if we have a reliable way to filter them.
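
One way such a filter could look, as a rough sketch only: it assumes the
logs are in Apache's Combined Log Format (where the user-agent is the
last double-quoted field) and uses a hypothetical allow-list of
browser-style prefixes as the definition of "public". Both the prefixes
and the function names are assumptions, not an agreed process, and the
privacy concern in (a) would still need its own answer.

```python
import re

# In Combined Log Format the user-agent is the last double-quoted field.
UA_FIELD = re.compile(r'"((?:[^"\\]|\\.)*)"\s*$')

# Hypothetical allow-list: only browser-style user-agents count as "public".
PUBLIC_PREFIXES = ("Mozilla/", "Opera/")


def extract_user_agent(log_line):
    """Return the last quoted field of a log line, or None if there is none."""
    match = UA_FIELD.search(log_line)
    return match.group(1) if match else None


def public_user_agents(log_lines):
    """Return the sorted, de-duplicated user-agents that pass the allow-list."""
    result = set()
    for line in log_lines:
        ua = extract_user_agent(line)
        if ua and ua.startswith(PUBLIC_PREFIXES):
            result.add(ua)
    return sorted(result)
```

With this allow-list, a client string such as "SVN/1.7.9 neon/0.29.6"
would be dropped, since it does not start with one of the listed
prefixes.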

-Bertrand