Posted to commits@devicemap.apache.org by Apache Wiki <wi...@apache.org> on 2013/11/27 09:28:12 UTC

[Devicemap Wiki] Update of "esjr/Test Data" by esjr

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Devicemap Wiki" for change notification.

The "esjr/Test Data" page has been changed by esjr:
https://wiki.apache.org/devicemap/esjr/Test%20Data

New page:
##master-page:HomepageReadWritePageTemplate
##master-date:Unknown-Date
#format wiki
#language en
= UserAgent Test Data =
This document describes the test data files used in DeviceMap tests.
<<BR>>
''(todo : add svn link once upload finishes)''

== UserAgentString.txt ==
Columns :
 * UserAgentString : nvarchar(1500)

Currently contains 918,709 unique user agent strings.<<BR>>
Most of them were collected from the access logs of live web servers.<<BR>>
102,121 of these were identified as belonging to mobile or other devices.

== UserAgentDetail.txt ==
Pipe-separated text file.<<BR>>
Columns :
 * StringHash : varbinary(32) : hashbytes('SHA2_256', UserAgentString)
 * TypeId : int
 * Flag : int

Because a user agent string can contain almost any character, no separator character is safe to use between columns. The actual user agent string is therefore kept apart from its properties, which are stored in UserAgentDetail.txt.<<BR>>
The user agent string is linked to its detail record via its SHA-256 (SHA-2, 256-bit) hash. (In an RDBMS such as MS SQL Server, adding this field as a persisted computed column speeds things up '''considerably'''.)<<BR>>
The TypeId field is the PK or Id of the Types listed in UserAgentType.txt.<<BR>>
The Flag field is used to mark user agent strings so that the same set can be used in different tests (see below).
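
To illustrate the hash link described above, here is a minimal Python sketch. It is not part of the DeviceMap code; the function names are hypothetical. It assumes that the hash in UserAgentDetail.txt is written as hexadecimal text (with or without a leading 0x) and that the file has no header row, neither of which is confirmed here. Since hashbytes on an nvarchar value hashes its UTF-16 little-endian bytes, that encoding is the default below.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def string_hash(user_agent: str, encoding: str = "utf-16-le") -> str:
    """Return the SHA-256 digest of a user agent string as upper-case hex.

    The default encoding mirrors how an nvarchar value is hashed in
    MS SQL Server; change it if the hashes were produced differently.
    """
    return hashlib.sha256(user_agent.encode(encoding)).hexdigest().upper()


def parse_detail_line(line: str) -> Tuple[str, int, int]:
    """Split one 'StringHash|TypeId|Flag' record (assumed layout)."""
    hash_text, type_id, flag = line.rstrip("\r\n").split("|")
    if hash_text.lower().startswith("0x"):
        hash_text = hash_text[2:]  # drop an optional 0x prefix
    return hash_text.upper(), int(type_id), int(flag)


def index_details(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map StringHash -> (TypeId, Flag) for fast lookup by hash."""
    index = {}
    for line in lines:
        if line.strip():  # skip blank lines
            key, type_id, flag = parse_detail_line(line)
            index[key] = (type_id, flag)
    return index
```

With such an index, the detail record of any line from UserAgentString.txt can be found with index.get(string_hash(line)).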

== UserAgentType.txt ==
Pipe-separated text file.<<BR>>
Columns :
 * Id : int
 * Type : nvarchar(50)

UserAgentType.txt lists 76 types of user agent strings (some of which are debatable).

== UserAgentDevice.txt ==
Pipe-separated text file.<<BR>>
Columns :
 * StringHash : UserAgent SHA-256 hash
 * OpenDdr : OpenDdr device Id found via OpenDdr code
 * DeviceMap : OpenDdr device Id found via DeviceMapClient code
 * Flag : used to separate data sets for testing
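
As a rough sketch of how this pipe-separated file could be read, the Python below loads it into a SQLite table and lists the rows where the two device Ids differ. The table layout, the function names, the absence of a header row and the integer Flag are all assumptions made for illustration. SQLite merely stands in for whatever RDBMS is actually used.

```python
import csv
import sqlite3
from typing import Iterable, List, Tuple


def load_device_rows(conn: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Insert 'StringHash|OpenDdr|DeviceMap|Flag' records; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS UserAgentDevice "
        "(StringHash TEXT, OpenDdr TEXT, DeviceMap TEXT, Flag INTEGER)"
    )
    # QUOTE_NONE: treat quote characters as ordinary data, split on '|' only.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    rows = [(h, o, d, int(f)) for h, o, d, f in (r for r in reader if r)]
    conn.executemany("INSERT INTO UserAgentDevice VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def disagreements(conn: sqlite3.Connection) -> List[Tuple[str, str, str]]:
    """Return rows where the OpenDdr and DeviceMapClient results differ."""
    return conn.execute(
        "SELECT StringHash, OpenDdr, DeviceMap FROM UserAgentDevice "
        "WHERE OpenDdr <> DeviceMap ORDER BY StringHash"
    ).fetchall()
```

For a real file, pass an open file object, for example open("UserAgentDevice.txt", newline="", encoding="utf-8"); the file encoding is also an assumption.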

== Testing ==
For tests, the data is best loaded into an RDBMS.<<BR>>
This is the general procedure I use :<<BR>>

 1. Create an instance of the client/parser class
 2. GetDataSet : SELECT the PK and UserAgentString of a data set chosen at random, by type, or by flag
 3. 'cold' run using 3 pre-selected user agent strings
 4. For each UserAgentString in DataSet
      * Start Timer
      * Map/Resolve
      * Stop Timer
      * INSERT PK, TimeTaken and DeviceId (or 'unknown') in ResultLog
 5. Rinse and repeat...
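
The procedure above could look roughly like the following Python sketch. It is not the actual test harness. The UserAgent table, the ResultLog columns, and the get_dataset and run_test names are hypothetical, and resolve stands for whichever client/parser call is under test, assumed to return a device Id or None. SQLite is used only to keep the example self-contained.

```python
import sqlite3
import time
from typing import Callable, Iterable, List, Optional, Tuple


def get_dataset(conn: sqlite3.Connection, flag: int, limit: int) -> List[Tuple[int, str]]:
    """Step 2: pick a random sample of (PK, UserAgentString) from a flagged set."""
    return conn.execute(
        "SELECT Id, UserAgentString FROM UserAgent "
        "WHERE Flag = ? ORDER BY RANDOM() LIMIT ?",
        (flag, limit),
    ).fetchall()


def run_test(
    conn: sqlite3.Connection,
    resolve: Callable[[str], Optional[str]],
    dataset: Iterable[Tuple[int, str]],
    warmup: Iterable[str] = (),
) -> int:
    """Steps 3 and 4: untimed first calls, then time each lookup and log it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ResultLog "
        "(UserAgentId INTEGER, TimeTakenNs INTEGER, DeviceId TEXT)"
    )
    for user_agent in warmup:  # results and timings are discarded
        resolve(user_agent)
    count = 0
    for pk, user_agent in dataset:
        start = time.perf_counter_ns()           # start timer
        device_id = resolve(user_agent)          # map/resolve
        taken = time.perf_counter_ns() - start   # stop timer
        conn.execute(
            "INSERT INTO ResultLog VALUES (?, ?, ?)",
            (pk, taken, device_id or "unknown"),
        )
        count += 1
    conn.commit()
    return count
```

The INSERT is kept outside the timed region so that database overhead does not distort the measured lookup time.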