Posted to dev@devicemap.apache.org by Volkan YAZICI <vo...@gmail.com> on 2014/12/10 15:32:22 UTC

2x Performance Increase in classify()

Good news everyone!

Here is the patch that introduces JMH-based benchmarks for the Java client:
DMAP-106 <https://issues.apache.org/jira/browse/DMAP-106>

And here is the patch that introduces a >2x performance gain: DMAP-107
<https://issues.apache.org/jira/browse/DMAP-107>

*Sample output:*

$ export userAgentFile=/path/to/user-agents.txt
$ wc -l $userAgentFile
195325
$ java \
    -jar devicemap/java/classifier-benchmark/target/devicemap-client-benchmark.jar \
    -jvmArgsAppend "-server -XX:+TieredCompilation -XX:+AggressiveOpts -Xms1024m -Xmx4096m -DuserAgentFile=$userAgentFile" \
    -wi 5 -i 5 -bm avgt -tu ms -f 3 \
    ".*DeviceMapClientBenchmark.*"

# Using the most recent trunk.
Result: 12079.408 ±(99.9%) 1240.628 ms/op [Average]
  Statistics: (min, avg, max) = (11232.424, 12079.408, 16011.000), stdev = 1160.484
  Confidence interval (99.9%): [10838.781, 13320.036]

# Using the enhanced classify().
Result: 5505.355 ±(99.9%) 441.748 ms/op [Average]
  Statistics: (min, avg, max) = (5060.269, 5505.355, 6508.699), stdev = 413.211
  Confidence interval (99.9%): [5063.607, 5947.103]
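As a rough cross-check of the ">2x" figure, the reported numbers can be compared directly. The sketch below is illustrative arithmetic only; the class and method names are hypothetical and not part of DMAP-106 or DMAP-107. It divides the trunk result by the patched result for the means, and for the most pessimistic and most optimistic pairings of the two 99.9% confidence intervals.

```java
import java.util.Locale;

// Illustrative arithmetic on the figures reported above; not part of either patch.
final class SpeedupCheck {
    /** Ratio of baseline time to improved time; values above 1 mean the improved run is faster. */
    static double speedup(double baselineMs, double improvedMs) {
        return baselineMs / improvedMs;
    }

    public static void main(String[] args) {
        // Mean ms/op as reported for trunk and for the patched classify().
        double mean = speedup(12079.408, 5505.355);
        // Pessimistic pairing: lower bound of the trunk interval
        // against the upper bound of the patched interval.
        double worst = speedup(10838.781, 5947.103);
        // Optimistic pairing: upper bound of trunk against lower bound of patched.
        double best = speedup(13320.036, 5063.607);
        System.out.printf(Locale.ROOT,
                "mean %.2fx, worst case %.2fx, best case %.2fx%n", mean, worst, best);
    }
}
```

On the means this comes to roughly 2.19x. The pessimistic pairing gives about 1.82x and the optimistic one about 2.63x, so ">2x" holds for the averages while the intervals allow somewhat less or more.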


Cheers!

Re: 2x Performance Increase in classify()

Posted by Reza Naghibi <re...@yahoo.com.INVALID>.

Thanks, good to know.

So all that logic is going to be possible once we split classification into the different aspects/domains. The added bonus is that all the logic is in the DDR, so the client code can remain static across DDR updates. So no parsing and no detection in code :)

      From: Werner Keil <we...@gmail.com>
 To: dev@devicemap.apache.org; Reza Naghibi <re...@yahoo.com> 
 Sent: Wednesday, December 10, 2014 2:00 PM
 Subject: Re: 2x Performance Increase in classify()
   
Just for Android take
https://svn.apache.org/repos/asf/devicemap/trunk/devicemap/java/simpleddr/src/main/java/org/apache/devicemap/simpleddr/builder/device/AndroidDeviceBuilder.java

I believe occasionally there are rudimentary regex patterns in the XML, but
at least for some of the more popular platforms these builders add power
that the current "light" generic parser lacks.

Werner





On Wed, Dec 10, 2014 at 7:29 PM, Reza Naghibi <
reza.naghibi@yahoo.com.invalid> wrote:

> Can you show me these regex patterns? Are these patterns used for parsing
> or identification? Do they only exist in code or are they in the DDR?
>
>      From: Werner Keil <we...@gmail.com>
>  To: dev@devicemap.apache.org; Reza Naghibi <re...@yahoo.com>
>  Sent: Wednesday, December 10, 2014 1:23 PM
>  Subject: Re: 2x Performance Increase in classify()
>
> It does not parse the user agent; it only uses more sophisticated (and, see
> Android etc., tailor-made) regex patterns than the current large XML parser
> does ;-)
>
>
>
>
>
>
> On Wed, Dec 10, 2014 at 7:15 PM, Reza Naghibi
> <re...@yahoo.com.invalid> wrote:
>
> If you are saying that the OpenDDR client parses the user agent string,
> then that is something we need to avoid at all costs. I honestly was not
> aware that OpenDDR did parsing like that. Parsing the user agent has a
> whole lot of problems associated with it. The best approach, and the
> approach the current client uses, is to use pattern matching on device,
> browser, and OS signatures and use that to target specific devices,
> browsers, operating systems, and their versions.
>
>      From: Werner Keil <we...@gmail.com>
>  To: dev@devicemap.apache.org; Reza Naghibi <re...@yahoo.com>
>  Sent: Wednesday, December 10, 2014 12:41 PM
>  Subject: Re: 2x Performance Increase in classify()
>
> Well, it's not "legacy"; it's simply the W3C-compliant version, while the
> new one deviates from that.
>
> It recognizes the OS neither on the Samsung Galaxy 10.1 N upgraded to
> Android 4.1 or 4.2, where it still says 4.0.4 (which is wrong but seems to
> differ from the XML file, so the classifier tries "something" but not
> exactly the right thing),
> nor Android 5 on the Nexus 7. There it bluntly returns what's in the XML
> data file, "4.1", instead of the correct 5, which also matches the UA.
>
> As the W3C client isn't on the VM it is not so easy to test it against
> actual tablets, but providing an actual UA like those from these tablets by
> hand should work.
>
> For Nexus especially there seems to be a bug in the data files. Someone
> invented "genericGoogle", which is a loose end: neither the W3C client nor
> the new parser would find anything, as the parent doesn't seem to be in any
> of the files ;-O
>
>
>
>
> On Wed, Dec 10, 2014 at 6:23 PM, Reza Naghibi <
> reza.naghibi@yahoo.com.invalid> wrote:
>
> > >> currently provide better recognition of say an update to Android 4 or 5
> >
> > Hmm... can you explain this in more detail?
> >
> > From my work on the legacy client, it does not do anything more than
> > matching builder strings against user agents. The legacy client had a more
> > brute-force algorithm which would have to pick a particular builder to use,
> > which was error-prone. The new classifier client attempts to match all
> > builders at once and then chooses the highest-ranking match, thus
> > increasing the accuracy. So I am not aware of any reason that one client
> > can recognize a pattern better than the other, especially if they are
> > working off of the same data. Only the opposite is possible: missing a
> > pattern match.
> >
> >      From: Werner Keil <we...@gmail.com>
> >  To: dev@devicemap.apache.org; Reza Naghibi <re...@yahoo.com>
> >  Sent: Wednesday, December 10, 2014 12:13 PM
> >  Subject: Re: 2x Performance Increase in classify()
> >
> > Volkan/Reza,
> >
> > Let's keep in mind, the W3C DDR implementation has specialized recognition
> > classes like OrderedTokenDeviceBuilder or TwoStepDeviceBuilder and
> > subclasses that analyze the UserAgent more thoroughly, and currently
> > provide better recognition of say an update to Android 4 or 5.
> >
> > Werner
> >
> >
> >
> >
> > On Wed, Dec 10, 2014 at 5:43 PM, Reza Naghibi <
> > reza.naghibi@yahoo.com.invalid> wrote:
> >
> > > Volkan,
> > >
> > > Thanks for the performance patch. I reviewed it and it looks pretty good.
> > > Pre-patch, we were running each ngram set through some raw string-processing
> > > normalizations. Your patch does a good job moving that to the beginning
> > > and optimizing the regex. Good job :)
> > >
> > > As for pattern matching, if you look at the normalization method, we only
> > > look at alphanumerics. This was done for simplicity's sake. The downside
> > > here is that we weaken any pattern which contains non-alphanumerics. There
> > > are several ways to address and fix this, but since DeviceMap has control
> > > over its own data, I prefer fixing the patterns and keeping the matching
> > > engine simple. The thing to remember is that our data came from OpenDDR,
> > > which had a more complex classification algorithm and heuristics, so we
> > > kind of have a bit of legacy baggage to sort through as this project evolves.
> > >
> > > Regarding our next release, I already have the Java client 1.1.0 ready to
> > > go. I would like to get your patch in on the next release, 1.1.1.
> > >
> > > Reza
> > >
> > >
> > >      From: Volkan YAZICI <vo...@gmail.com>
> > >  To: "devicemap-dev@incubator.apache.org" <
> > > devicemap-dev@incubator.apache.org>
> > >  Sent: Wednesday, December 10, 2014 9:32 AM
> > >  Subject: 2x Performance Increase in classify()
> > >
> > > Good news everyone!
> > >
> > > Here is the patch that introduces JMH-based benchmarks for Java client:
> > > DMAP-106 <https://issues.apache.org/jira/browse/DMAP-106>
> > >
> > > And here is the patch that introduces >2x performance gain: DMAP-107
> > > <https://issues.apache.org/jira/browse/DMAP-107>
> > >
> > > *Sample output:*
> > >
> > > $ export userAgentFile=/path/to/user-agents.txt
> > > $ wc -l $userAgentFile
> > > 195325
> > > $ java \
> > >    -jar devicemap/java/classifier-benchmark/target/devicemap-client-benchmark.jar \
> > >    -jvmArgsAppend "-server -XX:+TieredCompilation -XX:+AggressiveOpts -Xms1024m -Xmx4096m -DuserAgentFile=$userAgentFile" \
> > >    -wi 5 -i 5 -bm avgt -tu ms -f 3 \
> > >    ".*DeviceMapClientBenchmark.*"
> > >
> > > # Using the most recent trunk.
> > > Result: 12079.408 ±(99.9%) 1240.628 ms/op [Average]
> > >  Statistics: (min, avg, max) = (11232.424, 12079.408, 16011.000), stdev = 1160.484
> > >  Confidence interval (99.9%): [10838.781, 13320.036]
> > >
> > > # Using the enhanced classify().
> > > Result: 5505.355 ±(99.9%) 441.748 ms/op [Average]
> > >  Statistics: (min, avg, max) = (5060.269, 5505.355, 6508.699), stdev = 413.211
> > >  Confidence interval (99.9%): [5063.607, 5947.103]
> > >
> > >
> > > Cheers!
> > >
> > >
> > >
> >
> >
> >
>
>
>
>
>
>
>
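The normalization point from the review quoted above (alphanumerics only, applied once up front rather than per ngram) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from DMAP-107: the class and method names are hypothetical, and splitting on whitespace and lower-casing are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch; tokenization rules here are assumptions, not DeviceMap's.
final class NgramSketch {
    // Compiled once; String.replaceAll would recompile the regex on every call.
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-z0-9]+");

    /** Lower-cases a token and drops everything that is not a letter or digit. */
    static String normalize(String token) {
        return NON_ALNUM.matcher(token.toLowerCase(Locale.ROOT)).replaceAll("");
    }

    /** Normalizes every token once, then joins runs of 1..maxN adjacent tokens. */
    static List<String> ngrams(String userAgent, int maxN) {
        List<String> tokens = new ArrayList<>();
        for (String raw : userAgent.trim().split("\\s+")) {
            String t = normalize(raw);   // normalization happens here, once per token
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder sb = new StringBuilder();
            for (int n = 0; n < maxN && start + n < tokens.size(); n++) {
                sb.append(tokens.get(start + n));
                out.add(sb.toString());  // no further string processing per ngram
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("Mozilla/5.0 (Linux; Android 4.1)", 2));
    }
}
```

The same sketch shows the downside mentioned in the review: "4.1" and "41" normalize to the same string, so a pattern that relies on the dot loses that distinction.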

Re: 2x Performance Increase in classify()

Posted by Werner Keil <we...@gmail.com>.

Just for Android take
https://svn.apache.org/repos/asf/devicemap/trunk/devicemap/java/simpleddr/src/main/java/org/apache/devicemap/simpleddr/builder/device/AndroidDeviceBuilder.java

I believe occasionally there are rudimentary regex patterns in the XML, but
at least for some of the more popular platforms these builders add power
that the current "light" generic parser lacks.

Werner




Re: 2x Performance Increase in classify()

Posted by Reza Naghibi <re...@yahoo.com.INVALID>.

Can you show me these regex patterns? Are these patterns used for parsing or identification? Do they only exist in code or are they in the DDR?


Re: 2x Performance Increase in classify()

Posted by Werner Keil <we...@gmail.com>.

It does not parse the user agent; it only uses more sophisticated (and, see
Android etc., tailor-made) regex patterns than the current large XML parser
does ;-)
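To make the Android 4.1 versus 5 example from earlier in the thread concrete, here is a minimal sketch of a platform-specific regex that reads the OS version from the user agent instead of returning the value stored in the data file. It is an assumption-laden illustration, not the code in AndroidDeviceBuilder: the pattern, class name, and sample user agent string are all hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not taken from the simpleddr builders.
final class AndroidVersionSketch {
    // "Android", an optional space or slash, then a dotted version number.
    private static final Pattern ANDROID_VERSION =
            Pattern.compile("Android[ /]?(\\d+(?:\\.\\d+)*)");

    /** Returns the Android version found in the user agent, if any. */
    static Optional<String> androidVersion(String userAgent) {
        Matcher m = ANDROID_VERSION.matcher(userAgent);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample user agent string made up for the example.
        String ua = "Mozilla/5.0 (Linux; Android 5.0; Nexus 7 Build/LRX21P)";
        System.out.println(androidVersion(ua).orElse("unknown"));
    }
}
```

Whether such a pattern lives in client code or in the DDR data is exactly the question raised in the replies; the sketch takes no position on that.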






Re: 2x Performance Increase in classify()

Posted by Reza Naghibi <re...@yahoo.com.INVALID>.

If you are saying that the OpenDDR client parses the user agent string, then that is something we need to avoid at all costs. I honestly was not aware that OpenDDR did parsing like that. Parsing the user agent has a whole lot of problems associated with it. The best approach, and the approach the current client uses, is to use pattern matching on device, browser, and OS signatures and use that to target specific devices, browsers, operating systems, and their versions.
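The approach described here, matching signatures against the user agent and keeping the highest-ranking match, could look roughly like the sketch below. It is a simplified illustration, not the DeviceMap classifier: the `Signature` type, the rank values, the device ids, and the plain substring test are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch of "match everything, keep the best-ranked hit".
final class SignatureMatchSketch {
    /** A made-up signature: a lower-case substring plus a rank used to pick a winner. */
    static final class Signature {
        final String id;
        final String pattern;
        final int rank;

        Signature(String id, String pattern, int rank) {
            this.id = id;
            this.pattern = pattern;
            this.rank = rank;
        }
    }

    /** Tests every signature and returns the id of the highest-ranking one that matches. */
    static Optional<String> classify(String userAgent, List<Signature> signatures) {
        String ua = userAgent.toLowerCase(Locale.ROOT);
        Signature best = null;
        for (Signature s : signatures) {
            boolean matches = ua.contains(s.pattern);
            if (matches && (best == null || s.rank > best.rank)) {
                best = s;
            }
        }
        return Optional.ofNullable(best).map(sig -> sig.id);
    }

    public static void main(String[] args) {
        List<Signature> signatures = Arrays.asList(
                new Signature("genericAndroid", "android", 1),
                new Signature("nexus7", "nexus 7", 10));
        System.out.println(
                classify("Mozilla/5.0 (Linux; Android 5.0; Nexus 7 Build/LRX21P)", signatures));
    }
}
```

Because every signature is tested and only the rank decides, no single builder has to be chosen up front, which is the source of error attributed to the older client elsewhere in the thread.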

      From: Werner Keil <we...@gmail.com>
 To: dev@devicemap.apache.org; Reza Naghibi <re...@yahoo.com> 
 Sent: Wednesday, December 10, 2014 12:41 PM
 Subject: Re: 2x Performance Increase in classify()
   
Well, it's not "legacy"; it's simply the W3C-compliant version, while the
new one deviates from that.

It recognizes the OS neither on the Samsung Galaxy 10.1 N upgraded to
Android 4.1 or 4.2, where it still says 4.0.4 (which is wrong but seems to
differ from the XML file, so the classifier tries "something" but not
exactly the right thing),
nor Android 5 on the Nexus 7. There it bluntly returns what's in the XML
data file, "4.1", instead of the correct 5, which also matches the UA.

As the W3C client isn't on the VM it is not so easy to test it against
actual tablets, but providing an actual UA like those from these tablets by
hand should work.

For Nexus especially there seems to be a bug in the data files. Someone
invented "genericGoogle", which is a loose end: neither the W3C client nor
the new parser would find anything, as the parent doesn't seem to be in any
of the files ;-O




On Wed, Dec 10, 2014 at 6:23 PM, Reza Naghibi <
reza.naghibi@yahoo.com.invalid> wrote:

> >> currently provide better recognition of say an update to Android 4 or 5
>
> Hmm... can you explain this in more detail?
>
> From my work on the legacy client, it does not do anything more than
> matching builder strings against user agents. The legacy client had a more
> brute-force algorithm which would have to pick a particular builder to use,
> which was error-prone. The new classifier client attempts to match all
> builders at once and then chooses the highest-ranking match, thus
> increasing the accuracy. So I am not aware of any reason that one client
> can recognize a pattern better than the other, especially if they are
> working off of the same data. Only the opposite is possible: missing a
> pattern match.
>
>      From: Werner Keil <we...@gmail.com>
>  To: dev@devicemap.apache.org; Reza Naghibi <re...@yahoo.com>
>  Sent: Wednesday, December 10, 2014 12:13 PM
>  Subject: Re: 2x Performance Increase in classify()
>
> Volkan/Reza,
>
> Let's keep in mind, the W3C DDR implementation has specialized recognition
> classes like OrderedTokenDeviceBuilder or TwoStepDeviceBuilder and
> subclasses that analyze the UserAgent more thoroughly, and currently
> provide better recognition of say an update to Android 4 or 5.
>
> Werner
>
>
>
>
> On Wed, Dec 10, 2014 at 5:43 PM, Reza Naghibi <
> reza.naghibi@yahoo.com.invalid> wrote:
>
> > Volkan,
> >
> > Thanks for the performance patch. I reviewed it and it looks pretty good.
> > Pre-patch, we were running each ngram set through some raw string-processing
> > normalizations. Your patch does a good job moving that to the beginning
> > and optimizing the regex. Good job :)
> >
> > As for pattern matching, if you look at the normalization method, we only
> > look at alphanumerics. This was done for simplicity's sake. The downside
> > here is that we weaken any pattern which contains non-alphanumerics. There
> > are several ways to address and fix this, but since DeviceMap has control
> > over its own data, I prefer fixing the patterns and keeping the matching
> > engine simple. The thing to remember is that our data came from OpenDDR,
> > which had a more complex classification algorithm and heuristics, so we
> > kind of have a bit of legacy baggage to sort through as this project evolves.
> >
> > Regarding our next release, I already have the Java client 1.1.0 ready to
> > go. I would like to get your patch in on the next release, 1.1.1.
> >
> > Reza
> >
> >
> >      From: Volkan YAZICI <vo...@gmail.com>
> >  To: "devicemap-dev@incubator.apache.org" <
> > devicemap-dev@incubator.apache.org>
> >  Sent: Wednesday, December 10, 2014 9:32 AM
> >  Subject: 2x Performance Increase in classify()
> >
> > Good news everyone!
> >
> > Here is the patch that introduces JMH-based benchmarks for Java client:
> > DMAP-106 <https://issues.apache.org/jira/browse/DMAP-106>
> >
> > And here is the patch that introduces >2x performance gain: DMAP-107
> > <https://issues.apache.org/jira/browse/DMAP-107>
> >
> > *Sample output:*
> >
> > $ export userAgentFile=/path/to/user-agents.txt
> > $ wc -l $userAgentFile
> > 195325
> > $ java \
> >    -jar
> > devicemap/java/classifier-benchmark/target/devicemap-client-benchmark.jar
> > \
> >    -jvmArgsAppend "-server -XX:+TieredCompilation -XX:+AggressiveOpts
> > -Xms1024m -Xmx4096m -DuserAgentFile=$userAgentFile" \
> >    -wi 5 -i 5 -bm avgt -tu ms -f 3 \
> >    ".*DeviceMapClientBenchmark.*"
> >
> > # Using the most recent trunk.
> > Result: 12079.408 ±(99.9%) 1240.628 ms/op [Average]
> >  Statistics: (min, avg, max) = (11232.424, 12079.408, 16011.000),
> > stdev = 1160.484
> >  Confidence interval (99.9%): [10838.781, 13320.036]
> >
> > # Using the enhanced classify().
> > Result: 5505.355 ±(99.9%) 441.748 ms/op [Average]
> >  Statistics: (min, avg, max) = (5060.269, 5505.355, 6508.699), stdev =
> > 413.211
> >  Confidence interval (99.9%): [5063.607, 5947.103]
> >
> >
> > Cheers!
> >
> >
> >
>
>
>

Re: 2x Performance Increase in classify()

Posted by Werner Keil <we...@gmail.com>.

The only place where the new parser-based client could be flexible is, say,
switching from "Android" to "Google", as long as the right data is provided;
otherwise (as we see) neither client will recognize the device.

Specialized builders like AndroidBuilder in SimpleDDR contain additional
logic for smart UA recognition. Once you have drilled down to a particular
device class such as Android, that helps, and one may also add further
specialized builders to improve on it. One big parser, by contrast, can be
faster, or it can end up being more tedious.
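
As a rough illustration of the kind of extra logic a device-class-specific
step could add, here is a minimal sketch that reads the Android version
straight from the UA instead of returning the value stored in the data file.
This is not the code of the SimpleDDR builders; the class name, the regex and
the sample UA string are assumptions made up for this example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AndroidVersionSketch {
    // Matches tokens such as "Android 5.0" or "Android 4.1.2" inside a User-Agent.
    private static final Pattern ANDROID_VERSION =
            Pattern.compile("Android\\s+(\\d+(?:\\.\\d+)*)");

    /** Returns the Android version reported by the UA itself, if there is one. */
    public static Optional<String> osVersion(String userAgent) {
        Matcher m = ANDROID_VERSION.matcher(userAgent);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical UA of an upgraded tablet, used only for illustration.
        String ua = "Mozilla/5.0 (Linux; Android 5.0; Nexus 7 Build/LRX21P)";
        System.out.println(osVersion(ua).orElse("unknown"));
    }
}
```

A step like this would only make sense after the device class is already
known to be Android; whether it lives in client code or in the data is the
design question discussed in this thread.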



On Wed, Dec 10, 2014 at 6:41 PM, Werner Keil <we...@gmail.com> wrote:

> Well, it's not "legacy"; it's simply the W3C-compliant version, while the
> new one deviates from that.
>
> It recognizes the OS neither on the Samsung Galaxy 10.1 N, now upgraded to
> Android 4.1 or 4.2 (it still says 4.0.4, which is wrong but seems to differ
> from the XML file, so the classifier tries "something" but not exactly the
> right thing), nor on the Nexus 7 running Android 5. There it bluntly returns
> what's in the XML data file, "4.1", instead of the correct 5 that also
> matches the UA.
>
> As the W3C client isn't on the VM it is not so easy to test it against
> actual tablets, but providing an actual UA like those from these tablets by
> hand should work.
>
> For Nexus especially there seems to be a bug in the data files. Someone
> invented "genericGoogle" which is a loose end, neither the W3C client nor
> the new parser would find something as the parent doesn't seem to be in any
> of the files ;-O

Re: 2x Performance Increase in classify()

Posted by Werner Keil <we...@gmail.com>.

Well, it's not "legacy"; it's simply the W3C-compliant version, while the
new one deviates from that.

It recognizes the OS neither on the Samsung Galaxy 10.1 N, now upgraded to
Android 4.1 or 4.2 (it still says 4.0.4, which is wrong but seems to differ
from the XML file, so the classifier tries "something" but not exactly the
right thing), nor on the Nexus 7 running Android 5. There it bluntly returns
what's in the XML data file, "4.1", instead of the correct 5 that also
matches the UA.

As the W3C client isn't on the VM, it is not so easy to test it against
actual tablets, but providing an actual UA like those from these tablets by
hand should work.

For Nexus especially there seems to be a bug in the data files. Someone
invented "genericGoogle", which is a loose end; neither the W3C client nor
the new parser would find anything, as the parent doesn't seem to be in any
of the files ;-O


On Wed, Dec 10, 2014 at 6:23 PM, Reza Naghibi <
reza.naghibi@yahoo.com.invalid> wrote:

> >> currently provide better recognition of say an update to Android 4 or 5
>
> Hmm... can you explain this in more detail?
>
> From my work on the legacy client, it does not do anything more than
> matching builder strings against user agents. The legacy client had a more
> brute force algorithm which would have to pick a particular builder to use,
> which was error prone. The new classifier client attempts to match all
> builders at once and then chooses the highest ranking match, thus
> increasing the accuracy. So I am not aware of any reason that one client
> can recognize a pattern better than the other, especially if they are
> working off of the same data. Only the opposite is possible, missing a
> pattern match.

Re: 2x Performance Increase in classify()

Posted by Reza Naghibi <re...@yahoo.com.INVALID>.

>> currently provide better recognition of say an update to Android 4 or 5

Hmm... can you explain this in more detail?

From my work on the legacy client, it does not do anything more than matching builder strings against user agents. The legacy client had a more brute-force algorithm which would have to pick a particular builder to use, which was error-prone. The new classifier client attempts to match all builders at once and then chooses the highest-ranking match, thus increasing the accuracy. So I am not aware of any reason that one client can recognize a pattern better than the other, especially if they are working off of the same data. Only the opposite is possible: missing a pattern match.
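
To make the "match everything, keep the highest-ranking hit" idea concrete, here is a minimal sketch. It is not the classifier client's code: the `DevicePattern` type, its fields, the rank values and the plain substring test are all assumptions chosen to keep the example short.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class RankedMatchSketch {
    /** Hypothetical pattern record: a device id, a lower-case token, and a rank. */
    public static final class DevicePattern {
        final String deviceId;
        final String token;
        final int rank;

        public DevicePattern(String deviceId, String token, int rank) {
            this.deviceId = deviceId;
            this.token = token;
            this.rank = rank;
        }
    }

    /** Tests every pattern against the UA and keeps the highest-ranking hit (null if none). */
    public static String classify(String userAgent, List<DevicePattern> patterns) {
        String ua = userAgent.toLowerCase(Locale.ROOT);
        DevicePattern best = null;
        for (DevicePattern p : patterns) {
            if (ua.contains(p.token) && (best == null || p.rank > best.rank)) {
                best = p;
            }
        }
        return best == null ? null : best.deviceId;
    }

    public static void main(String[] args) {
        // Made-up patterns and UA, for illustration only.
        List<DevicePattern> patterns = Arrays.asList(
                new DevicePattern("genericAndroid", "android", 1),
                new DevicePattern("nexus7", "nexus 7", 10));
        String ua = "Mozilla/5.0 (Linux; Android 5.0; Nexus 7 Build/LRX21P)";
        System.out.println(classify(ua, patterns)); // prints "nexus7"
    }
}
```

Because no builder has to be picked up front, a generic pattern and a specific one can both match, and the rank decides between them.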

      From: Werner Keil <we...@gmail.com>
 To: dev@devicemap.apache.org; Reza Naghibi <re...@yahoo.com> 
 Sent: Wednesday, December 10, 2014 12:13 PM
 Subject: Re: 2x Performance Increase in classify()
   
Volkan/Reza,

Let's keep in mind, the W3C DDR implementation has specialized recognition
classes like OrderedTokenDeviceBuilder or TwoStepDeviceBuilder and
subclasses that analyze the UserAgent more thoroughly, and currently
provide better recognition of say an update to Android 4 or 5.

Werner




On Wed, Dec 10, 2014 at 5:43 PM, Reza Naghibi <
reza.naghibi@yahoo.com.invalid> wrote:

> Volkan,
>
> Thanks for the performance patch. I reviewed it and it looks pretty good.
> Pre patch, we were running each ngram set thru some raw string processing
> normalizations. Your patch does a good job moving that to the beginning and
> optimizing the regex. Good job :)
>
> As for pattern matching, if you look at the normalization method, we only
> look at alphanumerics. This was done for simplicity's sake. The downside
> here is that we weaken any pattern which contains non-alphanumerics. There
> are several ways to address and fix this, but since DeviceMap has control
> over its own data, I prefer fixing the patterns and keeping the matching
> engine simple. The thing to remember is that our data came from OpenDDR
> which had a more complex classification algorithm and heuristics, so we
> kind of have a bit of legacy baggage to sort thru as this project evolves.
>
> Regarding our next release, I already have the Java client 1.1.0 ready to
> go. I would like to get your patch in on the next release, 1.1.1.
>
> Reza

Re: 2x Performance Increase in classify()

Posted by Werner Keil <we...@gmail.com>.

Volkan/Reza,

Let's keep in mind, the W3C DDR implementation has specialized recognition
classes like OrderedTokenDeviceBuilder or TwoStepDeviceBuilder and
subclasses that analyze the UserAgent more thoroughly, and currently
provide better recognition of say an update to Android 4 or 5.

Werner


On Wed, Dec 10, 2014 at 5:43 PM, Reza Naghibi <
reza.naghibi@yahoo.com.invalid> wrote:

> Volkan,
>
> Thanks for the performance patch. I reviewed it and it looks pretty good.
> Pre patch, we were running each ngram set thru some raw string processing
> normalizations. Your patch does a good job moving that to the beginning and
> optimizing the regex. Good job :)
>
> As for pattern matching, if you look at the normalization method, we only
> look at alphanumerics. This was done for simplicity's sake. The downside
> here is that we weaken any pattern which contains non-alphanumerics. There
> are several ways to address and fix this, but since DeviceMap has control
> over its own data, I prefer fixing the patterns and keeping the matching
> engine simple. The thing to remember is that our data came from OpenDDR
> which had a more complex classification algorithm and heuristics, so we
> kind of have a bit of legacy baggage to sort thru as this project evolves.
>
> Regarding our next release, I already have the Java client 1.1.0 ready to
> go. I would like to get your patch in on the next release, 1.1.1.
>
> Reza

Re: 2x Performance Increase in classify()

Posted by Reza Naghibi <re...@yahoo.com.INVALID>.

Volkan,

Thanks for the performance patch. I reviewed it and it looks pretty good. Pre patch, we were running each ngram set thru some raw string processing normalizations. Your patch does a good job moving that to the beginning and optimizing the regex. Good job :)
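
For readers who have not looked at DMAP-107, the general shape of such an optimization can be sketched as follows: normalize the UA once with a precompiled regex, then build the n-grams from the already-normalized tokens. This is not the patch itself; the class name, the tokenization and the choice to form n-grams by concatenating adjacent tokens are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class NormalizeOnceSketch {
    // Compiled once; String.replaceAll/split would recompile the regex on every call.
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-z0-9]+");

    /** Lower-cases the UA and splits it into alphanumeric tokens, once per UA. */
    public static List<String> tokens(String userAgent) {
        String lower = userAgent.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String t : NON_ALNUM.split(lower)) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Builds n-grams of up to maxN adjacent tokens; no further normalization is needed here. */
    public static List<String> ngrams(List<String> tokens, int maxN) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            StringBuilder sb = new StringBuilder();
            for (int n = 0; n < maxN && i + n < tokens.size(); n++) {
                sb.append(tokens.get(i + n));
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams(tokens("Nexus 7 Build/LRX21P"), 2));
    }
}
```

The point is only that the string clean-up runs once per user agent rather than once per n-gram.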

As for pattern matching, if you look at the normalization method, we only look at alphanumerics. This was done for simplicity's sake. The downside here is that we weaken any pattern which contains non-alphanumerics. There are several ways to address and fix this, but since DeviceMap has control over its own data, I prefer fixing the patterns and keeping the matching engine simple. The thing to remember is that our data came from OpenDDR which had a more complex classification algorithm and heuristics, so we kind of have a bit of legacy baggage to sort thru as this project evolves.
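
A small sketch of the weakening effect described above: once everything except letters and digits is dropped, patterns that differed only in punctuation or spacing become indistinguishable. The `normalize` method here is a stand-in written for this example, not the client's actual normalization method, and the two input strings are made up.

```java
import java.util.Locale;

public class AlnumNormalizationSketch {
    /** Keeps only ASCII letters and digits, lower-cased. */
    public static String normalize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two different raw strings collapse to the same normalized form.
        System.out.println(normalize("GT-P7500")); // prints "gtp7500"
        System.out.println(normalize("GTP 7500")); // prints "gtp7500"
    }
}
```

Fixing the data, as preferred above, would mean writing patterns that are still distinctive after this kind of normalization.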

Regarding our next release, I already have the Java client 1.1.0 ready to go. I would like to get your patch in on the next release, 1.1.1.

Reza


      From: Volkan YAZICI <vo...@gmail.com>
 To: "devicemap-dev@incubator.apache.org" <de...@incubator.apache.org> 
 Sent: Wednesday, December 10, 2014 9:32 AM
 Subject: 2x Performance Increase in classify()
   
Good news everyone!

Here is the patch that introduces JMH-based benchmarks for Java client:
DMAP-106 <https://issues.apache.org/jira/browse/DMAP-106>

And here is the patch that introduces >2x performance gain: DMAP-107
<https://issues.apache.org/jira/browse/DMAP-107>

*Sample output:*

$ export userAgentFile=/path/to/user-agents.txt
$ wc -l $userAgentFile
195325
$ java \
    -jar devicemap/java/classifier-benchmark/target/devicemap-client-benchmark.jar \
    -jvmArgsAppend "-server -XX:+TieredCompilation -XX:+AggressiveOpts -Xms1024m -Xmx4096m -DuserAgentFile=$userAgentFile" \
    -wi 5 -i 5 -bm avgt -tu ms -f 3 \
    ".*DeviceMapClientBenchmark.*"

# Using the most recent trunk.
Result: 12079.408 ±(99.9%) 1240.628 ms/op [Average]
  Statistics: (min, avg, max) = (11232.424, 12079.408, 16011.000), stdev = 1160.484
  Confidence interval (99.9%): [10838.781, 13320.036]

# Using the enhanced classify().
Result: 5505.355 ±(99.9%) 441.748 ms/op [Average]
  Statistics: (min, avg, max) = (5060.269, 5505.355, 6508.699), stdev = 413.211
  Confidence interval (99.9%): [5063.607, 5947.103]


Cheers!