Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2004/11/21 00:27:07 UTC

TIP: very useful '%seen' trick

this just came up on perl5-porters...
http://www.nntp.perl.org/group/perl.perl5.porters/96100 :

  Subject: Re: sharing hash-values
  From: btilly[at]gmail.com (Ben Tilly)
  ...
  I forgot who I first saw mention this, possibly gbarr, but the following
  variation on %seen seems to be the fastest in native Perl:

    my %seen;
    undef @seen{@special};
    for (@things) {
      if (exists $seen{$_}) {
        ...
      }
    }

  This avoids creating the hash values entirely.  (Or at least it did a few
  revs of Perl ago.)
  Cheers,
  Ben
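A runnable version of Ben's snippet, as a sketch only: @special and @things are hypothetical sample data (the quote doesn't define them), and the seeding line uses the slice-assignment spelling '@seen{@special} = ();', which likewise creates every key with an undef value. Since every value is undef, and therefore false, membership has to be tested with exists():

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; the quoted post leaves both lists unspecified.
my @special = qw(foo baz);
my @things  = qw(foo bar baz quux foo);

# Build a lookup set: every key exists, every value is undef.
my %seen;
@seen{@special} = ();

# Test membership with exists(), never with the (always undef) value.
my @hits;
for (@things) {
    if (exists $seen{$_}) {
        push @hits, $_;
    }
}
print "special: @hits\n";
```

Here @hits ends up as foo, baz, foo, and no value in %seen is ever defined.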

sure enough, using the shared "undef" SV as the magic value is 7% faster, and
since it doesn't allocate the scalars it should reduce RAM usage too ;)
definitely the better idiom.  Benchmark:

: jm 1122...; perl psc
                Rate traditional  undef_keys
traditional 100014/s          --         -6%
undef_keys  106684/s          7%          --


script:

#!/usr/bin/perl -w

use Benchmark qw(:all);
use strict;

my @things = qw(
        foo bar baz foo foo foo bar bar baz baz blarg
    );

cmpthese (-2, {
    'traditional' => sub {
        my $res = '';
        my %seen;
        for (@things) {
          next if $seen{$_};
          $seen{$_} = 1;
          $res .= "$_\n";
        }
    },
    'undef_keys' => sub {
        my $res = '';
        my %seen;
        # undef @seen{@special};
        for (@things) {
          next if exists $seen{$_};
          undef $seen{$_};
          $res .= "$_\n";
        }
    }
  });
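Note that the undef_keys branch has to test with exists(): the stored values are all undef, so a plain truth test like the traditional branch's 'next if $seen{$_}' would never skip anything. A small sketch contrasting the two (the input list is made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @things = qw(foo bar baz foo foo bar blarg);

# Wrong: marks keys with undef but tests truthiness, so nothing is ever skipped.
my (%seen_bad, @out_bad);
for (@things) {
    next if $seen_bad{$_};    # always false: the stored value is undef
    undef $seen_bad{$_};
    push @out_bad, $_;
}

# Right: the same undef marking, tested with exists().
my (%seen_ok, @out_ok);
for (@things) {
    next if exists $seen_ok{$_};
    undef $seen_ok{$_};
    push @out_ok, $_;
}

print "truth test:  @out_bad\n";
print "exists test: @out_ok\n";
```

The truth-test loop keeps all seven items; the exists() loop keeps only foo bar baz blarg.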


(ps: note the 'undef @seen{@special};' line -- it can be used to mark a list
of "special" values as already seen before the loop starts.)
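A sketch of that pre-seeding combined with the dedupe loop from the script. @special here is a hypothetical list, and it's seeded with '@seen{@special} = ();' rather than the undef form (both leave the values undef). Any value in @special is then skipped as if it had already been seen:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: @special lists values to suppress entirely.
my @special = qw(baz);
my @things  = qw(foo bar baz foo foo foo bar bar baz baz blarg);

my %seen;
@seen{@special} = ();    # pre-mark the specials as already seen (values stay undef)

my $res = '';
for (@things) {
    next if exists $seen{$_};
    undef $seen{$_};
    $res .= "$_\n";
}
print $res;
```

This prints foo, bar and blarg, one per line; baz never appears.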

--j.