You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucenenet.apache.org by "Van Den Berghe, Vincent" <Vi...@bvdinfo.com> on 2017/03/23 14:56:49 UTC

TestAgainstBrzozowski weirdness

Here's a fun fact about TestAgainstBrzozowski.
This is the original test which fails every time:

                Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
                AutomatonTestUtil.MinimizeSimple(a);
                Automaton b = (Automaton)a.Clone();
                MinimizationOperations.Minimize(b);
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

If we change the call to AutomatonTestUtil.MinimizeSimple(a) by MinimizationOperations.Minimize(a), the test succeeds:

                Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
                MinimizationOperations.Minimize(a);
                Automaton b = (Automaton)a.Clone();
                MinimizationOperations.Minimize(b);
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

Before you say "big deal", if we replace the call to MinimizationOperations.Minimize(b) by AutomatonTestUtil.MinimizeSimple(b), the test fails every time:

                Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
                AutomatonTestUtil.MinimizeSimple(a);
                Automaton b = (Automaton)a.Clone();
                AutomatonTestUtil.MinimizeSimple(b);
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

Not only that, but if we re-order the clone operation to be executed on the un-minimized automaton (to make sure we really have 2 identical automatons) like this:

                Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
                Automaton b = (Automaton)a.Clone();
                AutomatonTestUtil.MinimizeSimple(a);
                AutomatonTestUtil.MinimizeSimple(b);
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

... the test fails as well. This is contrary to what is claimed in the big giant comment.
To make sure that cloning works, this:

                Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
                Automaton b = (Automaton)a.Clone();
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());
                AutomatonTestUtil.MinimizeSimple(a);
                AutomatonTestUtil.MinimizeSimple(b);
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

... never fails in the first 3 assertions. Always fails in one of the last ones.

Preliminary conclusion:  if AutomatonTestUtil.MinimizeSimple can't give the same results for identical automatons, it's not a good basis for a test.
As to why... I searched for random factors in the AutomatonTestUtil.MinimizeSimple, but found none.  It looks like the algorithm is deterministic.
The only strange thing is the GetHashCode() implementation on State: it's bad form to define a hash code without an Equals method, especially if you're putting States in hash sets or dictionaries. But it seems like reference comparison is all that's needed, and since the hashcode is a different one for every instance, removing the GetHashCode method or adding the Equals method has no effect.

However, I can make the last 3 failing tests cited above work by correcting a bug in ValueHashSet<T>.GetHashCode. The current definition is this:


        public override int GetHashCode()
        {
            int h = 0;
            var i = GetEnumerator();
            while (i.MoveNext())
            {
                T obj = i.Current;
                if (!EqualityComparer<T>.Default.Equals(obj, default(T)))
                {
                    h = HashHelpers.CombineHashCodes(h, obj.GetHashCode());
                }
            }
            return h;
        }


This definition is wrong, since it relies on the incorrect assumption that sets containing identical elements will enumerate them in identical order. There is no order defined in a HashSet<T>, and if you have 2 sets to which you add items a after b in one, and b after a in another, the sets are identical but their enumerators will not necessarily be. The operation HashHelpers.CombineHashCodes(h, obj.GetHashCode()) is noncommutative, causing equal set to generate different hashcodes. Using:

        public override int GetHashCode()
        {
            int h = 0;
            foreach(var obj in this)
            {
                if (!EqualityComparer<T>.Default.Equals(obj, default(T)))
                {
                                      h += obj.GetHashCode();
                }
            }
            return h;
        }


Will make the last 3 failing tests work correct. But not the original one!



More weirdness, I suppose.

Vincent

RE: TestAgainstBrzozowski weirdness

Posted by Shad Storhaug <sh...@shadstorhaug.com>.
Vincent,

I put the GetHashCode() fix into both ValueList and ValueHashSet. You unwittingly reverted it back to its original Java state.

Thanks,
Shad Storhaug (NightOwl888)

From: Van Den Berghe, Vincent [mailto:Vincent.VanDenBerghe@bvdinfo.com]
Sent: Thursday, March 23, 2017 9:57 PM
To: dev@lucenenet.apache.org
Cc: Shad Storhaug
Subject: TestAgainstBrzozowski weirdness

Here's a fun fact about TestAgainstBrzozowski.
This is the original test which fails every time:

                Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
                AutomatonTestUtil.MinimizeSimple(a);
                Automaton b = (Automaton)a.Clone();
                MinimizationOperations.Minimize(b);
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

If we change the call to AutomatonTestUtil.MinimizeSimple(a) by MinimizationOperations.Minimize(a), the test succeeds:

                Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
                MinimizationOperations.Minimize(a);
                Automaton b = (Automaton)a.Clone();
                MinimizationOperations.Minimize(b);
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

Before you say "big deal", if we replace the call to MinimizationOperations.Minimize(b) by AutomatonTestUtil.MinimizeSimple(b), the test fails every time:

                Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
                AutomatonTestUtil.MinimizeSimple(a);
                Automaton b = (Automaton)a.Clone();
                AutomatonTestUtil.MinimizeSimple(b);
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

Not only that, but if we re-order the clone operation to be executed on the un-minimized automaton (to make sure we really have 2 identical automatons) like this:

                Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
                Automaton b = (Automaton)a.Clone();
                AutomatonTestUtil.MinimizeSimple(a);
                AutomatonTestUtil.MinimizeSimple(b);
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

... the test fails as well. This is contrary to what is claimed in the big giant comment.
To make sure that cloning works, this:

                Automaton a = AutomatonTestUtil.RandomAutomaton(Random());
                Automaton b = (Automaton)a.Clone();
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());
                AutomatonTestUtil.MinimizeSimple(a);
                AutomatonTestUtil.MinimizeSimple(b);
                Assert.IsTrue(BasicOperations.SameLanguage(a, b));
                Assert.AreEqual(a.GetNumberOfStates(), b.GetNumberOfStates());
                Assert.AreEqual(a.GetNumberOfTransitions(), b.GetNumberOfTransitions());

... never fails in the first 3 assertions. Always fails in one of the last ones.

Preliminary conclusion:  if AutomatonTestUtil.MinimizeSimple can't give the same results for identical automatons, it's not a good basis for a test.
As to why... I searched for random factors in the AutomatonTestUtil.MinimizeSimple, but found none.  It looks like the algorithm is deterministic.
The only strange thing is the GetHashCode() implementation on State: it's bad form to define a hash code without an Equals method, especially if you're putting States in hash sets or dictionaries. But it seems like reference comparison is all that's needed, and since the hashcode is a different one for every instance, removing the GetHashCode method or adding the Equals method has no effect.

However, I can make the last 3 failing tests cited above work by correcting a bug in ValueHashSet<T>.GetHashCode. The current definition is this:


        public override int GetHashCode()
        {
            int h = 0;
            var i = GetEnumerator();
            while (i.MoveNext())
            {
                T obj = i.Current;
                if (!EqualityComparer<T>.Default.Equals(obj, default(T)))
                {
                    h = HashHelpers.CombineHashCodes(h, obj.GetHashCode());
                }
            }
            return h;
        }


This definition is wrong, since it relies on the incorrect assumption that sets containing identical elements will enumerate them in identical order. There is no order defined in a HashSet<T>, and if you have 2 sets to which you add items a after b in one, and b after a in another, the sets are identical but their enumerators will not necessarily be. The operation HashHelpers.CombineHashCodes(h, obj.GetHashCode()) is noncommutative, causing equal set to generate different hashcodes. Using:

        public override int GetHashCode()
        {
            int h = 0;
            foreach(var obj in this)
            {
                if (!EqualityComparer<T>.Default.Equals(obj, default(T)))
                {
                                      h += obj.GetHashCode();
                }
            }
            return h;
        }


Will make the last 3 failing tests work correct. But not the original one!



More weirdness, I suppose.

Vincent