You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@spamassassin.apache.org on 2022/08/25 23:46:45 UTC

[Bug 8033] New: t/bayessql.t test failure

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

            Bug ID: 8033
           Summary: t/bayessql.t test failure
           Product: Spamassassin
           Version: 4.0.0
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: blocker
          Priority: P2
         Component: Regression Tests
          Assignee: dev@spamassassin.apache.org
          Reporter: sidney@sidney.com
  Target Milestone: Undefined

The following five links show test failures in t/bayessql.t on some Linux CPAN
test machines run on the 4.0.0rc1 upload. Note that I uploaded a copy of rc1
named Mail-SpamAssassin-4.0.0-pre2y-TRIAL for testing before the rc1 release.
Both that name and Mail-SpamAssassin-4.0.0-rc1-TRIAL are represented in these
reports, but they all are tests of the rc1 code.

All of these test machines are run by the same tester, who has quite a few
other test machines that don't get this failure.

If anyone can glean anything from the error messages that these show, or can
see something from the environment sections of the reports that these
particular test machines have in common that might be a clue as to why this
particular test failure happens on them but not elsewhere, please comment.

Links to test reports:

http://www.cpantesters.org/cpan/report/46f6a002-2435-11ed-91a3-cb266e8775ea
http://www.cpantesters.org/cpan/report/49aea65a-2435-11ed-91a3-cb266e8775ea
http://www.cpantesters.org/cpan/report/379a3dbc-2435-11ed-91a3-cb266e8775ea
http://www.cpantesters.org/cpan/report/44305570-2435-11ed-91a3-cb266e8775ea
http://www.cpantesters.org/cpan/report/3fab2c1e-2435-11ed-91a3-cb266e8775ea

Links to test matrix page for each of these uploads, showing ongoing test
results:

http://matrix.cpantesters.org/?dist=Mail-SpamAssassin%204.0.0-pre2y-TRIAL
http://matrix.cpantesters.org/?dist=Mail-SpamAssassin%204.0.0-rc1-TRIAL

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

--- Comment #15 from Sidney Markowitz <si...@sidney.com> ---
I don't really want to add it everywhere. Looking at all the pragmas that are
available in SQLite, none of them seem like they would be useful. Synchronous
off only works if you are sure of no simultaneous access, which only applies to
having it in a single test like this. I'm ok with leaving it as an undocumented
hidden feature that is only used by a test, adding a comment to that effect
where the code is in SQL.pm.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

Sidney Markowitz <si...@sidney.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sidney@sidney.com
   Target Milestone|Undefined                   |4.0.0

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

Henrik Krohns <ap...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |apache@hege.li

--- Comment #7 from Henrik Krohns <ap...@hege.li> ---
Created attachment 5809
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5809&action=edit
pragma support patch

I realized my mistake. It's not enough to set pragma only in bayessql.t as it's
not persistent

$dbh->do("PRAGMA synchronous = OFF");

BayesStore/SQL.pm opens up the connections, so it must be set there too.

I propose the following patch which will process PRAGMA clauses from DSN
attributes

dbi:SQLite:dbname=$dbdir/bayes.db;synchronous=OFF

I think it's future proof, atleast adding random stuff like ;foo=bar does not
break anything.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

--- Comment #16 from Henrik Krohns <ap...@hege.li> ---
(In reply to Sidney Markowitz from comment #15)
> Synchronous off only works if you are sure of no simultaneous access, which
> only applies to having it in a single test like this.

It has nothing to do with simultaneous access or prevent it. It just disables
fsync() and may corrupt if server crashes.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

--- Comment #13 from Sidney Markowitz <si...@sidney.com> ---
Created attachment 5811
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5811&action=edit
pragma support patch v3

This puts the pragmas in a separate option so dsn is not in a non-standard
format and the option is properly documented in perldoc.

Henrik, what do you think?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

Sidney Markowitz <si...@sidney.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #6 from Sidney Markowitz <si...@sidney.com> ---
The test results came back, with seven runs in the set this time showing the
failures. On all of them, I logged the creation of the temp directory, the
calls to sa-learn that used it, and tested for existence of the directory at
various points in sa-learn. The directory disappeared between 2 and 6 seconds
after creation, all after variously between one to five calls to sa-learn
completed successfully. On all of those test runs, a copy of bayessql.t that
was identical except for the temp directory being on disk instead of /dev/shm
passed all tests.

I still don't know what those machines have in common that make them part of a
set and have that failure, but it doesn't seem worth figuring out just to speed
up this test on machines that have a /dev/shm.

trunk % svn ci -m "bug 8033 - Remove use of /dev/shm to speed up test because
that causes test failure on some machines. Label test as long running"
t/bayessql.t 
Sending        t/bayessql.t
Transmitting file data .done
Committing transaction...
Committed revision 1903870.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] [REVIEW] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

Sidney Markowitz <si...@sidney.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|REOPENED                    |RESOLVED

--- Comment #21 from Sidney Markowitz <si...@sidney.com> ---
(In reply to Kevin A. McGrail from comment #20)
> I did not test this nor would I know where to start.

Installing DBD::SQLite on a test machine allows t/bayessql.t to run, but with
72 successful tests of the patch on 37 environments so far, run on CPAN, I
don't a see a need for you to set up for it.

trunk % svn ci -m "Bug 8033 - Add PRAGMA to SQLite test to speed test without
unreliable use of /dev/shm"
Sending        lib/Mail/SpamAssassin/BayesStore/SQL.pm
Sending        t/bayessql.t
Transmitting file data ..done
Committing transaction...
Committed revision 1903917.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

--- Comment #10 from Sidney Markowitz <si...@sidney.com> ---
(In reply to Henrik Krohns from comment #7)
> dbi:SQLite:dbname=$dbdir/bayes.db;synchronous=OFF

This is so cool! In my tests it is exactly as fast as using /dev/shm, which
makes sense as everything I read as to why people put SQLite databases on
/dev/shm for speed blamed fsync on the disk for that being so slow.

I'll be in a bit less of a hurry to commit this: I'll upload a trial to CPAN
that contains this patch and the two bugs that are currently in [REVIEW] status
(reminder: they each need another vote or two) and see what all those test
machines have to say over the next few days.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

--- Comment #14 from Henrik Krohns <ap...@hege.li> ---
+1 for the effort, though optimally we would want to support it also on
SQLBasedAddrList.pm and DecodeShortURLs.pm. Seems a bit waste to make separate
config clause for all, given that SQLite is probably uncommon too, but if you
want to tackle it..

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

--- Comment #11 from Kevin A. McGrail <km...@apache.org> ---
Really good sleuthing.  Awaiting results from CPAN!

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

Kevin A. McGrail <km...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmcgrail@apache.org

--- Comment #1 from Kevin A. McGrail <km...@apache.org> ---
Interesting.  Can you work with the tester to get the output files referenced
and see if the can do anything more specific with -D turned on for more output.
 I see nothing that jumps out in the reports though.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

--- Comment #2 from Sidney Markowitz <si...@sidney.com> ---
(In reply to Kevin A. McGrail from comment #1)
> Interesting.  Can you work with the tester to get the output files
> referenced

Difficult communicating with the tester. I tried that with this specific tester
once before with never a reply.

I have been uploading test builds with diagnostic output going to STDERR so it
shows up in the test reports. Each time I do takes about 24 hours before that
particular test machine gets to it and I can see the results. Makes me
nostalgic for my first programming days submitting punch card decks for an
overnight run on the mainframe. At least this time I can run on my machine to
get immediate feedback on typos before I submit it :)

I have it narrowed down to the second call to salearnrun getting that first die
error and the sqlite db file being deleted sometime during that call. Sometime
tomorrow I should find out what can be found by adding -D learn,sa-learn,bayes
to that one call to sa-learn that is dying. There would be more output than I
want if I tried blasting out a -D all, given that this test will be run by
hundreds of test machines other than just the one or two that get the error, so
I'm trying just those three -D channels first.

At least it isn't a Heisenbug - The error is consistent on the couple of
machines on which it happens.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] [REVIEW] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

Sidney Markowitz <si...@sidney.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|t/bayessql.t test failure   |[REVIEW] t/bayessql.t test
                   |                            |failure

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

Henrik Krohns <ap...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #5809|0                           |1
        is obsolete|                            |

--- Comment #9 from Henrik Krohns <ap...@hege.li> ---
Created attachment 5810
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5810&action=edit
pragma support patch v2

just a slightly leaner patch, no need to split attributes

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

--- Comment #17 from Sidney Markowitz <si...@sidney.com> ---
(In reply to Henrik Krohns from comment #16)
> may corrupt if server crashes.

So it still would not be a good idea to use for a production system? That
sounds like a reason not to bother implementing it for use other than for
testing. Or maybe I'm rationalizing not going to the effort to implement it for
all the different uses of SQL. Since you wrote it this way, I guess you would
agree to keeping it the way you wrote it as an undocumented extension, and only
use it in the test?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] [REVIEW] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

--- Comment #20 from Kevin A. McGrail <km...@apache.org> ---
I love undocumented extensions.  The patch looks clean but I don't use SQL Lite
to test this.  +1 but only from a review of patch v4.  I did not test this nor
would I know where to start.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

--- Comment #19 from Sidney Markowitz <si...@sidney.com> ---
Created attachment 5812
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5812&action=edit
pragma support patch v4

Here is a patch that is the same as v2, except diffed against the current trunk
and with a change to a couple of comments to better document what it is doing.

It has passed tests on enough CPAN test machines that I'm calling for a review
and vote now. This patch, except for the change in comments, is in the version
in this test matrix page:

http://matrix.cpantesters.org/?dist=Mail-SpamAssassin%204.0.0-rc1m-TRIAL

The Solaris test failure there is what bug 8040 is about, not relevant for this
issue.

+1 from me, and I assume +1 from Henrik, since it's his patch, so one more to
go.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

Henrik Krohns <ap...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|FIXED                       |---
             Status|RESOLVED                    |REOPENED

--- Comment #8 from Henrik Krohns <ap...@hege.li> ---
Reopening as Sidney was too fast. :-)

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

--- Comment #12 from Sidney Markowitz <si...@sidney.com> ---
Since this now has a change in SQL.pm, not just in test, it will have to be
voted on for RTC mode before we get to committing it.

Henrik, what do you think about putting the PRAGMA in a new config option
instead of adding it to the end of the dsn as a non-standard syntax? That way
it would be documented in the POD instead of being an undocumented extra syntax
for bayes_sql_dsn used only in a test. Maybe call it bayes_sql_sqlite_pragma?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

--- Comment #18 from Henrik Krohns <ap...@hege.li> ---
(In reply to Sidney Markowitz from comment #17)
> I guess you
> would agree to keeping it the way you wrote it as an undocumented extension,
> and only use it in the test?

Yes. I don't really think anyone would use SQLite in production anyway.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

--- Comment #3 from Sidney Markowitz <si...@sidney.com> ---
The frustrating stage I have gotten to now, and a speculation on a fix:

I'm still refining adding debugging logging to narrow down where it happens,
with 24-48 hour turnaround on results, but I have shown that the temporary
directory created in /dev/shm disappears all of a sudden with the exact place
in the code that it does varying a bit between runs, happening in either the
second or third call to sa-learn

My speculation is that this particular set of test machines have something that
reaps files in /dev/shm too aggressively.

The temporary directory is created using the lines

# Try /dev/shm as it's likely memdisk, otherwise SQLite is sloow..
my $dbdir = tempdir("bayessql.XXXXXX", DIR => -w "/dev/shm" ? "/dev/shm" :
"log");

The comment is correct. On my Ubuntu virtual machine running in VirtualBox on
my MacBook Pro, the test takes about 23 seconds to run using /dev/shm and
2min:30sec if that line is changed to always use "log".

The question I have is whether that difference in time for that test is worth
it if some machines that the test runs on have a /dev/shm that is writable but
is not set up suitably for use for temporary files.

If it turns out that never using /dev/shm for the files allows the test to run
on all the CPAN test machines instead of failing on a few, at the cost of the
test taking two minutes longer to run on machines that have a /dev/shm, is that
the correct fix? (It will take a couple days to get the feedback to see if that
change does cause the failures to go away).

Note that this only has to do with the one test. It has nothing to do with
anything that runs in actual SpamAssassin. By our guidelines I wouldn't even
have to get a vote for RTC. But I am asking for thoughts and comments on this
issue.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

Loren Wilton <lw...@earthlink.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lwilton@earthlink.net

--- Comment #4 from Loren Wilton <lw...@earthlink.net> ---
What's the time to run all tests? Is 2 minutes a significant fraction of that?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

Loren Wilton <lw...@earthlink.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|lwilton@earthlink.net       |

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8033] t/bayessql.t test failure

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8033

--- Comment #5 from Sidney Markowitz <si...@sidney.com> ---
(In reply to Loren Wilton from comment #4)
> What's the time to run all tests?

On the same VM that measured the 2 minutes difference, the default tests ran in
28 minutes.

If I set the options to not run the 31 tests labeled as "long" instead of the
default, the remaining 183 tests run in 14.5 minutes, for an average of under 5
seconds per "short" test.

That puts bayessql.t in the range of the long tests, even using /dev/shm at 23
seconds compared to the average of under 5 seconds for all the short tests. So
one change should be to label it as long. That would make it still longer than
the average time for the "long" tests, but still, adding 2 minutes to a total
test suite of half an hour doesn't seem like too much to me.

Thanks, I think that puts it in perspective. If this next round of tests I
submitted comes back showing that writing to disk instead of /dev/shm makes it
work on those test machines, I'm inclined to make that change and label it as
one of the long tests.

-- 
You are receiving this mail because:
You are the assignee for the bug.