You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@spamassassin.apache.org by Benny Pedersen <me...@junc.eu> on 2019/12/09 12:28:06 UTC

txrep duplicated key with postgresql


2019-12-09 12:07:53.477 UTC [16458] DETAIL:  Key (username, email, 
signedby, ip)=(user@example.org, user@example.com, example.com, none) 
already exists.
2019-12-09 12:07:53.477 UTC [16458] STATEMENT:  INSERT INTO txrep 
(username,email,ip,count,totscore,signedby) VALUES ($1,$2,$3,$4,$5,$6)
2019-12-09 12:07:53.479 UTC [16459] ERROR:  duplicate key value violates 
unique constraint "txrep_pkey"

--
-- PostgreSQL database dump
--

-- Dumped from database version 11.4
-- Dumped by pg_dump version 11.4

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

SET default_tablespace = '';

SET default_with_oids = false;

--
-- Name: txrep; Type: TABLE; Schema: public; Owner: spamassassin
--

CREATE TABLE public.txrep (
     username character varying(100) DEFAULT ''::character varying NOT 
NULL,
     email character varying(255) DEFAULT ''::character varying NOT NULL,
     ip character varying(40) DEFAULT ''::character varying NOT NULL,
     count bigint DEFAULT '0'::bigint NOT NULL,
     totscore double precision DEFAULT '0'::double precision NOT NULL,
     signedby character varying(255) DEFAULT ''::character varying NOT 
NULL,
     last_hit timestamp without time zone DEFAULT CURRENT_TIMESTAMP NOT 
NULL
)
WITH (fillfactor='95');


ALTER TABLE public.txrep OWNER TO spamassassin;

--
-- Name: txrep txrep_pkey; Type: CONSTRAINT; Schema: public; Owner: 
spamassassin
--

ALTER TABLE ONLY public.txrep
     ADD CONSTRAINT txrep_pkey PRIMARY KEY (username, email, signedby, 
ip);


--
-- Name: txrep_last_hit; Type: INDEX; Schema: public; Owner: 
spamassassin
--

CREATE INDEX txrep_last_hit ON public.txrep USING btree (last_hit);


--
-- Name: txrep update_txrep_update_last_hit; Type: TRIGGER; Schema: 
public; Owner: spamassassin
--

CREATE TRIGGER update_txrep_update_last_hit BEFORE UPDATE ON 
public.txrep FOR EACH ROW EXECUTE PROCEDURE 
public.update_txrep_last_hit();


--
-- PostgreSQL database dump complete
--


how to solve this ?

Re: txrep duplicated key with postgresql

Posted by Benny Pedersen <me...@mx.junc.eu>.
On 2019-12-09 22:52, Daniel J. Luke wrote:
> I uploaded a patch for postgresql on
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7218 a while ago -
> but I haven't had time to clean it up into something that should be
> included into a release.

added this patch, seem to solve it

> It might serve as inspiration for someone else before I end up having
> time to get to it, though.

indeed

now i just have to figure out why spamd gives diffrent results for dkim 
when fuglu runs in postfix prequeue setup, mybe mimedefang just works :/

hope for more begin using fuglu then thinks about better names for 
mimedefang with less wiki docs

Re: txrep duplicated key with postgresql

Posted by "Daniel J. Luke" <dl...@geeklair.net>.
I uploaded a patch for postgresql on https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7218 a while ago - but I haven't had time to clean it up into something that should be included into a release.

It might serve as inspiration for someone else before I end up having time to get to it, though.

> On Dec 9, 2019, at 4:00 PM, Martin Gregorie <ma...@gregorie.org> wrote:
> 
> On Mon, 2019-12-09 at 11:41 -0800, John Hardin wrote:
>> This sounds more like the "does that tuple already exist?" logic is 
>> failing, causing it to think it needs to create a new entry, which
>> the unique key is (correctly) preventing.
>> 
>> You don't lightly bypass unique keys. They are there for a reason.
>> 
> Fair enough. Since this is the first reference I remember seeing to
> using PostgreSQL with TxRef I assumed that Benny's cry for help was due
> to a difference in the way it handled duplicate keys compared with the
> database that normally supports it.
> 
> Martin
> 

-- 
Daniel J. Luke


Re: txrep duplicated key with postgresql

Posted by Martin Gregorie <ma...@gregorie.org>.
On Mon, 2019-12-09 at 11:41 -0800, John Hardin wrote:
> This sounds more like the "does that tuple already exist?" logic is 
> failing, causing it to think it needs to create a new entry, which
> the unique key is (correctly) preventing.
> 
> You don't lightly bypass unique keys. They are there for a reason.
> 
Fair enough. Since this is the first reference I remember seeing to
using PostgreSQL with TxRef I assumed that Benny's cry for help was due
to a difference in the way it handled duplicate keys compared with the
database that normally supports it.

Martin



Re: txrep duplicated key with postgresql

Posted by John Hardin <jh...@impsec.org>.
On Mon, 9 Dec 2019, RW wrote:

> On Mon, 09 Dec 2019 13:14:45 +0000
> Martin Gregorie wrote:
>
>> The primary key for the public.txrep table must be unique, and
>> evidently you already had a row with the same primary key. It seems
>> likely that the combination [username, email, signedby and ip] will
>> very often be duplicated, like every time you get another email from
>> that person.
>
>
> TxRep reuses a lot of AWL, often leaving things mislabelled. For per
> message tracking entries I think 'email' would be a message identifier
> - if such a row already exists it ought be handled. There's no reason
> for duplicate reputation entries.
>
>> Try this:
>
> Unless you know of a good reason for having duplicates, making it
> possible will just conceal a bug.

It's a reputation rating. I'd presume that each tuple *should* only have 
one entry, updated with more stats (message count, total score, etc.) 
once it's created.

This sounds more like the "does that tuple already exist?" logic is 
failing, causing it to think it needs to create a new entry, which the 
unique key is (correctly) preventing.

You don't lightly bypass unique keys. They are there for a reason.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Any time law enforcement becomes a revenue center, the system
   becomes corrupt.
-----------------------------------------------------------------------
  6 days until Bill of Rights day

Re: txrep duplicated key with postgresql

Posted by RW <rw...@googlemail.com>.
On Mon, 09 Dec 2019 13:14:45 +0000
Martin Gregorie wrote:

> The primary key for the public.txrep table must be unique, and
> evidently you already had a row with the same primary key. It seems
> likely that the combination [username, email, signedby and ip] will
> very often be duplicated, like every time you get another email from
> that person.


TxRep reuses a lot of AWL, often leaving things mislabelled. For per
message tracking entries I think 'email' would be a message identifier
- if such a row already exists it ought be handled. There's no reason
for duplicate reputation entries.

> Try this:

Unless you know of a good reason for having duplicates, making it
possible will just conceal a bug.


Re: txrep duplicated key with postgresql

Posted by Martin Gregorie <ma...@gregorie.org>.
The primary key for the public.txrep table must be unique, and evidently
you already had a row with the same primary key. It seems likely that
the combination [username, email, signedby and ip] will very often be
duplicated, like every time you get another email from that person.

Try this:
- redefine txrep_pkey as a data retrieval index (dups allowed)
- use last_hit as the primary key. This should work provided that
  CURRENT_TIMESTAMP ticks faster than new rows can arrive. (this may be
  hardware dependent).
- Otherwise, create a sequence object and use that as the source of
  primary key values. Using it this way will generate primary keys in
  data arrival sequence and will not return duplicate values. 

Martin


On Mon, 2019-12-09 at 13:28 +0100, Benny Pedersen wrote:
> 
> 2019-12-09 12:07:53.477 UTC [16458] DETAIL:  Key (username, email, 
> signedby, ip)=(user@example.org, user@example.com, example.com, none) 
> already exists.
> 2019-12-09 12:07:53.477 UTC [16458] STATEMENT:  INSERT INTO txrep 
> (username,email,ip,count,totscore,signedby) VALUES ($1,$2,$3,$4,$5,$6)
> 2019-12-09 12:07:53.479 UTC [16459] ERROR:  duplicate key value
> violates 
> unique constraint "txrep_pkey"
> 
> --
> -- PostgreSQL database dump
> --
> 
> -- Dumped from database version 11.4
> -- Dumped by pg_dump version 11.4
> 
> SET statement_timeout = 0;
> SET lock_timeout = 0;
> SET idle_in_transaction_session_timeout = 0;
> SET client_encoding = 'UTF8';
> SET standard_conforming_strings = on;
> SELECT pg_catalog.set_config('search_path', '', false);
> SET check_function_bodies = false;
> SET xmloption = content;
> SET client_min_messages = warning;
> SET row_security = off;
> 
> SET default_tablespace = '';
> 
> SET default_with_oids = false;
> 
> --
> -- Name: txrep; Type: TABLE; Schema: public; Owner: spamassassin
> --
> 
> CREATE TABLE public.txrep (
>      username character varying(100) DEFAULT ''::character varying
> NOT 
> NULL,
>      email character varying(255) DEFAULT ''::character varying NOT
> NULL,
>      ip character varying(40) DEFAULT ''::character varying NOT NULL,
>      count bigint DEFAULT '0'::bigint NOT NULL,
>      totscore double precision DEFAULT '0'::double precision NOT NULL,
>      signedby character varying(255) DEFAULT ''::character varying
> NOT 
> NULL,
>      last_hit timestamp without time zone DEFAULT CURRENT_TIMESTAMP
> NOT 
> NULL
> )
> WITH (fillfactor='95');
> 
> 
> ALTER TABLE public.txrep OWNER TO spamassassin;
> 
> --
> -- Name: txrep txrep_pkey; Type: CONSTRAINT; Schema: public; Owner: 
> spamassassin
> --
> 
> ALTER TABLE ONLY public.txrep
>      ADD CONSTRAINT txrep_pkey PRIMARY KEY (username, email,
> signedby, 
> ip);
> 
> 
> --
> -- Name: txrep_last_hit; Type: INDEX; Schema: public; Owner: 
> spamassassin
> --
> 
> CREATE INDEX txrep_last_hit ON public.txrep USING btree (last_hit);
> 
> 
> --
> -- Name: txrep update_txrep_update_last_hit; Type: TRIGGER; Schema: 
> public; Owner: spamassassin
> --
> 
> CREATE TRIGGER update_txrep_update_last_hit BEFORE UPDATE ON 
> public.txrep FOR EACH ROW EXECUTE PROCEDURE 
> public.update_txrep_last_hit();
> 
> 
> --
> -- PostgreSQL database dump complete
> --
> 
> 
> how to solve this ?