Posted to users@spamassassin.apache.org by Alex <my...@gmail.com> on 2018/09/14 17:36:29 UTC

DNS and RBL problems

Hi,

For the past few weeks I've been having problems with queries to many
of the common RBLs, including barracuda, mailspike and unsubscore. My
logs are filled with "Name service error", SERVFAIL and lame-server
messages for RBLs I know to be valid.

14-Sep-2018 12:21:10.928 query-errors: info: client @0x7f105735f3b0
127.0.0.1#44791 (139.33.47.104.bl.mailspike.net): query failed
(SERVFAIL) for 139.33.47.104.bl.mailspike.net/IN/A at
../../../bin/named/query.c:8580
14-Sep-2018 12:21:10.928 query-errors: info: client @0x7f10342d4650
127.0.0.1#44791 (139.33.47.104.db.wpbl.info): query failed (SERVFAIL)
for 139.33.47.104.db.wpbl.info/IN/A at ../../../bin/named/query.c:8580
14-Sep-2018 12:21:10.928 query-errors: debug 2: fetch completed at
../../../lib/dns/resolver.c:3927 for 139.33.47.104.bl.mailspike.net/A
in 30.000146: timed out/success
[domain:bl.mailspike.net,referral:0,restart:5,qrysent:14,timeout:13,lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

This shows a failure, while at other times these same queries succeed.
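For readers less familiar with DNSBLs: the names in these logs follow the usual DNSBL convention of reversing the IPv4 octets and appending the list's zone, so 139.33.47.104.bl.mailspike.net asks bl.mailspike.net about 104.47.33.139. Conventionally a listed address answers with an A record in 127.0.0.0/8 and an unlisted one with NXDOMAIN; the SERVFAIL and timeout above are neither. A minimal Python sketch that builds such a name (the function name dnsbl_query_name is hypothetical):

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build a DNSBL lookup name: IPv4 octets reversed, then the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return reversed_octets + "." + zone.rstrip(".")


if __name__ == "__main__":
    # The address behind the first failing query in the log above
    print(dnsbl_query_name("104.47.33.139", "bl.mailspike.net"))
    # prints 139.33.47.104.bl.mailspike.net
```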

This is BIND set up as a standard recursive name server on
Fedora 28. These are BIND logs, but does anyone know why SpamAssassin
queries to these RBLs would time out? There's no firewall involved, and
it appears to happen at all times of the day.

I really have no other ideas after staring at the logs for weeks,
seeing it happen on all my systems, and asking on numerous other lists
(including postfix and bind-users).

Re: DNS and RBL problems

Posted by Pedro David Marco <pe...@yahoo.com>.
Alex,
if you want, I can give you a temporary SSH tunnel for DNS traffic so you can rule out an Optonline/Cablevision/Altice problem...
Regards!
--------PedroD.

    On Saturday, September 15, 2018, 6:42:07 PM GMT+2, Axb <ax...@gmail.com> wrote:  
 
So this is the moment where this becomes off-topic for SA, and your ISP or
networking guys/support & Wireshark / hping, etc. should help you out.


On 9/15/18 6:28 PM, Alex wrote:
> <SNIP>

  

Re: DNS and RBL problems

Posted by Axb <ax...@gmail.com>.
So this is the moment where this becomes off-topic for SA, and your ISP or
networking guys/support & Wireshark / hping, etc. should help you out.


On 9/15/18 6:28 PM, Alex wrote:
> <SNIP>


Re: DNS and RBL problems

Posted by Alex <my...@gmail.com>.
Hi,

On Sat, Sep 15, 2018 at 5:31 AM Benny Pedersen <me...@junc.eu> wrote:
>
> Pedro David Marco wrote on 2018-09-15 09:46:
> > Sorry, typo: I meant 512 bytes
>
> and with EDNS0 it's up to 4096
>
> but not all DNS servers support it
>
> one could force TCP if wanted
>
> or drop buggy RBL zones

Thank you all so much for your help. The only thing between this
system and the Internet is the Optonline modem/router. I've even gone
without any local firewall rules to eliminate that possibility.

Just last night I implemented htb shaping to limit the outgoing SMTP
traffic rate to be sure it's not consuming the entire pipe, preventing
UDP traffic from being received. I don't think that's the problem,
though, as it happens at all times of the day.

> zone "hostkarma.junkemailfilter.com" { type forward; forward first;
> forwarders {}; };

I'm not sure this would help, as our nameservers aren't set up for
forwarding at all.

> Can you place a sniffer on LAN and WAN interfaces of your Firewall?

I've done this, and even posted packets for people to look at on the
bind-users list, and it was inconclusive. The packet involving the
"SERVFAIL" error doesn't provide any info as to why it failed. It
appears there was just never a response to the packet and the query
timed out.

> Just in case of unexpected throttling by someone/something in the middle... have you tried with a VPN (only for DNS traffic)?

I'll try that to see if somehow Optonline/Cablevision/Altice is
dropping my packets. However, it also happens on our DIA Ethernet
circuit, so I'm not hopeful.

Here's the packet trace of one of the failed packets, in case someone
has some ideas or was curious.

No.     Time           Source                Destination           Protocol Length Info
   9083 11.730327      127.0.0.1             127.0.0.1             DNS      104    Standard query response 0xded6 Server failure A 25.188.223.216.wl.mailspike.net OPT

Frame 9083: 104 bytes on wire (832 bits), 104 bytes captured (832 bits)
    Encapsulation type: Linux cooked-mode capture (25)
    Arrival Time: Sep 13, 2018 15:46:36.633305000 EDT
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1536867996.633305000 seconds
    [Time delta from previous captured frame: 0.000969000 seconds]
    [Time delta from previous displayed frame: 0.006367000 seconds]
    [Time since reference or first frame: 11.730327000 seconds]
    Frame Number: 9083
    Frame Length: 104 bytes (832 bits)
    Capture Length: 104 bytes (832 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: sll:ethertype:ip:udp:dns]
    [Coloring Rule Name: UDP]
    [Coloring Rule String: udp]
Linux cooked capture
    Packet type: Unicast to us (0)
    Link-layer address type: 772
    Link-layer address length: 6
    Source: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Unused: 6fc0
    Protocol: IPv4 (0x0800)
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        0000 00.. = Differentiated Services Codepoint: Default (0)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 88
    Identification: 0x2dff (11775)
    Flags: 0x0000
        0... .... .... .... = Reserved bit: Not set
        .0.. .... .... .... = Don't fragment: Not set
        ..0. .... .... .... = More fragments: Not set
        ...0 0000 0000 0000 = Fragment offset: 0
    Time to live: 64
    Protocol: UDP (17)
    Header checksum: 0x4e94 [validation disabled]
    [Header checksum status: Unverified]
    Source: 127.0.0.1
    Destination: 127.0.0.1
User Datagram Protocol, Src Port: 53, Dst Port: 12304
    Source Port: 53
    Destination Port: 12304
    Length: 68
    Checksum: 0xfe57 [unverified]
    [Checksum Status: Unverified]
    [Stream index: 320]
Domain Name System (response)
    Transaction ID: 0xded6
    Flags: 0x8182 Standard query response, Server failure
        1... .... .... .... = Response: Message is a response
        .000 0... .... .... = Opcode: Standard query (0)
        .... .0.. .... .... = Authoritative: Server is not an authority for domain
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... 1... .... = Recursion available: Server can do recursive queries
        .... .... .0.. .... = Z: reserved (0)
        .... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
        .... .... ...0 .... = Non-authenticated data: Unacceptable
        .... .... .... 0010 = Reply code: Server failure (2)
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 1
    Queries
        25.188.223.216.wl.mailspike.net: type A, class IN
            Name: 25.188.223.216.wl.mailspike.net
            [Name Length: 31]
            [Label Count: 7]
            Type: A (Host Address) (1)
            Class: IN (0x0001)
    Additional records
        <Root>: type OPT
            Name: <Root>
            Type: OPT (41)
            UDP payload size: 4096
            Higher bits in extended RCODE: 0x00
            EDNS0 version: 0
            Z: 0x0000
                0... .... .... .... = DO bit: Cannot handle DNSSEC security RRs
                .000 0000 0000 0000 = Reserved: 0x0000
            Data length: 0
    [Unsolicited: True]
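The flags word in this trace, 0x8182, can be decoded by hand to confirm what Wireshark shows: a recursive response (QR, RD and RA set) from the local resolver with RCODE 2 (SERVFAIL) and the TC bit clear, so the reply was not truncated. That suggests the failure lies in BIND's own upstream fetch, consistent with the "timed out" lines in the BIND log. A minimal Python sketch of the decoding, following the RFC 1035 header layout plus the AD and CD bits from RFC 4035 (the function name decode_dns_flags is hypothetical):

```python
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def decode_dns_flags(flags: int) -> dict:
    """Split the 16-bit DNS header flags word into named fields."""
    rcode = flags & 0xF
    return {
        "qr": (flags >> 15) & 1,        # 1 = this message is a response
        "opcode": (flags >> 11) & 0xF,  # 0 = standard query
        "aa": (flags >> 10) & 1,        # authoritative answer
        "tc": (flags >> 9) & 1,         # truncated (client should retry over TCP)
        "rd": (flags >> 8) & 1,         # recursion desired
        "ra": (flags >> 7) & 1,         # recursion available
        "z": (flags >> 6) & 1,          # reserved, must be zero
        "ad": (flags >> 5) & 1,         # authenticated data (DNSSEC)
        "cd": (flags >> 4) & 1,         # checking disabled (DNSSEC)
        "rcode": RCODE_NAMES.get(rcode, str(rcode)),
    }


if __name__ == "__main__":
    print(decode_dns_flags(0x8182))
```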

Re: DNS and RBL problems

Posted by Benny Pedersen <me...@junc.eu>.
Pedro David Marco wrote on 2018-09-15 09:46:
> Sorry, typo: I meant 512 bytes

and with EDNS0 it's up to 4096

but not all DNS servers support it

one could force TCP if wanted

or drop buggy RBL zones
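To make the EDNS0 point concrete: a client advertises a larger UDP buffer by appending an OPT pseudo-record (type 41, RFC 6891) whose CLASS field carries the payload size, the field Wireshark shows as "UDP payload size: 4096" in Alex's trace. The sketch below uses only the Python standard library; the helpers encode_name and build_query are hypothetical, and the default transaction ID is simply the one from the trace. For 25.188.223.216.wl.mailspike.net the query comes to 60 bytes, which matches the UDP length of 68 (8-byte UDP header plus 60 bytes of DNS) of the SERVFAIL response above, since that response carries the same question and OPT record and no answers.

```python
import struct
from typing import Optional


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    labels = name.rstrip(".").split(".")
    return b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"


def build_query(qname: str, qtype: int = 1, txid: int = 0xDED6,
                edns_payload: Optional[int] = 4096) -> bytes:
    """Build a recursive DNS query, optionally with an EDNS0 OPT record."""
    arcount = 0 if edns_payload is None else 1
    # Header: ID, flags (only RD set), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, arcount)
    question = encode_name(qname) + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    if edns_payload is None:
        return header + question  # classic DNS: UDP replies capped at 512 bytes
    # OPT RR: root name, TYPE=41, CLASS=UDP payload size,
    # TTL=extended RCODE/version/DO flag (all zero here), RDLENGTH=0
    opt = b"\x00" + struct.pack("!HHIH", 41, edns_payload, 0, 0)
    return header + question + opt


if __name__ == "__main__":
    q = build_query("25.188.223.216.wl.mailspike.net")
    print(len(q))  # prints 60
```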

Re: DNS and RBL problems

Posted by Pedro David Marco <pe...@yahoo.com>.
 Sorry, typo: I meant 512 bytes.
   
-----PedroD    

Re: DNS and RBL problems

Posted by Pedro David Marco <pe...@yahoo.com>.
 

>Maybe something in your setup is throttling UDP traffic.
>I've seen Zyxel DSL modems do this.
>Some new IDS in your firewall?

Do not forget that DNS can also use TCP when a response is longer than 521 bytes...
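In more detail: without EDNS0 a UDP reply is limited to 512 bytes, and a server whose answer does not fit sets the TC (truncated) bit, after which the client is expected to repeat the query over TCP, where each message is prefixed with a two-byte length (RFC 1035, section 4.2.2). A minimal Python sketch of both steps; the helper names is_truncated and frame_for_tcp are hypothetical:

```python
import struct


def is_truncated(message: bytes) -> bool:
    """Return True if the TC (truncated) bit is set in a DNS message header."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & 0x0200)  # TC is bit 9 of the flags word


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte length, as DNS over TCP requires."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too long for TCP framing")
    return struct.pack("!H", len(message)) + message


if __name__ == "__main__":
    # Header of a truncated response: ID 0x1234, flags QR+TC+RD+RA, one question
    hdr = struct.pack("!HHHHHH", 0x1234, 0x8380, 1, 0, 0, 0)
    print(is_truncated(hdr))  # prints True
```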


-----PedroD  

Re: DNS and RBL problems

Posted by Axb <ax...@gmail.com>.
On 9/15/18 3:44 AM, Alex wrote:
> <SNIP>

Maybe something in your setup is throttling UDP traffic.
I've seen Zyxel DSL modems do this.
Some new IDS in your firewall?

Re: DNS and RBL problems

Posted by Dominic Raferd <do...@timedicer.co.uk>.

On 15/09/2018 02:44, Alex wrote:
> <SNIP>

On one of our mailservers (but not others, which are at different
locations with different ISPs) we had a problem with queries to RBLs
being blocked either by the RBLs themselves or by one of the
intermediate DNS servers. So we set up a local bind9 resolver; it uses
forwarding for normal queries, but for the RBLs we set up special zones
to prevent forwarding. Example:

zone "hostkarma.junkemailfilter.com" { type forward; forward first; 
forwarders {}; };

This solved nearly all our problems - we still see b.barracuda.org 
refusing some queries from this mailserver (despite this ip being 
registered with them). But not from our other mailservers, and not any 
other rbls.

Re: DNS and RBL problems

Posted by Alex <my...@gmail.com>.
On Fri, Sep 14, 2018 at 4:24 PM Daniel J. Luke <dl...@geeklair.net> wrote:
>
> On Sep 14, 2018, at 3:26 PM, Kevin A. McGrail <km...@apache.org> wrote:
> > On 9/14/2018 3:22 PM, Alex wrote:
> >> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
> >> which is BIND configured as my local caching resolver.
> > Sinister issues like this are hard.  I'll try and escalate our plans for
> > rsync access.
>
> Alex - have you looked at bad checksum counters on the host (netstat -s)? I've seen strange issues before where broken network hardware (or bugs in switch/router code) caused changes to packets as they passed through the 'bad' device. The first hints were those counters increasing at the same time as the mysterious issue was happening.

I don't see anything relating to bad checksums with netstat :-( I've
also tried numerous ethtool config changes. I've also looked through
hundreds of packets with tcpdump and wireshark.

This isn't a SpamAssassin message, but does anyone with a Postfix
system ever see similar "Name service error" messages such as the one
below?

Sep 14 21:12:54 mail03 postfix/dnsblog[3713]: warning: dnsblog_query:
lookup error for DNS query 239.242.238.54.ubl.unsubscore.com: Host or
domain name not found. Name service error for
name=239.242.238.54.ubl.unsubscore.com type=A: Host not found, try
again

It appears to occur quite frequently, and on multiple unrelated
systems. I'd love to find out what's causing it. The Postfix people
ascribed it to a remote server problem, but I can't believe virtually
all RBLs, including Spamhaus, would have such intermittent problems
with *their* name servers.
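Postfix's "Host not found, try again" corresponds to the resolver's temporary-failure status (TRY_AGAIN), which is different from a DNSBL saying "not listed" with NXDOMAIN. A minimal Python sketch of that distinction via the standard socket.getaddrinfo; it assumes a glibc system such as Fedora, where NXDOMAIN is reported as EAI_NONAME and SERVFAIL or a timeout as EAI_AGAIN, and the function names are hypothetical:

```python
import socket


def classify_lookup_error(err: socket.gaierror) -> str:
    """Map a getaddrinfo error to a DNSBL outcome (glibc behaviour assumed)."""
    if err.errno == socket.EAI_NONAME:
        return "not listed"         # NXDOMAIN: the DNSBL has no entry
    if err.errno == socket.EAI_AGAIN:
        return "temporary failure"  # SERVFAIL or timeout: result unknown
    return "other error"


def dnsbl_status(query_name: str) -> str:
    """Look up a DNSBL query name via the system resolver (needs network)."""
    try:
        infos = socket.getaddrinfo(query_name, None, socket.AF_INET)
    except socket.gaierror as err:
        return classify_lookup_error(err)
    addrs = sorted({info[4][0] for info in infos})
    return "listed: " + ", ".join(addrs)  # DNSBLs usually answer 127.0.0.x
```

Calling dnsbl_status("239.242.238.54.ubl.unsubscore.com") during one of the failure windows would be expected to return "temporary failure" rather than "not listed".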

Re: DNS and RBL problems

Posted by Pedro David Marco <pe...@yahoo.com>.
 
> On 9/14/2018 3:22 PM, Alex wrote:
>> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
>> which is BIND configured as my local caching resolver.
> Sinister issues like this are hard.  I'll try and escalate our plans for
> rsync access.

Alex, I'd also bet on a comms problem, on purpose or not...
Can you place a sniffer on LAN and WAN interfaces of your Firewall?
Just in case of unexpected throttling by someone/something in the middle... have you tried with a VPN (only for DNS traffic)? 

-------PedroD
  

Re: DNS and RBL problems

Posted by "Daniel J. Luke" <dl...@geeklair.net>.
On Sep 14, 2018, at 3:26 PM, Kevin A. McGrail <km...@apache.org> wrote:
> On 9/14/2018 3:22 PM, Alex wrote:
>> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
>> which is BIND configured as my local caching resolver.
> Sinister issues like this are hard.  I'll try and escalate our plans for
> rsync access.

Alex - have you looked at bad checksum counters on the host (netstat -s)? I've seen strange issues before where broken network hardware (or bugs in switch/router code) caused changes to packets as they passed through the 'bad' device. The first hints were those counters increasing at the same time as the mysterious issue was happening.
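On Linux, the protocol counters that netstat -s summarizes are read from /proc/net/snmp, where the Udp section includes InErrors, RcvbufErrors and, on newer kernels, InCsumErrors. A minimal Python sketch for reading them so they can be sampled before and after a burst of failures; the function name is hypothetical, and because the column set varies by kernel the values are looked up by name:

```python
import os


def parse_proc_net_snmp(text: str) -> dict:
    """Parse /proc/net/snmp: each protocol has a header line and a value line."""
    stats = {}
    lines = text.strip().splitlines()
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, _, names = header.partition(":")
        _, _, numbers = values.partition(":")
        stats[proto] = dict(zip(names.split(), (int(n) for n in numbers.split())))
    return stats


if __name__ == "__main__" and os.path.exists("/proc/net/snmp"):
    with open("/proc/net/snmp") as f:
        udp = parse_proc_net_snmp(f.read())["Udp"]
    # .get() because older kernels lack some columns
    for name in ("InErrors", "RcvbufErrors", "InCsumErrors"):
        print(name, udp.get(name))
```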

-- 
Daniel J. Luke




Re: DNS and RBL problems

Posted by "Kevin A. McGrail" <km...@apache.org>.
On 9/14/2018 3:22 PM, Alex wrote:
> I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
> which is BIND configured as my local caching resolver.
Sinister issues like this are hard.  I'll try and escalate our plans for
rsync access.

Re: DNS and RBL problems

Posted by Alex <my...@gmail.com>.
Hi,

On Fri, Sep 14, 2018 at 1:51 PM Rob McEwen <ro...@invaluement.com> wrote:
>
> On 9/14/2018 1:36 PM, Alex wrote:
> > Hi,
> >
> > For the past few weeks I've been having problems with queries to many
> > of the common RBLs, including barracuda, mailspike and unsubscore. My
> > logs are filled with "Name service error", SERVFAIL and lame-server
> > messages for RBLs I know to be valid.
> > <SNIP>
>
>
> Alex,
>
> Coincidentally, a recent new invaluement subscriber was initially having
> at least similar problems that didn't make sense. I was stumped. It made
> no sense that it wasn't working because everything looked correct. But
> then he figured out that the following bug was the cause, and fixing
> this bug enabled the queries to start working again:
>
> NOTICE: SpamAssassin installations can be affected by a bug due to a change
> Net::DNS made in an earlier version. Here is the bug for reference:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7223
>
> So you should definitely check whether this is causing your problem.

I should have added that I'm aware of that Net::DNS bug, and I'm using
a version long-since fixed.

> I will also mention that if you are using a server such as 8.8.8.8, you MUST change.  I found
> that if you use 8.8.8.8, you cannot even pass a test for SpamAssassin builds.  They are doing some
> interesting things, likely anti-abuse measures, that just screw with things.

I wish it were that easy. /etc/resolv.conf is set up to use 127.0.0.1,
which is BIND configured as my local caching resolver.
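Given Kevin's warning about public resolvers such as 8.8.8.8, quoted above, it is cheap to verify that /etc/resolv.conf really lists only loopback nameservers. A minimal Python sketch; the helper names are hypothetical, and it only inspects nameserver lines, not SpamAssassin's own dns_server setting, which can override the system resolver:

```python
import ipaddress
import os
from typing import List


def nameservers(resolv_conf: str) -> List[str]:
    """Return the addresses from 'nameserver' lines; comment lines are skipped."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_only_local_resolver(resolv_conf: str) -> bool:
    """True if at least one nameserver is listed and every one is loopback."""
    servers = nameservers(resolv_conf)
    return bool(servers) and all(ipaddress.ip_address(s).is_loopback for s in servers)


if __name__ == "__main__" and os.path.exists("/etc/resolv.conf"):
    with open("/etc/resolv.conf") as f:
        print(uses_only_local_resolver(f.read()))
```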

It also fails for one out of every thousand queries of the PCCC RBL
for no clear reason.

14-Sep-2018 15:16:39.333 query-errors: info: client @0x7ff797169d70
68.195.193.45#34244 (hungryhowies.com.wild.pccc.com): query failed
(SERVFAIL) for hungryhowies.com.wild.pccc.com/IN/A at
../../../bin/named/query.c:8580

14-Sep-2018 15:16:39.333 query-errors: debug 2: fetch completed at
../../../lib/dns/resolver.c:3927 for hungryhowies.com.wild.pccc.com/A
in 30.000163: timed out/success
[domain:wild.pccc.com,referral:0,restart:7,qrysent:7,timeout:6,lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

The check for hungryhowies.com succeeded at that time for a dozen
other RBLs, but later checks could fail for even one of those.
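The bracketed counters BIND appends to these "fetch completed" lines are worth tallying across many failures. In the two excerpts in this thread, qrysent:14 with timeout:13 for bl.mailspike.net and qrysent:7 with timeout:6 for wild.pccc.com, with lame, badresp and neterr all zero, suggest (reading the field names at face value) that the upstream queries mostly went unanswered rather than being refused or malformed. A minimal Python sketch for extracting those counters; the function name is hypothetical and the field names are taken from the log format shown above:

```python
import re
from typing import Optional

# Matches the "[domain:...,key:value,...]" block at the end of the log line
STATS_RE = re.compile(r"\[(domain:[^\]]*)\]")


def parse_fetch_stats(line: str) -> Optional[dict]:
    """Extract BIND's per-fetch counters from a 'fetch completed' log line."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    stats = {}
    for item in match.group(1).split(","):
        key, _, value = item.partition(":")
        stats[key] = int(value) if value.isdigit() else value
    return stats


if __name__ == "__main__":
    line = ("[domain:wild.pccc.com,referral:0,restart:7,qrysent:7,timeout:6,"
            "lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]")
    s = parse_fetch_stats(line)
    print(s["timeout"], "of", s["qrysent"], "queries timed out")
    # prints: 6 of 7 queries timed out
```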

Re: DNS and RBL problems

Posted by "Kevin A. McGrail" <km...@apache.org>.
I will also mention that if you are using a server such as 8.8.8.8, you
MUST change.  I found that if you use 8.8.8.8, you cannot even pass a test
for SpamAssassin builds.  They are doing some interesting things, likely
anti-abuse measures, that just screw with things.

Regards,
KAM

--
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Fri, Sep 14, 2018 at 1:50 PM, Rob McEwen <ro...@invaluement.com> wrote:
> <SNIP>

Re: DNS and RBL problems

Posted by Rob McEwen <ro...@invaluement.com>.
On 9/14/2018 1:36 PM, Alex wrote:
> Hi,
>
> For the past few weeks I've been having problems with queries to many
> of the common RBLs, including barracuda, mailspike and unsubscore. My
> logs are filled with "Name service error", SERVFAIL and lame-server
> messages for RBLs I know to be valid.
> <SNIP>


Alex,

Coincidentally, a recent new invaluement subscriber was initially having 
at least similar problems that didn't make sense. I was stumped. It made 
no sense that it wasn't working because everything looked correct. But 
then he figured out that the following bug was the cause, and fixing 
this bug enabled the queries to start working again:

NOTICE: SpamAssassin installations can be affected by a bug due to a change
Net::DNS made in an earlier version. Here is the bug for reference:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7223

So you should definitely check whether this is causing your problem.

-- 
Rob McEwen
https://www.invaluement.com