44net-request@hamradio.ucsd.edu wrote:
Subject: Re: [44net] Performance of DNS
From: Jeroen Massar jeroen@massar.ch
Date: 08/06/2014 11:22 AM
To: AMPRNet working group 44net@hamradio.ucsd.edu
Jeroen,
In general: please read the entire mail before starting to comment on individual paragraphs, so you don't need to ask questions that are answered a few paragraphs down the same mail.
The system is running Debian Wheezy, fully up to date, with bind9 version 9.8.4 plus Debian patches. It is not available to the outside, so no security worries.
What I see is that it does not cache ampr.org addresses very long, but that does not surprise me because the default TTL in the zone is only one hour. Of course everything would perform better if the TTL were the more usual 24 hours, but undoubtedly there was a good reason to set this TTL. (lately it was useful for me as I changed the external address of the machine and the update was propagated quickly in DNS, but in general I would think the zone is very static)
What I am surprised about is that the measured relative performance of the 7 alternative DNS servers is apparently not kept by bind long enough to be useful. The TTL at that level is 24 hours but I think I have often seen that when doing the same lookups within 24 hours I see lookup delays again. The statistics command you gave does not provide that info, I wonder if there is some bind command to query its measured timers and preferred servers.
We have several timeservers on net-44 addresses and I do "ntpq -p -c rv -c mrulist" a couple of times a day now that we are testing and deploying. It was slow every time, of course the cached lookups are gone because the previous try was more than an hour ago, but apparently the DNS preference info was gone too and queries were again sent to slow (for me) servers.
After my experimental change with the hardwired forwarders everything works much better. I'm not sure I want to keep it, but it certainly indicates that there *is* a way in which bind could handle it more efficiently. Maybe I am missing some setting, I have experimented with setting a forwarder (and forward first) at top level as well. Probably I should turn off the DNSSEC that has been enabled by default by bind and Debian, that appears to cause a lot of extra overhead too.
Rob
On 2014-08-06 21:31, Rob Janssen wrote: [..]
In general: please read the entire mail before starting to comment on individual paragraphs, so you don't need to ask questions that are answered a few paragraphs down the same mail.
(Ehmm, which exact questions that I supposedly asked were answered "a few paragraphs down"!? ;)
The system is running Debian Wheezy, fully up to date, with bind9 version 9.8.4 plus Debian patches.
Do you mean "I am running the latest Debian stable/testing/unstable" or do you mean "I took the patches from Debian and compiled them together"?
It is not available to the outside, so no security worries.
What exactly do you mean by that? How did you make it "not available"? Which "outside" has no access to it? The ampr.org NSs are on the wide Internet, hence something has to be able to send it packets.
Did you maybe simply mean that it is non-recursive for non-local clients?
If it is really "not available to the outside" then that might explain your resolution issues.
Are you allowing both UDP and TCP port 53 replies for instance?
Note that as the replies have to come in, if there is a bad path
What I see is that it does not cache ampr.org addresses very long, but that does not surprise me because the default TTL in the zone is only one hour. Of course everything would perform better if the TTL were the more usual 24 hours, but undoubtedly there was a good reason to set this TTL.
Actually, lots of zones have short TTLs on labels so that those hosts can be changed quickly. Typically one does set somewhat longer TTLs on the NS hosts though.
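To illustrate the difference between a short TTL on a host label and a longer one on the NS records, here is a minimal sketch of a TTL-based cache. It is a toy model, not how bind implements its cache; the class name, the hostnames and the 192.0.2.1 address are made up for the example, and the 3600 s and 86400 s values are the one-hour and 24-hour TTLs mentioned in this thread.

```python
class TtlCache:
    """Toy resolver cache: each entry is dropped once its TTL has run out."""

    def __init__(self):
        self._entries = {}  # (name, rtype) -> (value, absolute expiry time)

    def put(self, key, value, ttl, now):
        self._entries[key] = (value, now + ttl)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires = entry
        if now >= expires:
            # Expired: the resolver would have to ask upstream again.
            del self._entries[key]
            return None
        return value


cache = TtlCache()
# Hypothetical records, both cached at t = 0 seconds.
cache.put(("host.example.org.", "A"), "192.0.2.1", ttl=3600, now=0)
cache.put(("example.org.", "NS"), "ns1.example.org.", ttl=86400, now=0)

two_hours = 2 * 3600
print(cache.get(("host.example.org.", "A"), now=two_hours))  # None
print(cache.get(("example.org.", "NS"), now=two_hours))      # ns1.example.org.
```

After two hours the host record has to be fetched again, while the NS record is still cached, so the resolver at least does not have to rediscover which servers to ask.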
(lately it was useful for me as I changed the external address of the machine and the update was propagated quickly in DNS, but in general I would think the zone is very static)
It is very common; do check mass-hosted services like Google, Facebook, Akamai etc. All have nice low TTLs as they want to see your queries and be able to change them a lot.
What I am surprised about is that the measured relative performance of the 7 alternative DNS servers is apparently not kept by bind long enough to be useful. The TTL at that level is 24 hours but I think I have often seen that when doing the same lookups within 24 hours I see lookup delays again. The statistics command you gave does not provide that info, I wonder if there is some bind command to query its measured timers and preferred servers.
You can always use: "rndc dumpdb" to get the current database.
or just query and check the TTL that is left: dig @<ns> <hostname>
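When the answer comes from a cache, the second column of a dig answer line is the number of seconds the record has left to live. A small sketch for pulling that value out of a line, assuming the usual "name TTL class type rdata" layout of the ANSWER section; the function name and the sample line (hostname, TTL and address) are invented for illustration and are not real output.

```python
def remaining_ttl(answer_line):
    """Return (owner name, TTL in seconds) from one dig ANSWER-section line."""
    fields = answer_line.split()
    # Expected layout: owner-name, TTL, class, type, then the record data.
    if len(fields) < 5:
        raise ValueError("not a dig answer line: %r" % (answer_line,))
    return fields[0], int(fields[1])


# Hypothetical sample line, not captured from a real server.
sample = "host.example.org.\t2711\tIN\tA\t192.0.2.1"
name, ttl = remaining_ttl(sample)
print("%s stays cached for another %d seconds" % (name, ttl))
```

Running the same query twice against the caching server and comparing the two values shows whether the record is really being served from the cache: the second TTL should be lower than the first.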
We have several timeservers on net-44 addresses and I do "ntpq -p -c rv -c mrulist" a couple of times a day now that we are testing and deploying. It was slow every time, of course the cached lookups are gone because the previous try was more than an hour ago, but apparently the DNS preference info was gone too and queries were again sent to slow (for me) servers.
When running such a query that you expect to be slow, run a tcpdump in the background or in another shell; then you can see what is being queried and what takes so long. Wireshark should visualize this easily in conversation view.
After my experimental change with the hardwired forwarders everything works much better. I'm not sure I want to keep it, but it certainly indicates that there *is* a way in which bind could handle it more efficiently.
You just hard-coded them, thus ignoring any kind of TTL. That cannot be done at internet scale; you might as well just go back to /etc/hosts.
Note that you are also avoiding the actual lookup of the NS record, lots of baby steps.
[..] Probably I should turn off the DNSSEC that has been enabled by default by bind and Debian, that appears to cause a lot of extra overhead too.
As asked above, did you filter out TCP for DNS?
EDNS0 pretty much needs it, and DNSSEC needs EDNS0 because of the big responses.
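For background: plain DNS over UDP is limited to 512-byte messages. With EDNS0 (RFC 6891) the client adds an OPT pseudo-record to the query that advertises a larger UDP payload size and, for DNSSEC, sets the DO bit. When a response still does not fit, it is truncated and the resolver retries over TCP, which is why filtering TCP port 53 hurts. A rough sketch of what that OPT record looks like on the wire; the function name is made up and the 4096 default is just a commonly used value, not something taken from your configuration.

```python
import struct


def build_opt_rr(udp_payload_size=4096, dnssec_ok=True):
    """Build an EDNS0 OPT pseudo-record (RFC 6891) that carries no options."""
    name = b"\x00"                       # owner name is always the root
    rr_type = 41                         # OPT
    ext_rcode = 0                        # upper bits of the extended RCODE
    version = 0                          # EDNS version 0
    flags = 0x8000 if dnssec_ok else 0   # DO bit: "send me DNSSEC records"
    rdlength = 0                         # no EDNS options attached
    # The CLASS field is reused for the advertised UDP payload size and the
    # TTL field for extended RCODE, version and flags.
    return name + struct.pack(
        "!HHBBHH", rr_type, udp_payload_size, ext_rcode, version, flags, rdlength
    )


opt = build_opt_rr()
print(len(opt), opt.hex())
```

The "reducing the advertised EDNS UDP packet size to 512 octets" fallback that bind logs corresponds to sending this record with 512 in the payload size field.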
tcpdump/wireshark as mentioned above is your best way to debug...
Greets, Jeroen
Beyond filtering, are you looking at your named logs too? There are a ton of old / misconfigured DNS servers out there that don't support EDNS:
success resolving 'blns71.spamcop.net/A' (in 'spamcop.net'?) after disabling EDNS
success resolving 'bcwww.enet.cu/A' (in 'enet.cu'?) after reducing the advertised EDNS UDP packet size to 512 octets
success resolving 'hispructs.com/A' (in 'hispructs.com'?) after reducing the advertised EDNS UDP packet size to 512 octets
You can test for this via this helpful write up: https://www.dnsoarc.net/oarc/services/replysizetest/
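If there are a lot of these in the log, a few lines of Python can summarize which zones triggered an EDNS fallback. This is only a sketch: the regular expression is written against the three messages quoted above and assumes the rest of your log follows the same wording; the names PATTERN and parse_edns_fallback are made up.

```python
import re
from collections import Counter

# Matches the "success resolving ... after ..." messages quoted above.
PATTERN = re.compile(
    r"success resolving '(?P<name>[^/']+)/(?P<rtype>[^']+)' "
    r"\(in '(?P<zone>[^']+)'\?\) after (?P<action>.+)$"
)


def parse_edns_fallback(line):
    """Return a dict with name, rtype, zone and action, or None if no match."""
    match = PATTERN.search(line.strip())
    return match.groupdict() if match else None


log_lines = [
    "success resolving 'blns71.spamcop.net/A' (in 'spamcop.net'?) after disabling EDNS",
    "success resolving 'bcwww.enet.cu/A' (in 'enet.cu'?) after reducing the "
    "advertised EDNS UDP packet size to 512 octets",
    "success resolving 'hispructs.com/A' (in 'hispructs.com'?) after reducing the "
    "advertised EDNS UDP packet size to 512 octets",
]

parsed = [p for p in map(parse_edns_fallback, log_lines) if p is not None]
by_action = Counter(p["action"] for p in parsed)
for action, count in by_action.most_common():
    print(count, action)
```

Lines that do not match are simply skipped, so the same loop can be run over a whole log file.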
--David