How does DNS work?
How does DNS work? Magic! This is a suitable answer for people who do not want or need to know the relationship between the hostname example.org and (when this was written) 93.184.216.34 or 2606:2800:220:1:248:1893:25c8:1946. Such people should stop reading now.
There are various commands that do more or less the same thing, with varying levels of detail.
DNS is not the only method of hostname resolution; the last command also checks the contents of the /etc/hosts file.
This is being done on OpenBSD; other systems may vary in where they hide the hosts file, the exact commands, etc. Other methods of name resolution are possible, and there may be caching with old stale values, etc.
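As a rough illustration of the hosts file side of name resolution, here is a minimal Python sketch that parses hosts-format text into a lookup table. The function name parse_hosts and the sample entries are made up for this example; a real resolver does more (ordering rules, aliases, and so on).

```python
def parse_hosts(text):
    """Map each hostname in hosts-format text to its list of addresses."""
    table = {}
    for line in text.splitlines():
        # Drop comments, which start with '#', and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # The first field is the address; the remaining fields are names.
        address, *names = line.split()
        for name in names:
            # Hostnames are case-insensitive, so store them lowercased.
            table.setdefault(name.lower(), []).append(address)
    return table


# Hypothetical sample content; 192.0.2.7 is a documentation-only address.
SAMPLE = """\
127.0.0.1 localhost
::1 localhost   # IPv6 loopback
# a comment line
192.0.2.7 Build.example build
"""
```

Calling parse_hosts(open("/etc/hosts").read()) on a real system should give a similar table, assuming the file lives at that path.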
So anyways, there are various commands or programs that perform hostname resolution. These typically use a system function such as getaddrinfo(3) that, given a name, returns IP address(es) (or an error). A command may instead perform its own DNS resolution. This is typical of commands used to debug DNS requests, and of programs that bypass the usual system hostname resolution; DNS over HTTPS is one such option. dig(1) is a typical tool for advanced DNS debugging.
Most of the commands have used the system resolver, while others have used a specific DNS server such as 8.8.8.8 or 1.1.1.1, among other such public choices. Memorizing or writing down a few of these for local reference may help, especially if the local DNS resolver is ever broken.
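Python exposes the system resolver through socket.getaddrinfo, which wraps getaddrinfo(3). A minimal sketch follows; the helper name resolve is made up here, and what it returns depends on how the local system is configured (hosts file, DNS, caches).

```python
import socket


def resolve(name):
    """Return the sorted, de-duplicated addresses the system resolver gives for name."""
    # Restricting to TCP avoids one duplicate entry per socket type.
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); the address
    # is the first element of sockaddr for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

resolve("example.org") should return whatever addresses the system currently resolves that name to; a failed lookup raises socket.gaierror.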
192.168.0.1 here came from the local DHCP server that is typical for a home wifi access point. The system of course could be instructed to use some other DNS server in addition to or as a replacement for the one provided by DHCP or some other means.
So how does DNS work? The IP address of a server is necessary. The client constructs a packet and sends it to the server, and if all goes well the server will reply. The request can happen over UDP or TCP, and there may be multiple requests, especially if a UDP packet is dropped or the server does not reply. TCP is typically only used for larger responses that do not fit in a UDP packet. A request can be observed with tcpdump(8) or a tool such as Wireshark.
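To make "constructs a packet" less abstract, here is a minimal Python sketch that builds a DNS query by hand following the RFC 1035 layout and optionally sends it over UDP. The function names, the fixed query ID, and the default server are assumptions for illustration; real clients pick random IDs, retry, validate the reply, and fall back to TCP when needed.

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query packet for name.

    qtype 1 is an A record, 28 is AAAA. query_id is a fixed, made-up
    value here; a real client should choose a random one per request.
    """
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1,
    # then ANCOUNT, NSCOUNT, and ARCOUNT all zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, and the
    # whole name is terminated by a zero byte.
    qname = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        qname += bytes([len(encoded)]) + encoded
    qname += b"\x00"
    # QTYPE, then QCLASS 1 (IN, the Internet class).
    return header + qname + struct.pack("!HH", qtype, 1)


def send_query(packet, server="1.1.1.1", port=53, timeout=2.0):
    """Send packet over UDP and return the raw reply bytes (needs network)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, port))
        data, _ = sock.recvfrom(512)
        return data
```

Running send_query(build_query("example.org")) while tcpdump(8) watches port 53 should show the request and the reply on the wire.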
The client at 192.168.0.5.3103 sent a request to 1.1.1.1.53, and 1.1.1.1.53 responded to 192.168.0.5.3103 with the answer (tcpdump(8) appends the port number to the IP address, so 1.1.1.1.53 is port 53 on 1.1.1.1). What 1.1.1.1 did is slightly more complicated. In this case it probably had already cached the result and returned that. DNS records have a time-to-live (TTL) associated with them, which for example.org was 85531 seconds (see the dig(1) output, above) or 23 hours, 45 minutes, and 31 seconds.
With an empty cache (or when the record has expired) 1.1.1.1 will perform a recursive process, though there may be caching of the records involved. DNS is a distributed database, and a given DNS server will generally have no idea where the record for example.org can be found. DNS servers have a list of root DNS servers; this list should be updated periodically. For an unknown record the hostname is split into its components, so example.org becomes "example" and "org", and the DNS server at 1.1.1.1 first asks the root DNS servers: who is responsible for "org"? There is a nameserver or NS record involved.
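The TTL arithmetic above is easy to check. A small Python sketch, with format_ttl being a made-up helper name:

```python
def format_ttl(seconds):
    """Render a TTL given in seconds as hours, minutes, and seconds."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


print(format_ttl(85531))  # prints "23h 45m 31s"
```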
There are also NS records for the root DNS servers (but a DNS server will need to know the IP addresses for these by some other means).
The root servers have for "org" (and various other top level names) glue records, which notably contain what IP address(es) to use for "org" (or the other TLD, such as "com" or "edu"). Without knowing the IP address for, say, "b0.org.afilias-nst.org" how would you ask the "org" DNS servers what the IP address for that host is? The glue record provides that. So anyways, 1.1.1.1 asks one or more root servers "who owns org?" and the root servers return a bunch of IP addresses. Then 1.1.1.1 can ask one or more of the "org" IP addresses "who owns example.org?" to which the "org" servers respond, we do!
The 1.1.1.1 server is not authoritative for example.org; the 199.19.54.1 server is authoritative. Or claims to be; these requests may not use any security, so the other end could be lying to us. Usually though it all works out. The lookup process is recursive; the "example.org" folks could hypothetically delegate "foo.example.org" to some other name server, which in turn could delegate "bar.foo.example.org" to yet another name server, etc. This means a request for "baz.bar.foo.example.org" might go through "." for the root servers, "org", "example.org", "foo.example.org", and finally "bar.foo.example.org" before finding an authoritative answer. Hence the time-to-live (TTL) caching on lookups so that this process need not be repeated for each and every request.
There can also be a negative cache, so if you look up "doesnotexist.example.org" the DNS server will cache that negative result for some amount of time. This makes for fun debugging when a client requests "foo.example.org", the DNS server admin creates "foo.example.org", but then "foo.example.org" will not actually exist until all the negative cache records clear. This is in addition to the new record propagating out to all the DNS servers, which may take some amount of time.
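The chain of possible delegation points can be computed from the hostname alone. This Python sketch lists every zone that might sit above a name, from the root down; zone_chain is a made-up helper, and in practice most names have far fewer actual delegations than this worst case.

```python
def zone_chain(hostname):
    """List the possible parent zones of hostname, from the root downward."""
    labels = hostname.rstrip(".").split(".")
    chain = ["."]  # the root zone
    # Walk from the rightmost label leftward, stopping before the full name.
    for i in range(len(labels) - 1, 0, -1):
        chain.append(".".join(labels[i:]))
    return chain


print(zone_chain("baz.bar.foo.example.org"))
# prints ['.', 'org', 'example.org', 'foo.example.org', 'bar.foo.example.org']
```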
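A toy model may help show why a negative cache causes that kind of confusion. The following Python sketch is not how any particular DNS server implements caching; the class name, the method names, and the TTL values are all assumptions for illustration.

```python
import time


class DnsCache:
    """Tiny TTL cache that stores both answers and "no such name" results."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be simulated
        self._entries = {}   # name -> (expiry time, addresses or None)

    def store(self, name, addresses, ttl):
        # addresses=None records a negative result (the name does not exist).
        self._entries[name] = (self._clock() + ttl, addresses)

    def lookup(self, name):
        """Return ("hit", addresses), ("negative", None), or ("miss", None)."""
        entry = self._entries.get(name)
        if entry is None:
            return ("miss", None)
        expires, addresses = entry
        if self._clock() >= expires:
            # Expired entries are dropped, forcing a fresh lookup upstream.
            del self._entries[name]
            return ("miss", None)
        if addresses is None:
            return ("negative", None)
        return ("hit", addresses)
```

With a negative entry stored for "foo.example.org", lookup keeps returning "negative" until the TTL runs out, no matter what the authoritative server has learned in the meantime.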
A DNS request can go wrong in all sorts of different ways.
- there may be problems unrelated to DNS, such as a firewall or incorrect records in /etc/hosts
- the DNS server(s) might be down
- the DNS server(s) may be returning incorrect or old records
- the DNS server(s) may have a poisoned cache and are returning very incorrect records
- the glue records could be broken, in which case "org" may not know how to get to "example.org"
- random caching issues, positive or negative
- the DNS server(s) may be injecting wildcard records so that hostname typos return some record instead of the expected error
- there might be random software bugs that cause incorrect results or caching issues
- packets could be dropped or delayed, which may result in lookup errors, especially if timeouts are set too low. Too high a timeout may waste time waiting for the inevitable error; there is probably a Goldilocks zone here
- many other things I haven't thought to put here or have forgotten about or don't know about
Therefore, to debug DNS you'll probably need to have a good understanding of the process, software, and tools involved, and know how to use things like tcpdump(8) or dig(1) to see what is going on. DNS-over-HTTPS or DNSSEC may additionally complicate matters. Running your own DNS server is a good way to learn all this. This can be done ad-hoc on a local system, or maybe you can get a DNS provider to create glue records over to your own nameserver(s) for some subdomain.