How Does a User Reach the Intended URL?


Pratham Mittal
8 min read · Jul 6, 2023

1. Type https://www.example.co.uk into your browser.

2. The browser will reach and check all possible caches/servers

The browser checks a series of caches and servers for a DNS record that gives the IP address of www.example.co.uk. Machines work with numeric addresses, just as humans work better with names.

DNS (Domain Name System) is a distributed database that maps domain names to their IP addresses. The IP address belongs to the machine that hosts the website we are trying to reach.

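For example, www.google.com resolved to 142.250.206.100 when this was written (the address varies by location and over time), so you could try reaching it by typing http://142.250.206.100 into your browser. This does not always work: because IPv4 addresses are limited, one server often hosts more than one website on a single IP and picks the site from the host name in the request, so a bare IP may get a response such as "Direct IP access not allowed". With https:// you may also see a certificate warning, because the certificate is issued for the domain name, not the IP.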
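The shared-IP case can be sketched as a server that chooses the site from the Host header. This is a minimal illustration only: the `SITES` table, the `serve` function, and the addresses (taken from the documentation range) are assumptions, not how any particular web server is configured.

```python
# Hypothetical sites sharing one IP address (documentation range, RFC 5737).
SHARED_IP = "203.0.113.10"
SITES = {
    "www.example.co.uk": "<h1>Example home page</h1>",
    "blog.example.org": "<h1>Some other site on the same server</h1>",
}


def serve(host_header: str) -> tuple[int, str]:
    """Pick the site from the Host header, as a name-based virtual host would."""
    host = host_header.split(":")[0].lower()  # drop an optional :port
    if host in SITES:
        return 200, SITES[host]
    # A request made with the bare IP carries "Host: 203.0.113.10",
    # which matches no configured site.
    return 403, "Direct IP access not allowed"


print(serve("www.example.co.uk"))
print(serve(SHARED_IP))
```

The first call finds a site, the second one does not, which is why typing the IP directly can fail even though the IP is correct.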

It is easier to remember a website by name and let DNS map it to the correct IP. To start the process, the browser first looks for a cached answer at several levels. Caching avoids repeating the full lookup, which reduces latency.

● Browser Cache: The browser first checks its own cache, since it keeps DNS records for a limited time for websites you have visited before.

● OS Cache: If the record is not there, the browser makes a system call to the operating system’s resolver, which consults the hosts file (on Windows/Linux/Mac) and the OS’s own cache of DNS records.

● Router Cache: If the OS has no answer, the query goes out to the configured DNS server, which on a home network is usually the router. The router maintains its own cache too.

● ISP Cache: Finally, the query reaches the DNS server of your ISP (Internet Service Provider) such as Airtel/Idea/Reliance. Every ISP runs its own DNS server, which includes a cache of DNS records.
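The order of cache checks can be sketched as a loop over layers, where each record carries an expiry time. This is a simplified model under stated assumptions: real caches live in separate programs and machines, and the `lookup` function, the layer names, and the addresses are made up for illustration.

```python
import time
from typing import Optional

# Each cache maps a domain name to (ip_address, expiry_timestamp).
Cache = dict[str, tuple[str, float]]


def lookup(
    name: str,
    layers: list[tuple[str, Cache]],
    now: Optional[float] = None,
) -> Optional[tuple[str, str]]:
    """Return (layer_name, ip) from the first cache holding an unexpired record."""
    now = time.time() if now is None else now
    for layer_name, cache in layers:
        record = cache.get(name)
        if record is None:
            continue
        ip, expires_at = record
        if expires_at > now:
            return layer_name, ip
        del cache[name]  # the record outlived its TTL, so drop it
    return None  # miss everywhere: a full DNS lookup is needed


layers = [
    ("browser", {}),
    ("os", {"www.example.co.uk": ("192.0.2.7", 50.0)}),    # expired at now=100
    ("router", {}),
    ("isp", {"www.example.co.uk": ("192.0.2.7", 400.0)}),
]
print(lookup("www.example.co.uk", layers, now=100.0))  # ('isp', '192.0.2.7')
```

The expired OS entry is skipped and removed, so the answer comes from the ISP layer. A `None` result corresponds to the point where the recursive search begins.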

3. Recursive search starts

If none of the caches has the record, the ISP’s DNS server initiates a DNS query to find the IP address of the server that hosts www.example.co.uk.

The ISP’s DNS server now performs a recursive search: it keeps querying one DNS server after another until it either finds a valid IP address or returns an error saying the name could not be resolved. This is why the ISP’s DNS server is also called a recursive DNS server (or recursive resolver).

Let’s first look at how domain names are structured, and then see how the recursive DNS server continues its search.

A domain name may contain a third-level domain, a second-level domain, and a top-level domain. Each of these levels has its own name servers, which are queried during the DNS lookup process.
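To make the levels concrete, here is a small sketch that lists the names a resolver works through, starting from the rightmost label. The function name `zones_from_root` is an assumption made for this example.

```python
def zones_from_root(hostname: str) -> list[str]:
    """List the names a resolver walks through, starting at the rightmost label."""
    labels = hostname.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


print(zones_from_root("www.example.co.uk"))
# ['uk', 'co.uk', 'example.co.uk', 'www.example.co.uk']
```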

For www.example.co.uk, the recursive DNS server first contacts a root name server. (There are 13 root server names, a.root-servers.net to m.root-servers.net, each served by many machines around the world.) The root name server reads the name from the right and refers the recursive DNS server to the .uk name servers.

The .uk name servers refer the recursive DNS server to the name servers for co.uk (in practice these can be the same servers), and those in turn refer it to the name servers for example.co.uk. At any step, a server that is itself authoritative for the name answers directly instead of referring further. The example.co.uk name server finds the matching IP address for www.example.co.uk in its DNS records and returns it as an authoritative answer to the recursive DNS server, which sends it back to your browser and caches it for a specific time (the record’s TTL).
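The chain of referrals can be simulated with a toy set of name servers held in a dictionary. Everything here is made up for illustration (server names, the `SERVERS` table, the address 192.0.2.7); it only shows the idea of following referrals until some server holds the answer, not the real DNS wire protocol.

```python
# Toy name servers: each maps a name suffix either to a referral ("NS")
# or to a final address ("A"). All names and addresses are made up.
SERVERS = {
    "root": {"uk": ("NS", "uk-server")},
    "uk-server": {"co.uk": ("NS", "co-uk-server")},
    "co-uk-server": {"example.co.uk": ("NS", "example-server")},
    "example-server": {"www.example.co.uk": ("A", "192.0.2.7")},
}


def resolve(name: str) -> tuple[str, list[str]]:
    """Follow referrals from the root until some server returns an address."""
    server, path = "root", []
    while True:
        path.append(server)
        # Records on this server whose key is a suffix of the requested name.
        matches = [k for k in SERVERS[server] if name == k or name.endswith("." + k)]
        if not matches:
            raise LookupError(f"{server} has no record for {name}")
        kind, value = SERVERS[server][max(matches, key=len)]
        if kind == "A":
            return value, path
        server = value  # referral: ask the next server down the tree


ip, path = resolve("www.example.co.uk")
print(ip, path)
```

The returned `path` lists the servers in the order they were asked, which mirrors the root, .uk, co.uk, example.co.uk sequence described above.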

Let’s see this with an example.

I ran nslookup (name server lookup), a tool that queries DNS for information about domain names, IP addresses, and DNS records. By default it used my local DNS server (192.168.1.1) to get the IP address of www.google.com. It returned an IP address, but marked it as a "Non-authoritative answer", which means the answer came from a server that is not the authority for google.com (typically from a cache).

I ran nslookup again, this time pointing it at Google’s authoritative name server (ns1.google.com) to fetch the IP address of www.google.com. It returned an IP address again, this time without the "Non-authoritative answer" note, confirming that the answer came from the server that holds the authoritative record.
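Whether an answer is authoritative is carried in the DNS response itself, as the AA ("Authoritative Answer") bit of the header flags, and tools can read that bit when labelling an answer. The sketch below reads it from a raw header; the two headers are hand-built for illustration, not captured traffic, and `is_authoritative` is a name chosen for this example.

```python
import struct

AA_FLAG = 0x0400  # "Authoritative Answer" bit in the DNS header flags word


def is_authoritative(response: bytes) -> bool:
    """Read the 12-byte DNS header and report whether the AA bit is set."""
    if len(response) < 12:
        raise ValueError("a DNS message starts with a 12-byte header")
    _id, flags, _qd, _an, _ns, _ar = struct.unpack("!6H", response[:12])
    return bool(flags & AA_FLAG)


# Hand-built headers: id, flags, then four record counts.
from_cache = struct.pack("!6H", 0x1234, 0x8180, 1, 1, 0, 0)  # QR, RD, RA set
from_owner = struct.pack("!6H", 0x1234, 0x8400, 1, 1, 0, 0)  # QR, AA set
print(is_authoritative(from_cache), is_authoritative(from_owner))  # False True
```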

But why does it matter whether the answer is authoritative, when my local server also gives me a response?

Because of DNS poisoning: an attacker might have poisoned the DNS entries in the browser, OS, router, or ISP cache. And yes, you are right to ask: what if Google’s name server is also compromised by an attacker? That is possible too, though very difficult to achieve :)

DNS Poisoning

DNS poisoning (also called DNS spoofing or DNS cache poisoning) is a malicious attack that manipulates the DNS resolution process to redirect users to malicious websites. It involves altering the DNS records stored in a DNS cache, or forging a DNS server’s responses, so that a domain name maps to the wrong IP address. The two nslookup runs above returned different IP addresses for www.google.com. For a large site this is usually just load balancing, but it is also what a poisoned cache would look like: if the non-authoritative answer’s IP were a poisoned one and the browser trusted it, the victim would be sent to a website the attacker controls.

Phewww… Coming back to where we started: we finally have the IP address of www.example.co.uk.

NOTE: These DNS requests are sent as small data packets (usually over UDP) that contain information such as the content of the request and the IP address of the recursive DNS server. The packets travel through multiple networks between client and server before they reach the correct DNS server. Routers along the way use routing tables to decide where to forward each packet next.
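To show how small such a packet is, here is a sketch that encodes a standard DNS query for an A record. It only builds the bytes and does not send them; the function name `build_dns_query` and the query id are assumptions for this example.

```python
import struct


def build_dns_query(name: str, query_id: int = 0x1234) -> bytes:
    """Encode a standard DNS query asking for the A (IPv4) record of `name`."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!6H", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!2H", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


packet = build_dns_query("www.example.co.uk")
print(len(packet), "bytes")  # 35 bytes
```

The whole question fits in 35 bytes: a 12-byte header, 19 bytes for the encoded name, and 4 bytes for the record type and class.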

4. The browser initiates a TCP connection with the server.

Once the browser has the IP address, it needs to establish a TCP connection with the server before any data can be sent or received. Several transport protocols exist, but TCP is the one most commonly used for HTTP requests. (I hope you know the difference between TCP and UDP.)

This connection is established using the TCP three-way handshake, in which the client and the server exchange SYN (synchronise) and ACK (acknowledge) messages.

1. The client machine sends a SYN packet to the server over the internet, asking if it is open for new connections.

2. If the server is listening on that port and can accept a new connection, it responds with a SYN/ACK packet, which acknowledges the client’s SYN and carries the server’s own SYN.

3. The client will receive the SYN/ACK packet from the server and will acknowledge it by sending an ACK packet.

Then a TCP connection is established for data transmission!
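The three steps can be written down as the segments each side sends. This is a model of the exchange, not real networking code: the operating system performs the handshake for you, and the `Segment` class and the sequence numbers below are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flags: str  # "SYN", "SYN/ACK" or "ACK"
    seq: int    # sender's sequence number
    ack: int    # next sequence number expected from the peer (0 if unused)


def three_way_handshake(client_isn: int, server_isn: int) -> list[Segment]:
    """Return the three segments exchanged, given each side's initial sequence number."""
    syn = Segment("SYN", seq=client_isn, ack=0)
    # The server acknowledges the client's SYN and sends its own SYN in one segment.
    syn_ack = Segment("SYN/ACK", seq=server_isn, ack=syn.seq + 1)
    # The client acknowledges the server's SYN; the connection is now established.
    ack = Segment("ACK", seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

Each acknowledgement number is the peer’s sequence number plus one, which is how both sides confirm they received the other’s SYN.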

Ohhh wait… It’s https, not http. This requires an additional SSL/TLS handshake before any data is exchanged. I have covered it in detail in a separate writeup (kindly refer to https://pratham-08.medium.com/ssl-tls-handshake-in-detail-f02c2011861d for the same).
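In code, the ordering looks like this: open the TCP connection first, then run the TLS handshake on top of it. The sketch uses Python’s standard `socket` and `ssl` modules; the helper names are assumptions, and `open_https_connection` is defined but not called here because it needs network access.

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Client-side settings: verify the certificate chain and the hostname."""
    return ssl.create_default_context()


def open_https_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """TCP handshake first, then the TLS handshake on top of that socket."""
    tcp_sock = socket.create_connection((host, port), timeout=5)
    return make_tls_context().wrap_socket(tcp_sock, server_hostname=host)


context = make_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

The default context requires a valid certificate that matches the host name, which is also why connecting by bare IP tends to produce a certificate warning.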

5. The browser sends an HTTP request to the webserver.

Once the TCP connection and the SSL/TLS handshake are complete, it is time to start transferring data! The browser sends a GET request asking for the www.example.co.uk web page. If the request involved entering credentials or submitting a form, it could be a POST request instead. The request also carries additional information in the form of request headers, such as browser identification (User-Agent), the content types the browser will accept (Accept), Host, Referer, Cookie, and Connection (asking the server to keep the TCP connection alive for additional requests).

Sample GET request (Headers are highlighted):

(To understand requests in more depth, use tools such as Burp Suite.)
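For a text version of what goes on the wire, here is a sketch that assembles a GET request by hand. The header values (for instance the User-Agent string) are made up, and a real browser sends more headers than these.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Assemble an HTTP/1.1 GET request: request line, headers, blank line."""
    headers = {
        "Host": host,
        "User-Agent": "ExampleBrowser/1.0",  # made-up browser identification
        "Accept": "text/html,application/xhtml+xml",
        "Connection": "keep-alive",
    }
    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_get_request("www.example.co.uk").decode("ascii"))
```

The first line is the request line (method, path, protocol version); everything after it, up to the empty line, is a header.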

6. The server handles the request and sends back a response.

The server receives the request from the browser and passes it to a request handler, which reads it and generates a response. The request handler is a program that reads the request, its headers, and its cookies to work out what is being requested, and updates or caches information on the server if needed. It then assembles a response in a particular format (JSON, XML, HTML). Responses can also be cached on the server or on proxies, if used, to reduce latency and improve scalability.
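A request handler can be as small as a function that parses the request and decides what to return. The sketch below is a toy under stated assumptions: the paths, including `/api/whoami`, are hypothetical, and real servers delegate this work to a web framework.

```python
import json


def handle_request(raw_request: str) -> tuple[int, str, str]:
    """Read the request line and headers, then return (status, content_type, body)."""
    head, _, _body = raw_request.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, path, _version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    if method != "GET":
        return 405, "text/plain", "Method Not Allowed"
    if path == "/":
        return 200, "text/html", "<h1>Welcome</h1>"
    if path == "/api/whoami":  # hypothetical JSON endpoint
        body = json.dumps({"agent": headers.get("user-agent", "")})
        return 200, "application/json", body
    return 404, "text/plain", "Not Found"


print(handle_request("GET / HTTP/1.1\r\nHost: www.example.co.uk\r\n\r\n"))
```

The same handler returns HTML for the home page and JSON for the API path, which is the "particular format" choice described above.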

7. The server sends out an HTTP response.

The server’s response contains the status code, headers describing things such as the compression type (Content-Encoding), how to cache the page (Cache-Control), and any cookies to set, and the response body, which holds the actual data sent back to the client.

The first line of the response shows a status code, which tells us the status of the response. There are five classes of status, identified by the first digit of the numerical code: 1xx (informational), 2xx (success), 3xx (redirection), 4xx (client error), and 5xx (server error).

So, if you encounter an error, you can look at the HTTP response to check which status code you received.
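The parts of a response can be pulled apart with a few lines of string handling. The response text below is invented for illustration, and `parse_response` is a name chosen for this sketch; browsers and HTTP libraries do this parsing for you.

```python
def parse_response(raw: str) -> tuple[int, dict[str, str], str]:
    """Split a raw HTTP response into status code, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ")[1])  # e.g. "HTTP/1.1 200 OK"
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Cache-Control: max-age=3600\r\n"
    "Set-Cookie: session=abc123\r\n"
    "\r\n"
    "<html>...</html>"
)
status, headers, body = parse_response(raw)
print(status, headers["cache-control"])  # 200 max-age=3600
```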
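The class of a status code follows from its first digit, so checking what kind of response you received takes one integer division. A minimal sketch, with `status_class` as an assumed helper name:

```python
def status_class(code: int) -> str:
    """Map an HTTP status code to one of the five classes by its first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return classes[code // 100]


for code in (200, 301, 404, 503):
    print(code, status_class(code))
```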

8. The browser displays the HTML content (for HTML responses, which is the most common).

On receiving the response from the server, the browser renders it and displays the content to the user. This may involve rendering HTML, executing JavaScript, displaying images, or handling other types of media, depending on the content received. Static files such as images, stylesheets, and scripts are cached by the browser, so it doesn’t have to fetch them again the next time you visit the page.

I know this is a very long read, but I assure you this is your one-stop read to clear all doubts.
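How long a file may be reused is driven largely by the Cache-Control header sent with it. The sketch below is a deliberate simplification that looks only at `max-age`, `no-store`, and `no-cache`; real browsers apply more rules, and the function names are assumptions.

```python
from typing import Optional


def max_age(cache_control: str) -> Optional[int]:
    """Extract max-age (in seconds) from a Cache-Control header, if usable."""
    directives = [d.strip().lower() for d in cache_control.split(",")]
    # no-store forbids caching; no-cache requires revalidation before reuse.
    if "no-store" in directives or "no-cache" in directives:
        return None
    for directive in directives:
        if directive.startswith("max-age="):
            return int(directive.split("=", 1)[1])
    return None


def can_reuse(cache_control: str, age_seconds: int) -> bool:
    """True if a cached copy of this age may be shown without refetching."""
    limit = max_age(cache_control)
    return limit is not None and age_seconds < limit


print(can_reuse("public, max-age=3600", 120))   # True
print(can_reuse("public, max-age=3600", 4000))  # False
```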

Do support this and feel free to comment in case you see any issues.

Credit: https://medium.com/@maneesa?source=post_page-----bb0aa2449c1a--------------------------------

Pratham Mittal

Ethical hacker || Security Engineer || Amazon, Ex - Razorpay, MakeMyTrip, Synopsys