
This is a part of the Server Basics -series where I explain basic server concepts, tools, services and other things that might be of interest to or be needed by people across the galaxy.
This was meant to be published after all the more basic stuff but I realized I can't explain practically anything if the reader doesn't know how TCP/IP works.
TCP/IP
TCP/IP is something you see everywhere - it's practically the backbone of internet communications. The acronym stands for Transmission Control Protocol / Internet Protocol and is actually referring to two different protocols which are both used together to make the internet thingie work. In this post I'll mostly talk about the IP part and how that affects the average geek's life - knowing how TCP works isn't very crucial for basic server operations.
Basically:
- TCP is the protocol that conveys the actual message (any sort of information) and makes sure it gets delivered.
- IP is the protocol that makes sure the message finds its way to the destination.
You can think of TCP as a messenger and IP as the navigator - they both need each other to achieve anything practical.
IP addresses
The internet works by routing requests and responses between different devices using IP addresses. Your usual IP address consists of four numerical values between 0 and 255 separated by dots (e.g. 62.183.148.99), which means there are 256^4 -> ~4,3 billion different addresses. The number of globally unique addresses is lower since a portion of them are considered private addresses - those are not unique to the world but used in local networks and other special cases.
- 192.168.0.0 - 192.168.255.255 is the most common private address block containing roughly 65 thousand addresses. It's very common for consumer routers to have 192.168.1.1 as their default address.
- 172.16.0.0 - 172.31.255.255 is another private address block containing roughly one million addresses.
- 10.0.0.0 - 10.255.255.255 is the third and biggest private address block, containing almost 17 million addresses.
Other reserved addresses include link-local addresses and loopback addresses.
- 127.0.0.0 - 127.255.255.255 are loopback addresses - hosts (=devices) use these addresses to talk to themselves. The most common one you'll be seeing is 127.0.0.1 - the default localhost address in linux (and practically every other) systems.
- 169.254.1.0 - 169.254.254.255 are link-local addresses i.e. addresses hosts allocate to themselves if they cannot obtain an IP address from the network by DHCP (see below).
None of the addresses above are globally unique - they're used in millions of private networks across the world. There are also other less common reserved addresses but they're usually not much of a concern.
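If you want to check which of the groups above an address falls into, Python's standard ipaddress module can do it. The following is just a minimal sketch: the function name classify is my own, and labelling everything else as "public" is a simplification, since it ignores the less common reserved blocks.

```python
import ipaddress


def classify(address: str) -> str:
    """Return a rough category for an IPv4 address (simplified)."""
    ip = ipaddress.ip_address(address)
    # Loopback and link-local are checked first, because the
    # ipaddress module also counts them as "private".
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:
        return "private"
    # Simplification: other reserved blocks are not handled here.
    return "public"


for example in ("192.168.1.1", "10.1.2.3", "127.0.0.1", "62.183.148.99"):
    print(example, "->", classify(example))
```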
Technically the four parts in an IP address are called octets, since each of them is an 8-bit binary integer, and 255 is the largest value you can express with 8 bits.
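To make the octet idea a bit more concrete, here's a small sketch that shows an address as four 8-bit binary numbers and as the single 32-bit integer they form together. The function names are made up for this example.

```python
def octets_to_int(address: str) -> int:
    """Combine the four octets of an IPv4 address into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        # Shift the previous octets 8 bits to the left and append the next one.
        value = (value << 8) | octet
    return value


def octets_to_binary(address: str) -> str:
    """Show every octet as an 8-digit binary number."""
    return ".".join(f"{int(part):08b}" for part in address.split("."))


print(octets_to_binary("192.168.1.1"))  # 11000000.10101000.00000001.00000001
print(octets_to_int("255.255.255.255") == 2**32 - 1)  # True
```

The largest possible address is 2^32 - 1, which is the same ~4,3 billion as 256^4.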
Subnets, masks and CIDR
The infrastructure of the internet is not centrally governed to start with - it consists of a vast number of different sized networks which are connected to each other. The administrators of these Wide Area Networks (WANs) are given control of a portion of the global IP address space which they can route on their own. Please note that WAN is used in a broad sense here, as there are separate names for networks of different sizes (WAN being the largest).
Historically the IPv4 address space was divided into five classes: A, B, C, D & E - although only classes A to C are meaningful for ordinary networks. Strictly speaking the class of a network was determined by the first octet of its address, but the sizes of the classes can be easily illustrated with examples from the 10.0.0.0 - 10.255.255.255 private address space:
- A class network is the size of the whole address space, e.g. 10.0.0.0 - 10.255.255.255 = 16,7M addresses
- B class network is the size of a 256th of a full A class network, e.g. 10.22.0.0 - 10.22.255.255 = 65k addresses
- C class network is the size of a 256th of a B class network, e.g. 10.22.33.0 - 10.22.33.255 = 256 addresses
Quite obviously, these are called classful networks. But since this kind of segmenting isn't very flexible, wise old nerds have also come up with a way to divide the address space into any number of subnets. This is called Classless Inter-Domain Routing (CIDR).
In CIDR notation addresses contain a trailing number separated by a slash (e.g. /16). This trailing number (from 1 to 32 - 0 is a special case) expresses the size of the network, but inversely. This is because the number tells how many of the leading bits of the address are fixed to identify the network, leaving only the remaining bits for addresses inside it. IPv4 addresses are 32 bits long and therefore the CIDR number can't be higher than 32. If an IP address has a /32 CIDR postfix, it means the network is no larger than the host itself - e.g. address 10.20.30.40/32 means nothing more than the device at 10.20.30.40.
But every time the CIDR postfix gets smaller by one, the size of the network it points to doubles, because each freed bit doubles the number of possible values. For example:
- 10.22.33.0/31 points to address range 10.22.33.0 - 1 --> 2 addresses
- 10.22.33.0/30 points to address range 10.22.33.0 - 3 --> 4 addresses
- 10.22.33.0/29 points to address range 10.22.33.0 - 7 --> 8 addresses
- 10.22.33.0/28 points to address range 10.22.33.0 - 15 --> 16 addresses
- 10.22.33.0/27 points to address range 10.22.33.0 - 31 --> 32 addresses
- 10.22.33.0/26 points to address range 10.22.33.0 - 63 --> 64 addresses
- 10.22.33.0/25 points to address range 10.22.33.0 - 127 --> 128 addresses
The /24 CIDR postfix is good to remember since it points to 256 addresses - a whole class C subnet which is commonly used in home networking.
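The ranges listed above can be reproduced with Python's standard ipaddress module. This is only a sketch to show the doubling; the helper name network_range is my own invention.

```python
import ipaddress


def network_range(cidr: str) -> tuple[str, str, int]:
    """Return (first address, last address, number of addresses) of a CIDR block."""
    network = ipaddress.ip_network(cidr)
    # Indexing a network gives individual addresses; -1 is the last one.
    return str(network[0]), str(network[-1]), network.num_addresses


for prefix in range(31, 23, -1):
    first, last, count = network_range(f"10.22.33.0/{prefix}")
    print(f"/{prefix}: {first} - {last} --> {count} addresses")
```

Every step towards a smaller postfix frees one more bit, so the count is always 2^(32 - postfix).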
There's also a concept called subnet mask, which is practically just another way to represent the network size. The most common subnet mask is 255.255.255.0, which represents a /24 CIDR network but in a different way. The three 255-parts tell that all bits of the first three octets are fixed for the network, and the zero in the last segment tells that the last octet is free, giving 256 (0-255) addresses. Numerical subnet masks are often used in consumer devices, whereas in professional grade hardware you can also use CIDR notation.
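Converting between the two notations is plain bit arithmetic. Below is a minimal sketch with hypothetical helper names: a mask is just as many 1-bits as the CIDR number says, followed by 0-bits.

```python
def prefix_to_mask(prefix: int) -> str:
    """Turn a CIDR number (0-32) into a dotted subnet mask."""
    if not 0 <= prefix <= 32:
        raise ValueError("prefix must be between 0 and 32")
    # 32 one-bits shifted left, then cut back to 32 bits.
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def mask_to_prefix(mask: str) -> int:
    """Turn a dotted subnet mask into a CIDR number."""
    bits = "".join(f"{int(octet):08b}" for octet in mask.split("."))
    # A valid mask never has a 1-bit after a 0-bit.
    if len(bits) != 32 or "01" in bits:
        raise ValueError(f"not a valid subnet mask: {mask!r}")
    return bits.count("1")


print(prefix_to_mask(24))               # 255.255.255.0
print(mask_to_prefix("255.255.255.0"))  # 24
```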
Routers, address distribution and DHCP
IP traffic is directed by devices called routers which (as expected) route the packets carrying TCP data through the network. As routers direct the traffic by using IP addresses, they're not that much different from any other devices connected to a network. As in many other cases in service networking, routers listen to a certain address (sometimes an address range) and therefore hear the traffic presented to them. Routers also have an address range where they expect the client devices to reside.
But if you as a client want to communicate with a router, you'll either need to know at what address that router resides or the router must advertise its presence. For this purpose there's a protocol named Dynamic Host Configuration Protocol (DHCP), which enables servers (in home networks usually the router) to distribute and clients to obtain IP addresses. Basically the DHCP server defines one or several ranges of IP's it provides to clients, and any client connected to it can request an address for communication.
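The address distribution idea can be sketched as a simple pool that hands out the next free address. Note that this is a toy model and not the real DHCP protocol: the class name, the client identifiers and the lack of lease times are all assumptions made for illustration.

```python
import ipaddress


class ToyAddressPool:
    """Hands out addresses from a fixed range, one per client (toy model)."""

    def __init__(self, first: str, last: str) -> None:
        start = int(ipaddress.ip_address(first))
        end = int(ipaddress.ip_address(last))
        self._free = [ipaddress.ip_address(n) for n in range(start, end + 1)]
        self._leases: dict[str, ipaddress.IPv4Address] = {}

    def request(self, client_id: str) -> str:
        # A client that already has an address gets the same one again.
        if client_id in self._leases:
            return str(self._leases[client_id])
        if not self._free:
            raise RuntimeError("address pool exhausted")
        address = self._free.pop(0)
        self._leases[client_id] = address
        return str(address)


pool = ToyAddressPool("192.168.1.2", "192.168.1.254")
print(pool.request("laptop"))  # 192.168.1.2
print(pool.request("phone"))   # 192.168.1.3
print(pool.request("laptop"))  # 192.168.1.2
```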
Global and private networks & NAT
Because of the limited number of IP addresses (four billion is not very much on the scale of the internet) wise old nerds have come up with a technique called Network Address Translation (NAT). Address translation is a method to connect private networks to public networks using only a single public (=globally unique) IP address.
As I stated above, there are almost 18 million IP addresses reserved for private use. These private IP's are addresses anyone can use to build their own little (or big) network and connect it to the internet - provided they have at least one globally unique address to route the traffic through.
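The total number of private addresses is easy to verify by adding up the three private blocks listed earlier, written here in CIDR notation. A quick sketch using the standard ipaddress module:

```python
import ipaddress

# The three private blocks from the IP addresses chapter, in CIDR notation.
PRIVATE_BLOCKS = ("192.168.0.0/16", "172.16.0.0/12", "10.0.0.0/8")

sizes = {
    block: ipaddress.ip_network(block).num_addresses for block in PRIVATE_BLOCKS
}
total = sum(sizes.values())

for block, size in sizes.items():
    print(f"{block}: {size} addresses")
print(f"total: {total} addresses")  # total: 17891328 addresses
```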
A typical home network consists of a modem-router-switch-wireless access point -thingie (i.e. a multipurpose device which does practically everything a typical consumer would need) which is connected to the internet and has one globally unique IP address. This consumer router does the address translation mentioned above and has a simple DHCP server to provide private addresses to the devices in the local private network.
Simple network structure
In a typical (home) network scenario the router-thingie receives one globally unique IP address from the ISP (Internet Service Provider) and uses it to communicate with the ISP's infrastructure. The router then creates a small private network to allocate IP addresses to client devices connecting to it - most often the whole last segment xxx.xxx.xxx.0-255 (.0/24 in CIDR notation).
Three of the addresses in that range are set aside:
- the first address (.0) identifies the network itself
- the second address (.1) is typically taken by the router - it's the one that client devices will see
- the last address (.255) is the broadcast address
One of the most common default setups is the router using the address 192.168.1.1 for itself and allocating IP's for clients from 192.168.1.2 to 192.168.1.254 - meaning you can technically have up to 253 different devices on your home network. When you connect your computer / phone / console / fridge / pet / whatever to your home network, the router will allocate it an IP address of its own from the private address range it's configured to use - using DHCP as described earlier.
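The address layout of such a home network can be worked out with the standard ipaddress module. This sketch assumes the common 192.168.1.0/24 default and that the router takes the first usable address.

```python
import ipaddress

network = ipaddress.ip_network("192.168.1.0/24")

# hosts() skips the network address (.0) and the broadcast address (.255).
usable = list(network.hosts())
router = usable[0]    # assumption: the router takes the first usable address
clients = usable[1:]  # everything else is left for client devices

print("network:  ", network.network_address)    # 192.168.1.0
print("router:   ", router)                     # 192.168.1.1
print("broadcast:", network.broadcast_address)  # 192.168.1.255
print("clients:  ", clients[0], "-", clients[-1], f"({len(clients)} addresses)")
```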
Default gateway, routing process & address translation
Whenever a client (or router) tries to communicate with another device, it first checks whether it already knows a route to the target address. If it doesn't, it passes the traffic to its default gateway. Default gateway is literally the device's gateway to other networks, e.g. the internet - if the client doesn't know where the target address is, it hands the request to the next device in the network, which then makes the same decision. If the target exists, the request will eventually reach it and the reply is routed back to the client. If the network is configured by DHCP, the hosting router usually works as the default gateway for all its clients.
Devices can have only one default gateway since it's the last option where to query the target host - that's why it's called default. Connections to multiple networks are possible (and common) using static routes, but that topic is beyond the scope of this article.
If the communication request originates from within a private network, the router that's connected to the public internet (in most home cases the only one) translates the IP address from private to public before it's passed on. This is needed for the reply to actually find its way back - there are countless private networks which all share the same ~18 million private addresses. If a request originating from a private network had its private address as the "reply-to" address, the devices in public networks would have no idea where it actually is. This is why routers doing the address translation store the private address where the request originally came from and send the query using their own (public) IP address. When a reply arrives, the address is translated back to the private one and the reply is sent to the originating device in the private network.
The exact mechanism by which the router recognizes which incoming packets should be forwarded to the internal network relies on the port numbers of TCP (and UDP), which is a subject for its own article.
Examples in a small home network
Now that we know the fundamentals of networking, let's see how the routing process actually works. Let's assume we have a network with two client devices and one router which connects them. The client A has address 192.168.1.10, client B has address 192.168.1.20 and the router C has the usual 192.168.1.1 address. In our example the clients know only their own address, their own network (192.168.1.*) and the router address. The router acts as the default gateway for both clients (as this is a common setup) and is connected to the internet.
Case 1
Client A (192.168.1.10) wants to communicate with client B (192.168.1.20)
- A compares B's address to its own network (192.168.1.*) and sees that B is located in the same local network, so the default gateway is not needed for routing
- A sends the request directly to B over the local network - in a typical home setup the traffic still physically passes through the switch built into the router-thingie C
- B replies to the request and a TCP connection is established
Case 2
Client A (192.168.1.10) wants to communicate with internet server D (e.g. 82.203.164.17)
- Again, A doesn't know any other hosts except for its gateway so it passes the request to C
- As D is located outside the local (192.168.1.*) network, the router has no idea where that specific address is located. Therefore C passes the request to its default gateway, which is (usually) an industrial router operated by the internet service provider.
- The request is passed through the internet from one router to another until the queried host is found. An error is returned if the host is not found.
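The decision router C makes in the two cases can be sketched as a single membership test. This is a minimal illustration and not how real routing tables are implemented; the function name and the ISP gateway address 198.51.100.1 (from a range reserved for documentation) are made up.

```python
import ipaddress


def forward(destination: str, local_network: str, upstream_gateway: str) -> tuple[str, str]:
    """Decide where router C should send a request next."""
    network = ipaddress.ip_network(local_network)
    if ipaddress.ip_address(destination) in network:
        # The target is one of the router's own clients.
        return "local", destination
    # Unknown target: hand the request to the router's own default gateway.
    return "upstream", upstream_gateway


# Case 1: the target B is inside the local network.
print(forward("192.168.1.20", "192.168.1.0/24", "198.51.100.1"))
# Case 2: the target D is somewhere on the internet.
print(forward("82.203.164.17", "192.168.1.0/24", "198.51.100.1"))
```

Every router along the way repeats the same kind of decision with its own list of known networks until the target is reached.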
What about IPv6?
All the examples in this article have been using IPv4 - Internet Protocol version 4. IPv6 (Internet Protocol version 6) is the next version of Internet Protocol. IPv6 was designed back in the 20th century to combat the fact that 4,3 billion IP addresses are not enough to effectively sustain the internet ecosystem.
IPv6 addresses consist of eight (8) segments, each of which is a 4-digit hexadecimal number (meaning a range of 0-65535). This means there are roughly 340 undecillion (340 x 10 ^ 36 i.e. 340 followed by 36 zeros, or 340 sextillion on the long scale) possible addresses. IPv6 is not interoperable with IPv4, which has slowed its adoption, but IPv6 is already used in conjunction with IPv4 - for example many mobile networks assign an IPv6 address to clients and prioritize traffic over that.
Due to its (still) special status, IPv6 is also the subject of a separate article later on.
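The size of the IPv6 address space is easy to check with a bit of arithmetic. The sketch below also prints one address in its full eight-segment form; 2001:db8::1 is just an example from the range reserved for documentation.

```python
import ipaddress

values_per_segment = 16**4           # 4 hexadecimal digits -> 65536 values
total_addresses = values_per_segment**8

print(values_per_segment)            # 65536
print(total_addresses == 2**128)     # True
print(total_addresses // 10**36)     # 340

# IPv6 addresses are usually written in a shortened form;
# "exploded" shows all eight segments.
example = ipaddress.ip_address("2001:db8::1")
print(example.exploded)  # 2001:0db8:0000:0000:0000:0000:0000:0001
```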
TCP packets go brrrrrrt -> internet happens.
- Address 127.0.0.1 means the device itself
- Addresses 192.168.*.*, 172.16.*.* - 172.31.*.* and 10.*.*.* mean private networks
- Devices can only have one default gateway
- Clients can be (and usually are) autoconfigured using DHCP