What Is a Firewall?

In the real world, a firewall is a thick, solid wall made from a fire-resistant material like brick or concrete. In case a fire breaks out, firewalls are supposed to contain the flames so that they can only damage a small part of the building. If a room is surrounded by firewalls, a fire in that room shouldn't be able to get out and a fire outside the room shouldn't be able to get in.

Computer firewalls work on a similar principle, but with computer programs instead of fire (stacking computers along your wall is no substitute for a smoke detector). Individual computers are like rooms, and a network, whether it's a small home network or the entire internet, is like the building. In case something bad happens to one computer, for example if it gets infected by a virus, the firewalls are supposed to contain the damage by preventing the virus from contacting other computers. At the same time, firewalls also help keep hackers from attacking your computer by blocking unwanted or suspicious data from coming in.

Packets and Connections

In order to know how firewalls work, you need to know something about how computers exchange information: they use packets. Each packet contains a small amount of data, plus some auxiliary information such as where it comes from and where it's going; you can think of it like an envelope, with a destination address, a return address, and a letter (the data) inside.¹ Envelopes, of course, can only hold a certain amount of paper, so if you're mailing an exceptionally long letter you might split it up into groups of 10 pages and put each in a separate envelope; similarly, packets are limited to a certain size, so if you want to send a chunk of data which is too large to fit in a single packet, it will need to be split up into several packets. The recipient will then have to put the pieces back together. Normally it doesn't matter that this happens, because computers take care of it automatically (see the section on common internet protocols below). However, firewalls typically have access to the raw packets before they're put back together, and if you don't tell them otherwise, they may only let through the first packet of each group. Most firewalls for Windows take care of this automatically.
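To make the envelope analogy concrete, here is a minimal Python sketch of splitting a message into packets and putting it back together. The Packet class, its field names, and the 10-byte size limit are all made up for illustration; real packets use binary headers and much larger limits.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    source: str        # the "return address" (hypothetical field name)
    destination: str   # the "destination address"
    index: int         # position of this piece within the whole message
    total: int         # how many pieces the message was split into
    payload: bytes     # the "letter" inside the envelope


def split_into_packets(data, source, destination, max_payload=10):
    """Cut data into pieces of at most max_payload bytes, one per packet."""
    chunks = [data[i:i + max_payload] for i in range(0, len(data), max_payload)]
    if not chunks:
        chunks = [b""]  # an empty message still travels in one packet
    return [Packet(source, destination, i, len(chunks), chunk)
            for i, chunk in enumerate(chunks)]


def reassemble(packets):
    """Put the pieces back in order; complain if any piece is missing."""
    ordered = sorted(packets, key=lambda p: p.index)
    if not ordered or [p.index for p in ordered] != list(range(ordered[0].total)):
        raise ValueError("some pieces of the message are missing")
    return b"".join(p.payload for p in ordered)
```

Note that reassemble refuses to return anything if a piece is missing, which is exactly what happens to the recipient when a firewall only lets the first packet of a group through.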

Network Interfaces

A network interface represents a device, usually a physical one, through which a computer can send and receive packets. Most computers have at least two of these: a special one called the loopback interface, used to send messages from the computer to itself, plus an internet connection, which could be a wireless network card or an ethernet card (that's the one you plug a cable into). Laptops typically have both a wireless interface and an ethernet interface, and it's even possible to have multiple interfaces of any one type; for example, web servers which are networked together might have an ethernet card for communicating with the internet at large and another one just for communicating with each other. Each interface has its own IP address.² Some firewalls give you the ability to handle packets based on which interface they are coming in (or going out) through.

Common Internet Protocols

00 00 00 00 00 00 00 00  00 00 00 00 08 00 45 00
00 3c b9 60 40 00 40 06  83 59 7f 00 00 01 7f 00
00 01 ce 47 00 50 bb bd  4d 69 00 00 00 00 a0 02
80 18 c3 b2 00 00 02 04  40 0c 04 02 08 0a 00 12
f4 09 00 00 00 00 01 03  03 07

00 00 00 00 00 00	Destination MAC address
00 00 00 00 00 00	Source MAC address
08 00	Ethernet protocol number (0x0800 = 2048 for IP)
45	IP version (4) and header length (5 words = 20 bytes)
00	Differentiated services
00 3c	Total length (60 bytes: IP header plus TCP header and data)
b9 60	Packet ID number
40 00	IP flags (Don't Fragment) and fragmentation offset (0)
40	Time to live (64 hops)
06	Internet protocol number (6 for TCP)
83 59	IP header checksum
7f 00 00 01	Source IP address 127.0.0.1
7f 00 00 01	Destination IP address 127.0.0.1
ce 47	Source TCP port 52807
00 50	Destination TCP port 80
bb bd 4d 69	TCP sequence number
00 00 00 00	TCP acknowledgement number (unused in a SYN packet)
a0	TCP header length (10 words = 40 bytes)
02	TCP flags (2 for SYN)
80 18	TCP window size (32792)
c3 b2	TCP checksum
00 00	TCP urgent pointer (unused)
02 04 40 0c 04 02 08 0a 00 12 f4 09 00 00 00 00 01 03 03 07	TCP options

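Contents of a TCP/IPoE packet: that's TCP layered on IP layered on Ethernet. The Ethernet header comes first (through the protocol number 08 00), then the IP header (through the destination IP address), then the TCP header.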
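If you'd like to check the breakdown yourself, the following Python sketch decodes the same 74 bytes using only the standard library. The function and key names are my own, and it assumes an Ethernet frame carrying IPv4 and TCP, which is what this particular packet is. One detail worth knowing: the byte 45 packs two fields together, the IP version (4) in the upper half and the header length in 32-bit words (5) in the lower half, and the byte a0 does the same trick for the TCP header length.

```python
import ipaddress
import struct

# The packet from the hex dump above.
PACKET = bytes.fromhex(
    "00 00 00 00 00 00 00 00  00 00 00 00 08 00 45 00 "
    "00 3c b9 60 40 00 40 06  83 59 7f 00 00 01 7f 00 "
    "00 01 ce 47 00 50 bb bd  4d 69 00 00 00 00 a0 02 "
    "80 18 c3 b2 00 00 02 04  40 0c 04 02 08 0a 00 12 "
    "f4 09 00 00 00 00 01 03  03 07"
)


def ip_checksum_ok(header):
    """Sum the header as 16-bit words; a valid header folds to 0xFFFF."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


def parse_frame(frame):
    """Decode the Ethernet, IPv4, and TCP headers of one frame."""
    dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    ip = frame[14:]
    (version_ihl, _dscp, total_length, ident, flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", ip[:20])
    ip_header_len = (version_ihl & 0x0F) * 4   # low 4 bits, in 32-bit words
    tcp = ip[ip_header_len:total_length]
    (src_port, dst_port, seq, ack, offset_byte,
     tcp_flags, window, _tcp_checksum, _urgent) = struct.unpack("!HHIIBBHHH", tcp[:20])
    return {
        "ethertype": ethertype,
        "ip_version": version_ihl >> 4,        # high 4 bits
        "ip_header_len": ip_header_len,
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,
        "ip_checksum_ok": ip_checksum_ok(ip[:ip_header_len]),
        "src": str(ipaddress.ip_address(src)),
        "dst": str(ipaddress.ip_address(dst)),
        "src_port": src_port,
        "dst_port": dst_port,
        "tcp_header_len": (offset_byte >> 4) * 4,
        "syn": bool(tcp_flags & 0x02),
        "window": window,
    }
```

Calling parse_frame(PACKET) returns a dictionary with the decoded fields, so you can compare each one against the table.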

Imagine that you're writing a letter of complaint to recover your $87 million cut of Nigerian Prince Mumbo Ngumbo's bank account. You go out and hire a translator to draft your letter in some obscure dialect of some obscure Nigerian language so hopefully the prince will give you some sympathy points. Well, guess what, they speak English in Nigeria, so your letter falls on deaf — or at least very confused — ears. Or eyes. The recipient won't be able to understand it.

Packets work the same way: in order for them to be useful, the sender and receiver need to be using the same language, or protocol, to interpret the data in the packet. That's why the supplementary data in each packet includes an Internet Protocol number which tells the recipient how to interpret the packet's contents. TCP, which stands for transmission control protocol, is by far the most common protocol on the internet today. Any time you access a web page, every bit of data that passes between your computer and the web server is sent using TCP packets.

TCP is called TCP because it attempts to control the transmissions between one computer and another. By that I mean that it attempts to hide the fact that internet connections are pretty unreliable. Every time the transmitting computer sends a TCP data packet, it waits for an acknowledgement, a TCP ACK packet, before sending the next bit of data. If the packet vanishes into the crowd, there will be no acknowledgement and the sender will try that same packet again until it gets its verification that the packet has been received. TCP packets are also sequentially numbered in each connection, so that if they get reordered en route (which is possible if one packet takes a much longer path than another), the recipient will be able to put them back together in the right order. All this work is done inside the operating system, though, so to a computer program like your web browser, it looks like you can just send data over the internet and trust that it will get where it's supposed to go.
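Here is a toy Python simulation of that send, wait, and retry loop. It is only a sketch under simplifying assumptions: the function name, the loss rate, and the random "network" are invented for illustration, and real TCP is considerably more elaborate (it uses timers, numbers bytes rather than packets, and can keep several packets in flight at once).

```python
import random


def send_reliably(chunks, loss_rate=0.3, seed=1, max_tries=50):
    """Deliver chunks over a simulated lossy link, one packet at a time."""
    rng = random.Random(seed)   # fixed seed so the simulation is repeatable
    received = {}               # receiver's copy, keyed by sequence number
    attempts = 0
    for seq, chunk in enumerate(chunks):
        for _ in range(max_tries):
            attempts += 1
            if rng.random() < loss_rate:
                continue        # data packet lost: no ACK comes back, so retry
            received[seq] = chunk   # a duplicate just overwrites the same slot
            if rng.random() < loss_rate:
                continue        # the ACK itself was lost: sender retries anyway
            break               # ACK arrived, move on to the next packet
        else:
            raise TimeoutError("gave up on packet %d" % seq)
    # The sequence numbers let the receiver restore the original order.
    data = b"".join(received[seq] for seq in sorted(received))
    return data, attempts
```

The second return value counts how many packets were actually sent, which on a lossy link is usually more than the number of chunks. The program calling send_reliably never sees the retries, just as your web browser never sees them.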

Besides TCP, there are two other protocols that are used regularly: UDP, or user datagram protocol, and ICMP, or internet control message protocol. Neither of these makes any attempt to emulate a two-way connection the way TCP does: when you're using UDP or ICMP, you just create a packet, send it out on to the network, and hope it gets there. Usually it does, but the protocol makes no guarantees. UDP is mostly used for DNS, the Domain Name System which allows your computer to convert domain names into numeric addresses, and ICMP packets are used for status notifications, like when a server wants to let you know that it got your packet but can't respond.

Chain-Based Filtering

Think about the process you probably go through when you get mail (even though you may not realize it): you throw out the junk mail, put the bills in your "things to do" pile, open anything that looks like a personal letter, and so on. Let's say it's The Future and your Roomba is now smart enough to graduate from entertaining you by bumping into things. It wouldn't be hard to tell it how to sort your mail because all that's involved is applying a list of rules to each letter, for example:

Condition	Action
Doesn't have your address on it	Put it back in the mailbox for forwarding
Addressed to "CURRENT RESIDENT"	Throw it out
Envelope says "YOU'VE WON NINE ZILLION DOLLARS!!!!!!"	Throw it out
From your phone company	Put it in the to-do pile
(everything else)	Open it

This is basically how a firewall works: it has a list of rules, each consisting of some set of conditions along with an action. Every time a packet passes through the computer, the firewall goes through the list, in order, and checks to see if the packet satisfies all the conditions for a particular rule. If it does, the firewall takes the action associated with that rule. Some firewalls only use a few conditions; for example, ZoneAlarm chooses to accept or deny packets based on only two conditions: which program they're coming from, and whether they're going to a local, trusted network or to the internet at large. In contrast, the Linux firewall, IPTables, offers literally dozens of criteria for choosing which packets to let through and which ones to block.
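The rule list described above can be sketched in a few lines of Python. Everything here is hypothetical (the Rule class, the packet dictionaries, and the ACCEPT and DROP action names); it is meant to show the first-match-wins idea, not how any real firewall is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    description: str
    conditions: List[Callable[[dict], bool]]  # every condition must hold
    action: str                               # e.g. "ACCEPT" or "DROP"


def filter_packet(packet, chain, default_action="DROP"):
    """Walk the chain in order and apply the first rule that fully matches."""
    for rule in chain:
        if all(condition(packet) for condition in rule.conditions):
            return rule.action
    return default_action  # nothing matched: fall back to the default


# An example chain; the packets are plain dictionaries for simplicity.
CHAIN = [
    Rule("anything on the loopback interface",
         [lambda p: p["interface"] == "lo"], "ACCEPT"),
    Rule("telnet from anywhere",
         [lambda p: p["protocol"] == "tcp", lambda p: p["dst_port"] == 23], "DROP"),
    Rule("web traffic",
         [lambda p: p["protocol"] == "tcp", lambda p: p["dst_port"] == 80], "ACCEPT"),
]
```

Because the firewall stops at the first matching rule, the order of the list matters: a telnet packet on the loopback interface is accepted by the first rule and never reaches the second.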

Network connection diagram — An overview of the packet filtering in a kernel

Securing Your System

It can be pretty difficult to effectively protect a computer: hackers only need to find one vulnerability, but you need to find (and fix) them all. But you can get a good start by following a few rules:

Use Whitelisting, Not Blacklisting

One of the most treasured principles of security is that whenever you're setting up some system to accept or reject things (like packets), you should always assume that they can't be trusted by default. It's kind of like a "guilty until proven innocent" principle.³ This is based on the assumption, which is essentially always correct, that you could never go through the entire list of possible packets and identify every one which is or even could be used in an attack on your computer. If you did make a set of rules to block bad packets but let everything else through, which is called blacklisting, there's a good chance that some hacker will stumble upon some kind of packet he or she can use which you haven't blocked, and then your firewall is useless. Remember, there are a lot more of them than there are of you!

The preferred method is to block everything by default and create rules to allow only the known good packets through, which is called whitelisting. In a firewall, this involves setting the default behavior so that packets which don't match any of your rules will be dropped.
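The difference is easiest to see with a packet nobody planned for. In this small Python sketch, the port numbers and function names are just examples I picked; the point is what each approach does with a port that appears on neither list.

```python
BLOCKED_PORTS = {23, 135, 445}   # blacklist: ports we thought to block
ALLOWED_PORTS = {22, 80, 443}    # whitelist: ports we actually need


def blacklist_allows(dst_port):
    """Blacklisting: let everything through unless it is known to be bad."""
    return dst_port not in BLOCKED_PORTS


def whitelist_allows(dst_port):
    """Whitelisting: block everything unless it is known to be good."""
    return dst_port in ALLOWED_PORTS


unexpected_port = 31337          # on neither list
print(blacklist_allows(unexpected_port))   # True: the packet gets in
print(whitelist_allows(unexpected_port))   # False: the packet is dropped
```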

Don't Trust Anything From The Internet

It's easy to send a letter with somebody else's return address, since nobody will check on it (except perhaps the mail carrier who picks it up from your box, but even then you could just drop it off at a public mailbox). Faking the recipient's address is harder, since if you write the wrong address on an envelope, the post office won't deliver it to the right place, but you can work around that by walking over to the mailbox of the person who you want to receive the envelope and just dropping it off. Imagine how confusing it could be if that person opened mail without checking to see if it was actually addressed to them!

In the same way, the source and destination of a packet — and in fact, pretty much any of the packet's contents — can be faked, so you shouldn't trust that any of the information is accurate. In a firewall, you can add rules to check even the things that you might think should obviously be true, such as the destination address of the packet being the address of the computer you're protecting. If hackers manage to send in packets with fake information, you'll have the rules in place to block them.
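As a rough illustration, here are two such sanity checks written in Python. The function name, the packet dictionary, and the interface name eth0 are assumptions for the example, and a real firewall would need more care (for instance, this sketch would also flag legitimate broadcast packets).

```python
import ipaddress


def looks_spoofed(packet, my_address, external_interface="eth0"):
    """Flag packets from outside whose addresses can't be what they claim."""
    if packet["interface"] != external_interface:
        return False  # this sketch only checks traffic arriving from outside
    src = ipaddress.ip_address(packet["src"])
    dst = ipaddress.ip_address(packet["dst"])
    me = ipaddress.ip_address(my_address)
    if dst != me:
        return True   # delivered to us, but not addressed to us
    if src.is_loopback or src == me:
        return True   # claims to come from this very computer
    return False
```

For example, a packet that arrives on eth0 claiming to be from 127.0.0.1 is flagged, because genuine loopback traffic never shows up on an external interface.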

Silent Rejection

When you get a letter you don't want, you can put it back in the mail system with a "Return to Sender" notice, or you can just throw it out. Similarly, when a firewall decides not to let a packet pass, it can either send back a message saying that the packet was blocked, or it can just discard it without any notification. But even an explicit rejection notice can be useful information to a hacker. Hackers will sometimes send out just a few packets to each of many different IP addresses and wait to see which ones they get a response from; then they know which IP addresses to take a second look at. If your firewall sends explicit rejection messages, that lets the hackers know that they've successfully contacted a real computer, and you can bet that they'll be taking a closer look at your computer.

The easy fix, of course, is to configure your firewall to drop any unrecognized packets without sending anything back. With that policy in place, a hacker scanning the network gets no response from your computer, which makes it much harder to tell that your computer even exists.

IPTables: The Linux Firewall Interface

¹ It's easy for anyone who gets their hands on a packet to look at the data it contains, which is not the case with an envelope; at the very least, you'll usually know if someone opens your mail.
² This means that one computer can easily have multiple IP addresses, if it has multiple network interfaces.
³ The reason this doesn't work in the criminal justice system is that there's a high cost to putting an innocent person in jail, but if a few good packets get blocked, it's no big deal.