MPLS in the kernel
Linux 4.3 was released last month, and one of the long-awaited features was MPLS support in the kernel. There is still the odd bug to iron out, but you can get a working MPLS testbed with the current kernel source (plus a single patch to fix a showstopper).
Building the kernel
- Download the source of kernel 4.3 from here: https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.3.tar.xz
- Unpack the tarball (tar -xf linux-4.3.tar.xz)
- Enter the newly-created linux-4.3 directory, run make menuconfig, and enable lwtunnel support, mpls-iptunnel support, mpls-gso support, and mpls-router support.
- Apply the patch from http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/diff/?id=fe82b3300ec9c0dc4ba871f9a58b265aadf4e186 (this fixes a problem with sending MPLS packets)
- Build the kernel: make -j `getconf _NPROCESSORS_ONLN`
- Once this has finished, build the debian packages: make -j `getconf _NPROCESSORS_ONLN` deb-pkg LOCALVERSION=-mplsfix
- This will create a bunch of .deb files in the parent directory - copy both linux-image-4.3.0-mplsfix_amd64.deb and linux-headers-4.3.0-mplsfix_amd64.deb to the machine you want to install your new kernel on
- Install the kernel with dpkg -i [package name]
- Reboot, select Advanced options for booting Ubuntu, and choose your new kernel
- You are all ready to go!
Edit: an easier way is to use a Docker container: https://github.com/samrussell/kernelbuilder
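Before kicking off a long build, it can help to confirm that the options picked in menuconfig actually landed in the .config file. The function below is a minimal sketch of such a check. The CONFIG_* symbol names are my assumption of how the menu entries above map to Kconfig symbols in 4.3, and check_mpls_config is a name made up for this example.

```shell
# Report which MPLS-related options are enabled in a kernel .config file.
# The CONFIG_* names are assumed to match the 4.3 Kconfig symbols.
check_mpls_config() {
    config_file="$1"
    missing=0
    for opt in CONFIG_LWTUNNEL CONFIG_MPLS CONFIG_NET_MPLS_GSO \
               CONFIG_MPLS_ROUTING CONFIG_MPLS_IPTUNNEL; do
        # An enabled option appears as OPT=y (built in) or OPT=m (module).
        if grep -Eq "^${opt}=(y|m)$" "$config_file"; then
            echo "ok      $opt"
        else
            echo "missing $opt"
            missing=1
        fi
    done
    # Non-zero exit status if anything was missing.
    return "$missing"
}
```

Run it from the linux-4.3 directory as check_mpls_config .config; it prints one line per option and returns non-zero if any are missing.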
Enabling MPLS
The MPLS modules aren't loaded by default, so you'll need to load them yourself:
modprobe mpls_router
modprobe mpls_gso
modprobe mpls_iptunnel
sysctl -w net.mpls.conf.enp0s9.input=1
sysctl -w net.mpls.conf.lo.input=1
sysctl -w net.mpls.platform_labels=1048575
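Replace enp0s9 with the name of your own interface. You'll need to set net.mpls.conf.[interface-name].input=1 for any other interfaces that you plan to receive MPLS packets on, otherwise MPLS packets arriving on them are dropped. The platform_labels setting sizes the label table: it defaults to 0, and the MPLS route table won't accept your routes until it is raised.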
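If you have several interfaces, typing the sysctl lines by hand gets tedious. Here is a small sketch that prints the commands for a list of interface names so you can review them first; mpls_input_cmds is a made-up helper name, and the interface names are just the ones used in this post. It also shows where 1048575 comes from: it is the largest value that fits in the 20-bit MPLS label field.

```shell
# Print the sysctl commands needed to accept MPLS packets on each interface.
# Printing rather than executing lets you review the commands first; pipe
# the output to "sh" as root to apply them.
mpls_input_cmds() {
    for ifname in "$@"; do
        printf 'sysctl -w net.mpls.conf.%s.input=1\n' "$ifname"
    done
}

# The label field is 20 bits wide, so its largest value is 2^20 - 1.
max_label=$(( (1 << 20) - 1 ))
echo "largest 20-bit label value: $max_label"

mpls_input_cmds lo enp0s9
```

To apply the commands instead of just printing them: mpls_input_cmds lo enp0s9 | sh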
Applying MPLS routes
The latest release of iproute2 isn't quite ready, so we'll need to live life on the bleeding edge and build it from source too:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
cd iproute2
./configure
make
sudo make install
Once this is done, we can see that iproute2 has a few more options available for us - try ip route help and see what is available.
Some route examples:
Routing 10.10.10.10/32 to 192.168.1.2 with label 100: ip route add 10.10.10.10/32 encap mpls 100 via inet 192.168.1.2
Swapping label 100 for 200 and forwarding to 192.168.2.2: ip -f mpls route add 100 as 200 via inet 192.168.2.2
Decapsulating label 300 and delivering locally: ip -f mpls route add 300 dev lo
Testbed setup
We're going to make use of network namespaces here to set up a couple of hosts. The plan is as follows:
- Base machine: has veth0 (plugs into veth1) and veth2 (plugs into veth3)
- Host1: Has veth1 (plugs into veth0)
- Host2: Has veth3 (plugs into veth2)
We will use label 112 for traffic from host1 to host2, and label 111 for traffic from host2 to host1. The base machine does penultimate hop popping here (as opposed to label swapping): it strips the label and forwards plain IP to the destination host. Feel free to play with this and get different results.
Setup (all executed as root):
ip link add veth0 type veth peer name veth1
ip link add veth2 type veth peer name veth3
sysctl -w net.mpls.conf.veth0.input=1
sysctl -w net.mpls.conf.veth2.input=1
ifconfig veth0 10.3.3.1/24 up
ifconfig veth2 10.4.4.1/24 up
ip netns add host1
ip netns add host2
ip link set veth1 netns host1
ip link set veth3 netns host2
ip netns exec host1 ifconfig lo 10.10.10.1/32 up
ip netns exec host1 ifconfig veth1 10.3.3.2/24 up
ip netns exec host2 ifconfig lo 10.10.10.2/32 up
ip netns exec host2 ifconfig veth3 10.4.4.2/24 up
ip netns exec host1 ip route add 10.10.10.2/32 encap mpls 112 via inet 10.3.3.1
ip netns exec host2 ip route add 10.10.10.1/32 encap mpls 111 via inet 10.4.4.1
ip -f mpls route add 111 via inet 10.3.3.2
ip -f mpls route add 112 via inet 10.4.4.2
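If you want to reset the testbed and run the setup again from scratch, something like the following sketch should undo it. The run wrapper and its DRY_RUN switch are conveniences made up for this example, not part of iproute2. It relies on my understanding that deleting a namespace destroys the veth end inside it, which takes the peer on the base machine with it.

```shell
# Run a command, or just print it when DRY_RUN=1 (a switch invented for
# this sketch so the commands can be previewed safely).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

teardown_testbed() {
    # Remove the two label routes from the base machine.
    run ip -f mpls route del 111
    run ip -f mpls route del 112
    # Deleting a namespace should destroy the veth end inside it, and a
    # veth pair goes away as soon as either end is destroyed.
    run ip netns del host1
    run ip netns del host2
}

# Preview the commands without touching the system.
DRY_RUN=1 teardown_testbed
```

Call teardown_testbed as root with DRY_RUN unset (or set to 0) to actually run the commands.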
Testing (executed as root due to netns):
ip netns exec host2 ping 10.10.10.1 -I 10.10.10.2
Results:
tcpdump -envi veth0
tcpdump: listening on veth0, link-type EN10MB (Ethernet), capture size 262144 bytes
21:14:14.687380 9a:08:f4:cf:aa:9c > 12:c7:db:9d:a5:25, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 53781, offset 0, flags [DF], proto ICMP (1), length 84)
10.10.10.2 > 10.10.10.1: ICMP echo request, id 1359, seq 1, length 64
21:14:14.687404 12:c7:db:9d:a5:25 > 9a:08:f4:cf:aa:9c, ethertype MPLS unicast (0x8847), length 102: MPLS (label 112, exp 0, [S], ttl 64)
(tos 0x0, ttl 64, id 19009, offset 0, flags [none], proto ICMP (1), length 84)
10.10.10.1 > 10.10.10.2: ICMP echo reply, id 1359, seq 1, length 64
21:14:15.701789 9a:08:f4:cf:aa:9c > 12:c7:db:9d:a5:25, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 53845, offset 0, flags [DF], proto ICMP (1), length 84)
10.10.10.2 > 10.10.10.1: ICMP echo request, id 1359, seq 2, length 64
21:14:15.701810 12:c7:db:9d:a5:25 > 9a:08:f4:cf:aa:9c, ethertype MPLS unicast (0x8847), length 102: MPLS (label 112, exp 0, [S], ttl 64)
(tos 0x0, ttl 64, id 19246, offset 0, flags [none], proto ICMP (1), length 84)
10.10.10.1 > 10.10.10.2: ICMP echo reply, id 1359, seq 2, length 64
tcpdump -envi veth2
tcpdump: listening on veth2, link-type EN10MB (Ethernet), capture size 262144 bytes
21:14:45.714220 8e:d5:9d:07:9a:5c > d6:8a:7c:5e:5b:0f, ethertype MPLS unicast (0x8847), length 102: MPLS (label 111, exp 0, [S], ttl 64)
(tos 0x0, ttl 64, id 55648, offset 0, flags [DF], proto ICMP (1), length 84)
10.10.10.2 > 10.10.10.1: ICMP echo request, id 1363, seq 1, length 64
21:14:45.714251 d6:8a:7c:5e:5b:0f > 8e:d5:9d:07:9a:5c, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 22394, offset 0, flags [none], proto ICMP (1), length 84)
10.10.10.1 > 10.10.10.2: ICMP echo reply, id 1363, seq 1, length 64
21:14:46.717538 8e:d5:9d:07:9a:5c > d6:8a:7c:5e:5b:0f, ethertype MPLS unicast (0x8847), length 102: MPLS (label 111, exp 0, [S], ttl 64)
(tos 0x0, ttl 64, id 55848, offset 0, flags [DF], proto ICMP (1), length 84)
10.10.10.2 > 10.10.10.1: ICMP echo request, id 1363, seq 2, length 64
21:14:46.717570 d6:8a:7c:5e:5b:0f > 8e:d5:9d:07:9a:5c, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 22412, offset 0, flags [none], proto ICMP (1), length 84)
10.10.10.1 > 10.10.10.2: ICMP echo reply, id 1363, seq 2, length 64
It works!
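Note that the labelled frames are 102 bytes long and the unlabelled ones 98: the 4 extra bytes are a single MPLS label stack entry. As a rough illustration of what tcpdump is decoding, the sketch below packs the fields it reports (label, exp, the [S] bottom-of-stack flag, and ttl) into the 32-bit layout from RFC 3032. mpls_entry_hex is a name made up for this example.

```shell
# Pack one MPLS label stack entry (RFC 3032 layout) into a 32-bit word:
#   label (20 bits) | exp (3 bits) | bottom-of-stack S (1 bit) | TTL (8 bits)
mpls_entry_hex() {
    label="$1"; exp="$2"; bos="$3"; ttl="$4"
    printf '%08x\n' $(( (label << 12) | (exp << 9) | (bos << 8) | ttl ))
}

# Field values taken from the captures above.
mpls_entry_hex 112 0 1 64   # echo reply seen on veth0: prints 00070140
mpls_entry_hex 111 0 1 64   # echo request seen on veth2: prints 0006f140
```

Those 4 bytes sit between the Ethernet header and the IP header, which is why the ethertype changes to 0x8847 on the labelled hop.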
Next steps
We have software routers such as Quagga and BIRD, which speak some of the more traditional protocols such as OSPF and BGP. We now need LDP daemons and other Linux software to stand up L2VPN and L3VPN.
Thanks to the team on the netdev mailing list, who have been super responsive and helpful.