Router and switch architecture Martin Heusse
Router internals — 1
Contenu
• Router architecture • Routing table data structure
Router internals — 2
What is routing? • Packet reception ✓ Interface FIFO (ring buffer?) holds groups of bits as they arrive ✓ Packet queued until treated by central CPU or interface card CPU (throw interrupt) ✓ Check CRC, is there space in memory… ✓ Packet classification (Dropped? Accepted? Switching method?) ✓ Moved to input hold queue Interface
Classify Ring buffer (sometimes)
Int. FIFO
Int. queue
Packet routing
• Packet forwarding ✓ Look up routing table ✓ Rewrite header (Ethernet, NAT?, TTL, checksum…) ✓ Packet moved to output hold queue Router internals — 3
Input and Output queues • Input queues absorb transient forwarding subsystem saturation Configurable • Output queues holds burst of packets directed to one interface
Switching
• Generally, queues hold a given number of packets (not bytes) How would you implement a queue? Ring? Chained list? What is the storage unit (MTU size bin, packet, particle…) ✓ There can be several queues in parallel (various priorities)
Router internals — 4
Shared memory — first generation • Ex.: conventional PC, Cisco 2800, HP ProCurve 7xxx
Bus
• Everything stored in same memory space
Shared memory
CPU memory (routing table)
CPU
• Limiting factor: memory access Router internals — 5
Shared memory — first generation (cont.)
Cisco 25xx Series
CIsco 25xx (1993)
Mgmt Card 2517-2519 Daughter and Hub Cards
WIC Slots 2524, 2525 Async 2509-2512 Hub Ports 2505, 2507 2516
Dual UART M 68030
Sys Ctrl ASIC
CPU Bus
WIC
System Bus
Boot ROM NVRAM PCMCIA Flash
Ether/TR WAN Intf
601 1094_06F9_c4
© 1999, Cisco Systems, Inc.
On Board DRAM
DRAM SIMM
60 Router internals — 6
Shared memory — first generation (cont.)
CIsco 7200
PCI Bridge
PA 3
PCI Bridge
PA 1
PCI Bridge
PCI Bus 2
PA 5
PCI Bus 1
Cisco 720x Series PCI Bridge
PA 6
PCI Bridge
PA 4
PCI Bridge
PA 2
I/O Controller Fast Ether
Midplane
PCMCIA
PCI PCI Bridge Bridge
SRAM ! NPE-100 Sys Ctrl GT 64010
Dual UART I/O Bus
NVRAM 601 1094_06F9_c4
Boot Flash
© 1999, Cisco Systems, Inc.
EEPROM
DRAM
CPU Bus
PCI Bus 0 Boot ROM
CPU NPE
R 4700 R 5000 Layer 2 Cache NPE-200 69
Router internals — 7
Shared memory — first generation (cont.)
Juniper M40
• Decoupling of control plane and forwarding plane —forwarding by a dedicated ASIC • 1998 — 40Gb/s • JunOS based on FreeBSD
Router internals — 8
Shared memory — first generation (cont.)
PIC: Physical Interface Card
Router internals — 9
Intelligent line cards — 2d generation • Ex.: Cisco 7500 • Line cards have some intelligence, write into each other’s memory
CPU
CPU
Bus
CPU CPU memory
CPU
• Limiting factor: 1 shared bus (needs to be N times faster than each of N interfaces) • Central processor dedicated only to control plane Distinct from Forwarding plane
Router internals — 10
Intelligent line cards — 2d generation (cont.) 601 1094_06F9_c4
71
© 1999, Cisco Systems, Inc.
7500 CiscoCIsco 75xx Series Dual UART
NVRAM
I/O Bus
Boot ROM
RSP
PCMCIA
Register FPGA Diag Bus FPGA
Boot Flash
Sys Ctrl ASICs
DRAM Layer 2 Cache
R 4600 R 4700 R 5000
CPU Bus MemD Ctrl ASICs
SRAM QA ASIC
Diag Bus Cy Bus 1
Cy Bus 0 IP/VIP
601 1094_06F9_c4
© 1999, Cisco Systems, Inc.
IP/VIP
Cy Bus Arbiter
IP/VIP
IP/VIP
72
Router internals — 11
Intelligent line cards — 2d generation (cont.)
PCI Bridge 2
PCI Bridge 1
DRAM
Boot ROM SRAM PMA ASICs
DRAM Ctrl ASICs
R 4600 R 4700 R 5000
CPU Bus Packet Bus
PA
VIP PCI Bus 0
PA
PCI Bus 1
PCI Bus 2
Cisco 75xxProcessors Series—VIP Versatile Interface (1/interface)
CBus
I/O Ctrl ASIC
Layer 2 Cache
CYA ASICs
EEPROM
Diag Bus
Router internals — 12 601
Intelligent line cards + crossbar switch 3d generation • Ex. Cisco 7600, juniper T-series, HP ProCurve Switch 4200vl… • Crossbar switch: CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
• Routing of N simultaneous packet (or cell) Router internals — 13
Head of line blocking
• Crossbar needs to be N times faster than each line or need one buffer / output on each input (i.e. one buffer per crosspoint) • What goes through the crossbar? ✓ ATM cells ✓ particles? (→ packet reassembly) ✓ packets Router internals — 14
Cisco router performance (packets/s)
Router 2500 2801 7200-NPE-G1 7600-dCEF720
Process switching 800 3000 79.000
Fast switching 4400 90.000 1.018.000 48.000.000 per slot
Router internals — 15
A step further
• Check Cisco CEF ( Cisco express forwarding) • Banyan switch • MPLS: packets carry an identifier of their processing
Router internals — 16
Routing table
• Static entries, routing protocols, ARP • Can be large! • Entries in use are cached (on interface cards, if applicable) → the cache holds a small subset of know destinations
Router internals — 17
ry Trie
Longest match lookup — Routing table storage
y to represent prefixes is using a trie. A trie is a tree-based data structure allowing the • Source: Ruiz-Sanchez, M.A.; Biersack, E.W.; Dabbous, W., n a digital basis by using the bits of prefixes to direct the branching. Figure 7 shows ”Survey and taxonomy of IP address lookup algorithms,” Network, as at most two children) a setMar/Apr of prefixes of a forwarding table. IEEE , vol.15,representing no.2, pp.8-23, 2001 Prefixes a 0* b 01000* c 011* d 1* e 100* f 1100* g 1101* h 1110* i 1111*
0
1
a
d 1
0
1
0 c
0
1
0
1
0
e 0
1
0
1
f
g
h
i
0 b Router internals — 18
many exception prefixes exist. Note that with this scheme, in the worst case, the path times. In the case of thePath-compressed original Sklower’s scheme thetrie backtrack phase also needs to
e trie because non-contiguous masks are allowed. Prefixes a 0* b 01000* c 011* d 1* e 100* f 1100* g 1101* h 1110* i 1111*
1 0
3 0
b
1
a
d 1
2
0
c
1
3
e
1
0
4
4 0
1
f
g
0
h
1
i
Figure 9: A path-compressed trie • Useful for sparsely populated space. But many prefixes used in IPv4 matching prefix problem has been addressed by using data structu ently, the longest • Backtracking necessary : after reaching and finding that sed tries, like the BSD trie. Path-compression makese much sense out when theitbinary tr does not match, need to go back to d (for 101… for example) ut when the number of prefixes increases and the trie gets denser, using path compres
over, the principal disadvantage of path-compressed tries, as well as binary tries in g Router internals — 19
iginal prefix a, which now has been suppressed. Prefix d has been obtained in a similar wa prefix trie at internal nodes are expandedDisjoint or pushed down to the leaves of the trie, this technique has bee
hing by Srinivasan et al. [14]. Figure 11 shows the disjoint intervals of addresses that correspo prefix binary trie of figure 10.
xes * 1000* 11* *
Prefixes a 0* b 01000* c 011* d 1* e 100* f 1100* g 1101* h 1110* i 1111*
0
1
1
0
0
1
1
0
a! 0
1
0 c
0
e
1 a"
0 b
1
d! 0 f
1
0
1
g
h
i
1 a#
Figure 10: Disjoint-prefix binary trie
• Disjoint prefixes do not overlap
Router internals — 20
There are other techniques!
Router internals — 21
Sources
• S. Keshav; “An engineering approach to computer networking” • Cisco Router Architecture www.cisco.com/networkers/nw99_pres/601.pdf
• Ross & Kurose “Computer Networking” • …
Router internals — 22