There is an ominous rumbling on the internet about 768K
Day; some have even termed it internet doomsday, and others call it the “Y2K”
of the internet. The fear is justified given the widespread internet outages on
512K Day, when the internet BGP table size exceeded 512,000 routes. 512K Day
caused havoc: many routers simply exhausted their TCAM (ternary
content-addressable memory) and were unable to process certain routes,
leaving parts of the internet unreachable. The same issue seems possible again
this year when the internet routing table exceeds 768K routes. Some predict
that August 12 will be 768K Day, following 512K Day, which happened on
August 12, 2014[1].
Some Internet Outages are predicted
While the majority of tier-1 ISPs were caught off guard on
512K Day, this should not be the case this time around. There are mechanisms
within BGP route configuration to protect routers from exhausting TCAM, and
presumably ISPs have upgraded their legacy routers with patches. However, this
is a temporary fix and may come with strings attached, such as reduced
reachability. A more permanent fix requires routers to accept the full set of
internet routes in their forwarding tables. This requires a special memory
known as “TCAM”. Older routers are built with limited TCAM and are therefore
unable to respond quickly; they may exhaust their resources and eventually
fail or be unable to process certain prefixes.
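The reachability trade-off of such a protective limit can be illustrated with
a minimal Python sketch. It assumes a simple maximum-prefix style guard that
accepts routes until a limit is reached and rejects the rest. The function
names, the limit of 3, and the documentation prefixes are hypothetical and
chosen only for illustration; this is not any vendor's actual configuration or
behavior.

```python
import ipaddress


def install_with_limit(announcements, max_prefixes):
    """Accept announced prefixes until max_prefixes is reached; reject the rest.

    Hypothetical model of a maximum-prefix style guard, not a real router API.
    """
    accepted, rejected = [], []
    for text in announcements:
        net = ipaddress.ip_network(text)
        if len(accepted) < max_prefixes:
            accepted.append(net)
        else:
            rejected.append(net)
    return accepted, rejected


def is_reachable(address, accepted):
    """Return True if any accepted prefix covers the destination address."""
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in accepted)


# Illustrative announcements (private and documentation address ranges).
announcements = ["10.0.0.0/8", "192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]
accepted, rejected = install_with_limit(announcements, max_prefixes=3)

print(is_reachable("192.0.2.10", accepted))   # covered by an accepted prefix
print(is_reachable("203.0.113.5", accepted))  # its prefix was rejected
```

In this toy model the router stays up, but destinations behind the rejected
prefix become unreachable unless something else, such as a default route,
covers them.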
A typical outage could look something like the
following diagram.
Figure 1. [Courtesy ThousandEyes] This diagram appears
on the ThousandEyes blog at https://blog.thousandeyes.com/what-is-768k-day/
and depicts a recent outage in the Bay Area.
A ThousandEyes blog post[4] reported that its author
observed packet loss on several interfaces in the Cogent (AS 174) network
in the San Francisco Bay Area. As a result, many peer networks such as Comcast,
Quest, Amazon, and 8x8 were affected. More recently, outages were reported in
Australia, and media outlets such as CSO[2] and Computerworld[3] claimed they
were directly related to BGP prefixes reaching 768K. Reports of internet
outages on Twitter[5] also appear to be increasing. Proper analysis
would be needed to ascertain how many of the reported outages are due to BGP
prefix issues. Nonetheless, the BGP table keeps growing, and this will cause
reachability issues on routers that lack a large enough forwarding table.
According to the CIDR Report (a website that keeps track of
global BGP routes), the BGP table has already exceeded 768K routes, and the
table size was shown as 783K as of June 11, 2019[6]. However, this report is
not official and may include duplicates.
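Whether 783K counts as "past 768K" depends on how the K is read. The short
Python sketch below works through the arithmetic under two assumptions that
are not stated in the report itself: that the reported 783K means roughly
783,000 routes, and that a hardware limit of 768K may mean either 768 x 1,000
or 768 x 1,024 entries. The function names are illustrative only.

```python
K_BINARY = 1024
K_DECIMAL = 1000


def limit_entries(k_units, binary=True):
    """Convert a limit quoted in 'K' into a number of table entries."""
    return k_units * (K_BINARY if binary else K_DECIMAL)


def headroom(table_size, k_units, binary=True):
    """Entries left before the limit is hit (negative means already exceeded)."""
    return limit_entries(k_units, binary) - table_size


reported = 783_000  # assumption: "783K" in the report means about 783,000 routes

print(limit_entries(512))                     # 524288
print(limit_entries(768))                     # 786432
print(headroom(reported, 768, binary=True))   # 3432
print(headroom(reported, 768, binary=False))  # -15000
```

Under the decimal reading the table has already passed the mark; under the
binary reading only a few thousand entries of headroom would remain.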
Irrespective of the numbers presented in the CIDR Report, I can confirm that the majority of the customers I have talked to are looking to replace or add edge routers with a table size of 750K+ for IPv4 and around 65K for IPv6. It should therefore go without saying that one should be cognizant of the issue and take precautions before 768K Day arrives.
Under the hood
Internet routers generally process routes using two tables
in conjunction with routing protocols: the RIB (routing information base) and
the FIB (forwarding information base). While the RIB is part of the control
plane and is generally processed by the NOS, much of the table lookup and
processing is done at the FIB level, which is part of the routing hardware and
resides within the pipeline of the merchant silicon or ASIC.
Figure 2. Routing functions including RIB and FIB
processing.
If the lookup engine (TCAM) within the ASIC pipeline cannot
hold the number of IPv4 entries required, the RIB may flood the table and
cause an overflow. With patches, routers may be able to control processing and
lookups in the ASIC pipeline. However, such patches are a temporary fix that
only protects routers from failing. A more permanent fix is to connect the
ASIC packet-processing pipeline to external TCAMs over a high-speed bus.
Older ASICs lack this capability, which makes those routers more dependent on
software and may limit their table size.
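The relationship between the two tables can be sketched with a toy Python
model. It assumes a FIB with a fixed number of entries that is filled from the
RIB and answers lookups by longest-prefix match. The class and function names,
the capacity of 2, and the next-hop labels are all hypothetical; real hardware
performs the lookup in TCAM rather than by scanning a dictionary.

```python
import ipaddress


class Fib:
    """Toy forwarding table with a fixed capacity (stand-in for TCAM space)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # network -> next hop

    def install(self, network, next_hop):
        """Install a route; return False if the table is already full."""
        if network not in self.entries and len(self.entries) >= self.capacity:
            return False
        self.entries[network] = next_hop
        return True

    def lookup(self, address):
        """Return the next hop of the longest matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.entries if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.entries[best]


def sync_rib_to_fib(rib, fib):
    """Push every RIB route into the FIB; return the prefixes that did not fit."""
    overflow = []
    for prefix, next_hop in rib.items():
        if not fib.install(ipaddress.ip_network(prefix), next_hop):
            overflow.append(prefix)
    return overflow


# Illustrative RIB contents; the control plane holds more routes than the FIB can.
rib = {
    "10.0.0.0/8": "peer-a",
    "10.1.0.0/16": "peer-b",
    "192.0.2.0/24": "peer-c",
}
fib = Fib(capacity=2)
overflow = sync_rib_to_fib(rib, fib)

print(overflow)                 # ['192.0.2.0/24']
print(fib.lookup("10.1.2.3"))   # peer-b
print(fib.lookup("192.0.2.1"))  # None
```

The RIB still knows the third route, but because it never reaches the
forwarding table, packets for that prefix have nowhere to go in this model.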
Solution: Can whitebox Switch help?
Disaggregation, or whitebox, is the best solution for this
problem. Buy the hardware of your choice, here a router/switch, from your
preferred vendor and select the software, or Network Operating System (NOS),
separately. The benefit of whitebox, or disaggregation for that matter, is
that it allows you to select best-of-breed merchant silicon and buy it on your
terms at a price point you can afford. As a result, you get the best of both
worlds: hardware and software. For a bigger IPv4 table size, Broadcom®
Qumran-MX™ silicon with the BCM52311™ Knowledge-Based Processor (KBP) [TCAM]
is an optimal choice for up to 1 million IPv4 routes. There are cases where up
to 1.2 million routes are possible in such a system.
Figure 3. Edge Router Whitebox based on Broadcom®
Qumran-MX.
A number of hardware vendors currently offer Qumran-MX
based platforms with industry-proven NOSs from companies such as IP Infusion.
As depicted in the figure above, the merchant silicon, here Broadcom®
Qumran-MX™, is connected through an internal bus (known as the ELK bus) to an
external TCAM, which provides additional lookup capacity.
However, it is also important to select an appropriate
software vendor that has optimized its NOS for such boxes and supports more
than 768K routes, to facilitate your upgrade or help in your preparation
for 768K Day.
IP Infusion’s OcNOS™ has been tested with a number of hardware vendors,
giving you a wide selection. Please ensure you select appropriate Qumran-MX
based hardware with external TCAM to achieve up to 1 million routes. If you
are interested in OcNOS and how it can help with 768K Day, you may visit the
IP Infusion website at https://www.ipinfusion.com.
However, please make sure you ask each vendor to provide
you with a test report, or at least enough data to make an educated decision.
References
[1] Some internet outages predicted for the coming month as '768k Day' approaches. Available at https://www.zdnet.com/article/some-internet-outages-predicted-for-the-coming-month-as-768k-day-approaches/.
[2] Australian Internet Users Face Looming ‘768k Day’. Available at https://www.cso.com.au/mediareleases/34669/australian-internet-users-face-looming-768k-day/.
[3] Australian Internet Users Face Looming ‘768k Day’. Available at https://www.computerworld.com.au/mediareleases/34669/australian-internet-users-face-looming-768k-day/.
[4] ThousandEyes. What is 768K Day, and Will It Cause Internet Outages? Available at https://blog.thousandeyes.com/what-is-768k-day/.
[5] Internet outage tag at Twitter. Available at https://twitter.com/search?q=internet%20outage&src=tyah.