768K Day - Internet Doomsday? Is it real?



There is an ominous rumbling on the internet about 768K day. Some have termed it internet doomsday; others have called it the “Y2K” of the internet. The fear is justified given the widespread internet outages on 512K day, when the internet BGP table size exceeded 512,000 routes. 512K day caused havoc: many routers simply exhausted their TCAM (ternary content-addressable memory) and were unable to process certain routes, leaving parts of the internet unreachable. The same issue seems possible again this year when the internet routing table exceeds 768K routes. Some predict that 768K day will fall on August 12, the same date as 512K day, which happened on August 12, 2014[1].

Some Internet Outages are predicted

While the majority of tier 1 ISPs were caught off guard on 512K day, this should not be the case this time around. There are mechanisms within BGP route configuration to protect routers from exhausting TCAM, and presumably ISPs have upgraded their legacy routers with patches. However, this is a temporary fix and may come with strings attached, such as reduced reachability. A more permanent fix requires routers to accept the full internet routing table in their forwarding tables. This requires a special memory known as “TCAM”. Older routers are built with limited TCAM and hence cannot respond as quickly; they may exhaust their resources and eventually fail or become unable to process certain prefixes.
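The post does not name a specific protective mechanism, so the following is only a sketch of one common idea, assumed here for illustration: drop prefixes longer than some length and rely on a default route for everything that was dropped. The function name `filter_for_fib`, the `max_len` value, and the sample prefixes are all hypothetical. It also shows the string attached: traffic to a filtered prefix follows the default route rather than its specific path.

```python
import ipaddress

def filter_for_fib(prefixes, max_len=22):
    """Keep prefixes no longer than max_len and add a default route as a catch-all.

    Hypothetical policy for illustration; real routers express this in
    vendor-specific route-policy configuration, not Python.
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    kept = [n for n in nets if n.prefixlen <= max_len]
    default = ipaddress.ip_network("0.0.0.0/0")
    if default not in kept:
        kept.append(default)  # dropped prefixes are still reachable via the default
    return kept

# Sample prefixes (documentation and private ranges, not real internet routes).
received = [
    "10.0.0.0/8",
    "172.16.0.0/12",
    "192.0.2.0/24",
    "198.51.100.0/24",
    "203.0.113.0/24",
]
installed = filter_for_fib(received, max_len=22)
print(len(received), "received ->", len(installed), "installed")
```

With these sample inputs, the three /24 prefixes are filtered out and replaced by a single default route, which is how the table shrinks at the cost of less specific forwarding.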
A typical outage could be something similar to the following diagram.


Figure 1. [Courtesy ThousandEyes] This diagram appears on the ThousandEyes blog at https://blog.thousandeyes.com/what-is-768k-day/ and depicts a recent outage in the San Francisco Bay Area.
A blog post from ThousandEyes[4] reported that the writer observed packet loss on several interfaces in the Cogent (AS 174) network in the San Francisco Bay Area. As a result, many peers such as Comcast, Quest, Amazon, and 8x8 were affected. Recently, media outlets such as CSO[2] and Computerworld[3] reported an outage in Australia and claimed it was directly related to BGP prefixes reaching 768K. There also seems to be an increase in internet outages reported on Twitter[5]. Proper analysis would be needed to ascertain how many of the reported outages are due to BGP prefix issues. Nonetheless, BGP table sizes are increasing, and this will cause reachability issues on routers that lack a large enough forwarding table.
According to the CIDR report (a website that keeps track of global BGP routes), the BGP table has already exceeded 768K routes, with the current table size shown as 783K as of June 11, 2019[6]. However, this report is not official and may include duplicates.
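Whether 783K routes is already past the limit depends on how “768K” is read. The short calculation below compares the quoted table size against two readings, 768 × 1000 and 768 × 1024. Both readings are assumptions made here for illustration; the actual limit on a given router depends on how its TCAM is partitioned.

```python
# Table size quoted from the CIDR report in the text (rounded figure).
TABLE_SIZE = 783_000

# Two possible readings of "768K"; which one applies is an assumption.
limits = {
    "768K as 768 x 1000": 768 * 1000,
    "768K as 768 x 1024": 768 * 1024,
}

def headroom(limit, table_size=TABLE_SIZE):
    """Routes left before the limit is hit (negative means already exceeded)."""
    return limit - table_size

for label, limit in limits.items():
    left = headroom(limit)
    status = "exceeded" if left < 0 else "not yet exceeded"
    print(f"{label} = {limit:,}: {status} ({left:+,} routes)")
```

Under the decimal reading the table is about 15,000 routes over the limit; under the binary reading about 3,400 routes of headroom remain, which at current growth would not last long.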
Irrespective of the numbers presented in the CIDR report, I can attest that the majority of the customers I have talked to are looking to replace or add edge routers with a table size of 750K+ for IPv4 and around 65K for IPv6. Hence, it should go without saying that one should be cognizant of the issue and take precautions before 768K day arrives.

Under the hood

Internet routers generally process routes in two tables in conjunction with routing protocols: the RIB (routing information base) and the FIB (forwarding information base). While the RIB is part of the control plane and is generally handled by the NOS, much of the table lookup and processing is done at the FIB level, which is part of the routing hardware and resides within the pipeline of the merchant silicon or ASIC.


Figure 2. Routing functions including RIB and FIB processing.
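The split between RIB and FIB can be sketched in a few lines. This is a toy model, not how any particular NOS or ASIC implements it: the RIB holds every candidate route per prefix, the control plane picks one winner per prefix for the FIB, and the forwarding step does a longest-prefix match. The preference values follow a common administrative-distance convention and, like the addresses and function names, are illustrative assumptions.

```python
import ipaddress

# Toy RIB: prefix -> candidate routes as (protocol, preference, next_hop).
# Lower preference wins; values and addresses are illustrative only.
rib = {
    "0.0.0.0/0": [("static", 1, "192.0.2.1")],
    "198.51.100.0/24": [("ebgp", 20, "192.0.2.2"), ("ospf", 110, "192.0.2.3")],
    "198.51.100.128/25": [("ebgp", 20, "192.0.2.4")],
}

def build_fib(rib):
    """Control-plane step: keep only the best candidate per prefix."""
    fib = {}
    for prefix, candidates in rib.items():
        best = min(candidates, key=lambda c: c[1])
        fib[ipaddress.ip_network(prefix)] = best[2]
    return fib

def lookup(fib, address):
    """Forwarding step: longest-prefix match (done in TCAM on real hardware)."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    return fib[max(matches, key=lambda net: net.prefixlen)]

fib = build_fib(rib)
print(lookup(fib, "198.51.100.200"))  # matches the /25
print(lookup(fib, "198.51.100.5"))    # matches the /24 only
print(lookup(fib, "203.0.113.9"))     # falls through to the default route
```

The linear scan in `lookup` is the part a TCAM replaces: hardware compares the address against all entries in parallel, which is why the number of entries it can hold is a hard limit.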

If the lookup engine (TCAM) within the ASIC pipeline cannot hold the number of table entries needed for IPv4 packets, the RIB may push more routes than the table can take, causing an overflow. With patches, routers may be able to control processing and lookups in the ASIC pipeline. However, such patches are a temporary fix that only protects routers from failing. A more permanent fix is to connect the ASIC packet processing pipeline to external TCAMs over a high-speed bus. Older ASICs lack this capability, which makes routers more dependent on software and may limit their table size.
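The overflow described above can be modeled very roughly as a fixed-capacity table that stops accepting entries once it is full. This is a simplified sketch under stated assumptions: real routers react in vendor-specific ways (software forwarding, dropped routes, or a crash), and the function name, route count, and capacities below are hypothetical round numbers taken from figures mentioned in this post.

```python
def install_routes(prefixes, capacity):
    """Install prefixes in arrival order until the table is full.

    Returns (installed, rejected). Rejecting the overflow is a modeling
    assumption; actual hardware behavior differs by platform.
    """
    installed, rejected = [], []
    for prefix in prefixes:
        if len(installed) < capacity:
            installed.append(prefix)
        else:
            rejected.append(prefix)
    return installed, rejected

# Integers stand in for prefixes; 783,000 is the table size quoted in the text.
announced = range(783_000)

small_fib = install_routes(announced, capacity=768_000)    # limited on-chip table
large_fib = install_routes(announced, capacity=1_000_000)  # with external TCAM

print("768K table:", len(small_fib[0]), "installed,", len(small_fib[1]), "rejected")
print("1M table:  ", len(large_fib[0]), "installed,", len(large_fib[1]), "rejected")
```

In this model the smaller table leaves 15,000 prefixes without a forwarding entry, while the larger table absorbs the full feed with room to spare.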

Solution: Can a whitebox switch help?

Disaggregation, or whitebox, is the best solution for this problem. Buy the hardware of your choice, here a router/switch, from your preferred vendor and select the software, or Network Operating System (NOS), separately. The benefit of whitebox, or disaggregation for that matter, is that it allows you to select best-of-breed merchant silicon and buy it on your terms at a price point you can afford. As a result, you get the best of both worlds: hardware and software. For a bigger IPv4 table size, Broadcom® Qumran-MX™ silicon with the BCM52311™ Knowledge-Based Processor (KBP) [TCAM] is a good choice for up to 1 million IPv4 routes. There are cases where up to 1.2 million routes are possible in such a system.


Figure 3. Edge Router Whitebox based on Broadcom® Qumran-MX.
A number of hardware vendors currently offer Qumran-MX based platforms with an industry-proven NOS from companies such as IP Infusion. As depicted in the figure above, the merchant silicon, here Broadcom® Qumran-MX™, is connected through an internal bus (known as the ELK bus) to an external TCAM, which provides additional lookup capacity.
However, it is also important to select a software vendor that has optimized its NOS for such boxes and supports more than 768K routes, to facilitate your upgrade or help in your preparation for 768K day.
IP Infusion’s OcNOS™ has been tested with a number of hardware vendors, giving you a wide selection. Please ensure you select appropriate Qumran-MX based hardware with external TCAM to achieve up to 1 million routes. If you are interested in OcNOS and how it can help with 768K day, you may visit their website at https://www.ipinfusion.com .
However, please make sure you ask each vendor to provide you with a test report, or at least enough data to make an educated decision.





References


[1] Some internet outages predicted for the coming month as '768k Day' approaches. Available at https://www.zdnet.com/article/some-internet-outages-predicted-for-the-coming-month-as-768k-day-approaches/
[2] Australian Internet Users Face Looming ‘768k Day’. Available at https://www.cso.com.au/mediareleases/34669/australian-internet-users-face-looming-768k-day/
[4] ThousandEyes. What is 768K Day, and Will It Cause Internet Outages? Available at https://blog.thousandeyes.com/what-is-768k-day/
[5] Internet outage tag on Twitter. Available at https://twitter.com/search?q=internet%20outage&src=tyah
[6] CIDR, 2019. CIDR report for June 11, 2019. Available at https://www.cidr-report.org/as2.0/


