What Is a White Box Switch?
Several years after open and disaggregated networking came to market, there is still confusion about it. We receive a lot of requests from people who want to try bare-metal switches and ask for something that simply won’t work.
The most typical error lies in the SW people want to use: "We want to get a box X and load it with Y." The Y could range from a well-established NOS to an open-source routing suite or something they’ve heard about, like DANOS or Stratum. Sometimes it’s no SW at all, just ONIE. ONIE is nice, but on its own it leaves a switch as just a box making noise with its fans, with no packet processing.
It’s no wonder, because with all the buzz about open network hardware, open source NOS, and disaggregation it’s easy to make a connection with white box servers. White box servers are quite a commodity now; one can grab a unit, get one’s favorite Linux distro and load all necessary apps.
To understand why that doesn’t work for switches, we need to answer a question: what is a switch?
What Is a Switch?
We all know the pictures that various SW vendors show.
Some silly guys even claim to be "silicon-independent"!
The key lies in the lowest level: "open hardware." It has an ASIC. An ASIC is what allows a switch to do line-rate packet forwarding. Everybody loves line-rate forwarding. It is quite a power-hungry piece of silicon, capable of processing packets according to a set of rules without involving the control plane all the time.
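To make the "set of rules" idea concrete, here is a minimal sketch in Python. It is a toy model, not a real ASIC SDK: the class name, the port names and the prefixes are all made up for illustration. The control plane installs a rule once, and every later packet is matched against the table without asking the control plane again. Only packets with no matching rule are handed ("punted") to the CPU.

```python
import ipaddress


class ToyForwardingTable:
    """Toy stand-in for an ASIC forwarding table (hypothetical, not a real SDK)."""

    def __init__(self):
        self.routes = []   # list of (network, egress_port) rules
        self.punted = []   # destinations handed to the control-plane CPU

    def install(self, prefix, egress_port):
        # The control plane programs a rule once...
        self.routes.append((ipaddress.ip_network(prefix), egress_port))

    def forward(self, dst):
        # ...and the table is consulted for every packet after that.
        addr = ipaddress.ip_address(dst)
        matches = [(net, port) for net, port in self.routes if addr in net]
        if not matches:
            # No rule matched: punt to the CPU (the slow path).
            self.punted.append(dst)
            return None
        # Longest prefix wins, as in ordinary IP routing.
        _, port = max(matches, key=lambda m: m[0].prefixlen)
        return port


table = ToyForwardingTable()
table.install("10.0.0.0/8", "port1")
table.install("10.1.0.0/16", "port2")

print(table.forward("10.1.2.3"))   # matches both rules, the /16 is more specific
print(table.forward("10.9.9.9"))   # matches only the /8
print(table.forward("192.0.2.1"))  # no rule, goes to the control plane
```

In a real switch the table lives in silicon and is programmed through the vendor SDK; the point of the sketch is only the division of labor between the rule installer and the per-packet lookup.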
If there is an ASIC, there is an SDK for it. And a driver based on that SDK.
ASIC drivers are not open, and we don’t think that any vendor will open them in the foreseeable future. It’s their core business, making better ASICs and drivers. That’s why you cannot load a random open-source routing suite/OS and live happily ever after.
If you have OF-DPA and/or OpenNSL libraries, you still cannot load Quagga/FRR/OVS/whatever to the box and use it.
It is true that you can find Quagga/FRR in almost all open-source NOS projects. A lot of work was done to integrate them and make them work with an ASIC driver.
With OF-DPA you have to write your own code; that is the main point of OpenFlow, like in this use case. Otherwise, all you can do is connect it to an OpenFlow controller and then search for working apps. No working connector to Quagga was found so far.
With OpenNSL you can adapt your box to using an existing NOS (like SONiC in the case above) or write your own code.
Nothing will work out of the box in either case.
Open Network Linux
ONL is not a ready-to-use NOS. It's a nice foundation to build one since all the base work on managing HW is done.
Still, it is only compatible with the APIs for building your own NOS, or with FBOSS if you have a Wedge switch. There is nothing else there.
Why is it not possible to load a ready NOS which has drivers for the ASIC in your box as well?
Unlike servers, an Ethernet switch is not a standardized device. Every design is a bit different and takes effort to support in a NOS. Even control planes are different: the most common architectures are x86 and (less and less popular) PowerPC, and we can also see ARM processors in entry-level models. The difference in control planes and the way they boot has led to the birth of ONIE, a technique to unify the ecosystem and have the same way to load a NOS everywhere.
I2C tree management, activity indication, switch port configuration: making a NOS compatible with a switch takes effort.
How to port a NOS?
There is a great porting guide for Microsoft SONiC - https://github.com/Azure/SONiC/wiki/Porting-Guide
However, many devices share the same ASIC platform, and only differ in other device-specific hardware components which require custom device drivers and/or configuration files which are loaded when the system is initialized. This guide describes requirements and general guidelines to follow when porting SONiC to a new device which contains a supported ASIC platform.
To get switch HW up and running in a NOS, you will need:
System EEPROM driver
ASIC configuration and port-mapping
So HW support looks like this:
Port mapping is especially important: it matches the physical ports, whose layout can differ across switch vendors, to the NOS port names.
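As a rough illustration of what a port map does, the sketch below parses a small table written in the style of the port_config.ini files used by SONiC (one row per port with a NOS name, ASIC lanes, an alias and a front-panel index). The lane numbers and aliases here are invented, not taken from any real switch, and the parser is a simplified assumption rather than SONiC’s own code.

```python
# Hypothetical port map in the style of a SONiC port_config.ini.
# Lane numbers and aliases are invented for illustration only.
PORT_CONFIG = """\
# name        lanes          alias   index
Ethernet0     29,30,31,32    front1  1
Ethernet4     25,26,27,28    front2  2
Ethernet8     37,38,39,40    front3  3
"""


def parse_port_config(text):
    """Return {front_panel_index: (nos_port_name, [asic_lanes])}."""
    ports = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        name, lanes, _alias, index = line.split()
        ports[int(index)] = (name, [int(lane) for lane in lanes.split(",")])
    return ports


port_map = parse_port_config(PORT_CONFIG)
for index, (name, lanes) in sorted(port_map.items()):
    print(f"front-panel port {index}: {name} -> ASIC lanes {lanes}")
```

Note that the lanes do not follow the front-panel order. That is exactly the kind of board-specific detail that differs from one switch design to another and has to be described per model.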
With all that ready, a switch can be added to a NOS compatibility list.
How to choose a white box switch?
First of all, you need to decide on your network architecture and the protocols you need, and then check the available NOS options for compliance.
The second step is checking the hardware compatibility list (HCL) and picking suitable hardware. There are many models with seemingly the same features but different ASICs. Broadcom’s Trident2, Trident2+, Trident3, Maverick, Tomahawk, Tomahawk+; Cavium, Mellanox, Nephos, Barefoot: the list is almost endless.
Vendors boast that their products are the most flexible, open, feature-rich, etc. In reality it doesn’t make a big difference. All major NOS features work across all switches in the HCL. There may be some minor feature differences and different settings for VXLAN routing.
So choose the desired port specs, check whether there are any notices about limitations, and that’s it.
If there are several NOSes you want to look into, cross-check their HCLs and find gear that is listed in all of them.
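Cross-checking HCLs is just a set intersection. A minimal sketch, assuming you have typed each HCL into a set by hand; the NOS names and switch models below are placeholders, not real compatibility data.

```python
# Hypothetical HCLs: NOS names and switch models are placeholders.
hcls = {
    "nos_a": {"model-1", "model-2", "model-3"},
    "nos_b": {"model-2", "model-3", "model-4"},
    "nos_c": {"model-3", "model-2", "model-5"},
}


def common_hardware(hcls):
    """Return the models that appear in every HCL."""
    lists = list(hcls.values())
    if not lists:
        return set()
    return set.intersection(*lists)


candidates = sorted(common_hardware(hcls))
print(candidates)
```

Anything left in the result can run every NOS on your shortlist, so a PoC unit of that model lets you try them all on the same box.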
If you are not sure about the result, do not hesitate to ask your supplier which option is best. Some vendors may have better lead time and/or support.
With all that in mind, you can build a PoC segment and experiment to find out the best choice.
After all this effort, a question remains: where is the break from vendor lock-in?
Is there a break from vendor lock-in?
There is no complete break from vendor lock-in.
A traditional network is built around a fixed set of HW+SW.
With disaggregation of HW and SW, you get a lock-in around the way you manage your network. For example, the methods for applying a configuration in SONiC, PicOS, Cumulus Linux, OpenSwitch (OPS), OpenSwitch (OPX, and it’s not the same thing as OPS!), etc. are completely different. Once you have settled on a NOS, migrating to a different one will take engineering effort to set up new routines and scripts and to change all the things you got used to.
You can still change your HW vendor without much trouble, though.
Another hilarious example is OpenSwitch (OPX), which boasts about openness and diversified white box hardware. Its HCL consists of Dell EMC gear with a single lonely Edge-Core model. There is no disaggregation or openness in that, only another vendor lock-in.
White box and disaggregated networking do deliver more openness in building your network.
It also offers cost reduction in both CAPEX and OPEX, a choice of how to manage your network, and reduced dependence on a single all-in-one vendor.