What Is a White Box Switch?

Several years after their introduction to the market, there is still confusion about open and disaggregated networking. We receive a lot of requests from people who want to try bare-metal switches and ask for something that simply won’t work.

The most typical error lies in the SW people want to use: "We want to get box X and load it with Y." The Y could range from a well-established NOS to an open-source routing suite or something they’ve heard about, like DANOS or Stratum. Sometimes it’s no SW at all, just ONIE. ONIE is nice, but on its own it leaves a switch as just a box making noise with its fans, with no packet processing going on.

It’s no wonder, because with all the buzz about open network hardware, open source NOS, and disaggregation it’s easy to make a connection with white box servers. White box servers are quite a commodity now; one can grab a unit, get one’s favorite Linux distro and load all necessary apps.

There is only one problem - it doesn't work that way on switches.

To understand why, we need to understand - what is a switch?

What Is a Switch?

We all know pictures from various SW vendors, like

[Image: NOS architecture]

or

[Image: NOS architecture]

Some silly guys even claim to be "silicon-independent"!

The key lies in the lowest level - "open hardware." It has an ASIC. An ASIC is what allows a switch to do line-rate packet forwarding. Everybody loves line-rate forwarding. It is quite a power-hungry piece of silicon, capable of processing packets by a set of rules without involving the control plane all the time.

[Image: an ASIC]

If there is an ASIC, there is an SDK for it. And a driver based on that SDK.

The Trick

ASIC drivers are not open, and we don’t think that any vendor will open them in the foreseeable future. Making better ASICs and drivers is their core business. That’s why you cannot load a random open-source routing suite/OS and live happily ever after.

OF-DPA/OpenNSL

Even if you have the OF-DPA and/or OpenNSL libraries, you still cannot load Quagga/FRR/OVS/whatever onto the box and use it.

It is true that you can find Quagga/FRR in almost all open-source NOS projects. A lot of work was done to integrate them and make them work with an ASIC driver.

With OF-DPA you have to write your own code; that is the main point of OpenFlow, like in this use case. Otherwise, all you can do is connect it to an OpenFlow controller and then search for working apps. We have not found a working connector to Quagga so far.

With OpenNSL you can adapt your box to use an existing NOS (like SONiC in the case above) or write your own code.

Nothing will work out of the box in either case.

Open Network Linux

ONL is not a ready-to-use NOS. It's a nice foundation to build one on, since all the base work on managing HW is done.

Still, it is only compatible with APIs to build your NOS. Or FBOSS, if you have a Wedge switch. Nothing else there.

A NOS?

Why is it not possible to simply load a ready-made NOS that already has drivers for the ASIC in your box?

Unlike a server, an Ethernet switch is not a standardized device. Every design is a bit different and requires effort to be adopted in a NOS. Even control planes are different: the most common architectures are x86 and (less and less popular) PowerPC, and we also see ARM processors in entry-level models. The differences in control planes and the way they boot led to the birth of ONIE, a technique to unify the ecosystem and load a NOS the same way everywhere.

I2C tree management, activity indication, switch port configuration - making a NOS compatible with a switch takes effort.

How to port a NOS?

There is a great porting guide for Microsoft SONiC - https://github.com/Azure/SONiC/wiki/Porting-Guide

Quoting:

However, many devices share the same ASIC platform, and only differ in
other device-specific hardware components which require custom device
drivers and/or configuration files which are loaded when the system is
initialized. This guide describes requirements and general guidelines to
follow when porting SONiC to a new device which contains a supported
ASIC platform.

To get a switch HW up and running in a NOS, you will need:

  • Platform drivers

    • For transceivers

    • Sensors

    • LED management

    • System EEPROM driver

  • Device-specific

    • ASIC configuration and port-mapping

    • Fan control

    • PSU control

    • Installer configuration

So HW support looks like this:

[Image: HW support in a NOS]

Port mapping is especially important: it matches the physical ports, whose layout can differ across switch vendors, with the NOS port names.
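To make the idea of port mapping more concrete, here is a minimal Python sketch. It assumes a small whitespace-separated table, loosely modeled on the kind of port configuration file a NOS such as SONiC ships per device. The column layout, port names, lane numbers, and panel labels are all made up for illustration and do not describe any real switch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical port-mapping table. Every value below is invented for the
# example; a real table comes from the switch vendor's hardware design.
PORT_MAP_TEXT = """\
# name       lanes          alias     index
Ethernet0    29,30,31,32    panel-1   1
Ethernet4    25,26,27,28    panel-2   2
Ethernet8    37,38,39,40    panel-3   3
"""


@dataclass(frozen=True)
class PortEntry:
    name: str                # port name as the NOS shows it
    lanes: Tuple[int, ...]   # ASIC lanes wired to this port
    alias: str               # label of the front-panel position
    index: int               # physical front-panel port number


def parse_port_map(text: str) -> Dict[str, PortEntry]:
    """Parse the table, skipping comment and blank lines."""
    ports: Dict[str, PortEntry] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, lanes, alias, index = line.split()
        lane_ids = tuple(int(lane) for lane in lanes.split(","))
        ports[name] = PortEntry(name, lane_ids, alias, int(index))
    return ports


def find_lane_conflicts(ports: Dict[str, PortEntry]) -> Dict[int, List[str]]:
    """Return every lane that is claimed by more than one port."""
    owners: Dict[int, List[str]] = {}
    for port in ports.values():
        for lane in port.lanes:
            owners.setdefault(lane, []).append(port.name)
    return {lane: names for lane, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    ports = parse_port_map(PORT_MAP_TEXT)
    for port in sorted(ports.values(), key=lambda p: p.index):
        print(f"front-panel port {port.index}: {port.name} -> lanes {port.lanes}")
    print("lane conflicts:", find_lane_conflicts(ports))
```

Note that in this made-up table the lane numbers do not follow the front-panel order. That is exactly the kind of vendor-specific wiring a port map has to capture.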

Having all that ready, a switch can be added to a NOS compatibility list.
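The list above can be treated as a simple checklist. The following Python sketch only restates it as data and reports what is still missing for a given box; the function names and the example set of finished items are hypothetical and are not part of any NOS tooling.

```python
from typing import Dict, List, Set

# The items from the list above, grouped the same way.
REQUIRED: Dict[str, List[str]] = {
    "platform drivers": [
        "transceivers",
        "sensors",
        "LED management",
        "system EEPROM",
    ],
    "device-specific": [
        "ASIC configuration and port mapping",
        "fan control",
        "PSU control",
        "installer configuration",
    ],
}


def missing_items(done: Set[str]) -> Dict[str, List[str]]:
    """Return the required items not yet done, grouped as in REQUIRED."""
    gaps: Dict[str, List[str]] = {}
    for group, items in REQUIRED.items():
        left = [item for item in items if item not in done]
        if left:
            gaps[group] = left
    return gaps


def ready_for_hcl(done: Set[str]) -> bool:
    """A box is a candidate for the compatibility list only with no gaps."""
    return not missing_items(done)


if __name__ == "__main__":
    # Hypothetical state of a port that is still in progress.
    done = {"transceivers", "sensors", "LED management", "system EEPROM",
            "fan control", "PSU control"}
    print("ready:", ready_for_hcl(done))
    for group, items in missing_items(done).items():
        print(f"{group}: still missing {', '.join(items)}")
```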

How to choose a white box switch?

First of all, you need to decide on your network architecture and the protocols you need, and check the available NOS options for compliance.

The second step is checking the compatibility list and picking suitable hardware. There are many models with seemingly the same features and different ASICs. Broadcom’s Trident2, Trident2+, Trident3, Maverick, Tomahawk, Tomahawk+; Cavium, Mellanox, Nephos, Barefoot - it’s almost endless.

Vendors boast that their products are the most flexible, open, feature-rich, etc. In reality it doesn’t make a big difference. All major NOS features are working across all switches in the HCL. There may be some minor feature differences and different settings for VXLAN routing.

So choose the desired port specs, check whether there are any notes about limitations, and that’s it.

If you have several NOSes you want to look into, cross-check their HCLs and find gear that is mentioned in all of them.

If you are not sure about the result, do not hesitate to ask your supplier which option is best. Some vendors may have better lead times and/or support.
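Cross-checking HCLs is just a set intersection. Here is a minimal Python sketch; the NOS names and switch models are placeholders, not real compatibility data, so you would fill in the lists from each vendor's published HCL.

```python
from typing import Dict, List, Set

# Placeholder data: neither the NOS names nor the models are real.
HCLS: Dict[str, Set[str]] = {
    "nos-a": {"switch-a", "switch-b", "switch-c"},
    "nos-b": {"switch-b", "switch-c", "switch-d"},
    "nos-c": {"switch-b", "switch-e"},
}


def common_hardware(hcls: Dict[str, Set[str]]) -> List[str]:
    """Return the models that appear in every HCL, sorted by name."""
    lists = list(hcls.values())
    if not lists:
        return []
    return sorted(set.intersection(*lists))


if __name__ == "__main__":
    print("supported by all candidates:", common_hardware(HCLS))
```

If the result is empty, drop the least important NOS from the comparison and run it again.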

With all that in mind, you can build a PoC segment and experiment to find out the best choice.

After all this effort, a question remains - where is the break from vendor lock-in?

Is there a break from vendor lock-in?

There is no complete break from vendor lock-in.

A traditional network is built around a fixed set of HW+SW.

With disaggregation of HW and SW, you get a lock-in around the way you manage your network. That is, the methods of configuring SONiC, PicOS, Cumulus Linux, OpenSwitch (OPS), OpenSwitch (OPX, and it’s not the same thing as OPS!), etc. are completely different. Once you have settled on a NOS, migrating to a different one will require engineering effort to set up new routines and scripts and to change all the things you got used to.

You can still change your HW vendor without much trouble, though.

Another hilarious example is OpenSwitch (OPX), which boasts about openness and diversified White Box Hardware. Its HCL consists of Dell EMC gear with a single lonely Edge-Core model. There is no disaggregation or openness in that, only another vendor lock-in.

Conclusion

White box and disaggregated networking do deliver more openness in building your network.

They also offer cost reduction in both CAPEX and OPEX, a choice of how to manage your network, and less dependence on a single all-in-one vendor.

Worth trying!