Increasing SDN Reliability through Systematic Network Destruction
btschaen at cs.duke.edu
Wednesday, October 12, 2016
10:00am - 12:00pm
D344 LSRC, Duke
Network failures are inevitable. Interfaces go down, devices crash, and resources become exhausted. The control software is responsible for providing reliable services on top of unreliable components and through unpredictable events. Guaranteeing the correctness of the controller under all types of failures is therefore essential for network operations. Yet it is also a nearly impossible task, given the complexity of the control software and of the underlying network, and the imprecision of simulation tools.
In this talk, we present a new network testing framework modeled on Netflix's Chaos Monkey, a tool that injects live failures into production environments. Our SDN Chaos Monkey injects targeted failures chosen according to network operator specifications and network invariants. After injecting the failures, the system analyzes how the controller reacts, providing confidence that the controller remains robust under stress.
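The abstract does not describe the framework's internals, so the Python sketch below is only one possible reading of the inject-then-analyze loop, built on assumptions of our own: the topology is a set of undirected links, the operator specification is a set of protected links that must never be failed, the invariant is reachability between listed host pairs, and a toy controller that recomputes shortest paths stands in for a real SDN controller. All names (`pick_failure`, `toy_controller`, `check_reaction`) are hypothetical.

```python
import random
from collections import deque

def norm(a, b):
    """Undirected link as a sorted tuple, so ('s2', 's1') == ('s1', 's2')."""
    return (a, b) if a <= b else (b, a)

def bfs_path(links, src, dst):
    """Shortest hop-count path from src to dst over the given links, or None."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in sorted(adj.get(u, [])):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None

def pick_failure(links, protected, host_pairs, rng):
    """Hypothetical failure chooser: pick a link the operator has not protected
    whose loss still leaves every host pair reachable (the assumed invariant)."""
    candidates = [
        link for link in sorted(links)
        if link not in protected
        and all(bfs_path(links - {link}, s, d) is not None for s, d in host_pairs)
    ]
    return rng.choice(candidates) if candidates else None

def toy_controller(live_links, host_pairs):
    """Stand-in for the SDN controller: recompute routes on the live topology."""
    return {(s, d): bfs_path(live_links, s, d) for s, d in host_pairs}

def check_reaction(live_links, routes):
    """Return host pairs whose installed route is missing or crosses a dead link."""
    bad = []
    for pair, path in routes.items():
        if path is None or any(norm(a, b) not in live_links
                               for a, b in zip(path, path[1:])):
            bad.append(pair)
    return bad

if __name__ == "__main__":
    links = {norm("s1", "s2"), norm("s2", "s3"), norm("s3", "s4"), norm("s1", "s4")}
    protected = {norm("s1", "s2")}  # operator: never fail this link
    pairs = [("s1", "s3")]
    failed = pick_failure(links, protected, pairs, random.Random(0))
    live = links - {failed}
    print("failed:", failed)
    print("violations:", check_reaction(live, toy_controller(live, pairs)))
```

In this sketch the invariant is checked before a failure is injected, so only failures the network can in principle survive are chosen; any violation reported afterwards then points at the controller's reaction (for example, stale routes still crossing the failed link) rather than at an unsurvivable topology.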
Advisor(s): Theophilus Benson
Committee: Jeffrey Chase, Xiaowei Yang