Machine learning security is business critical
ML security has the same goal as all cybersecurity measures: to reduce the risk of exposure of sensitive data. If a bad actor interferes with your ML model or the data it uses, that model can produce incorrect results that at best undermine the benefits of ML and at worst negatively impact your business or your customers.
“Executives need to care because there’s nothing worse than doing the wrong thing very quickly and confidently,” says Zach Hanif, vice president of machine learning platforms at Capital One. And while Hanif works in a regulated industry — financial services — requiring additional layers of governance and security, he says every company adopting ML should take the opportunity to review its security practices.
Devon Rollins, vice president of cyber engineering and machine learning at Capital One, adds, “Securing business-critical applications requires a differentiated level of protection. It’s safe to assume that many large-scale ML tool deployments are critical given the role they play for the business and their direct impact on user outcomes.”
New security considerations to keep in mind
While the best practices for securing ML systems are similar to those for any software or hardware system, greater adoption of ML also presents new considerations. “Machine learning adds another layer of complexity,” says Hanif. “This means organizations need to consider multiple points in a machine learning workflow that can represent entirely new vectors.” These core workflow elements include ML models, the documentation and systems around those models and the data they use, and the use cases they enable.
It is also imperative that ML models and their supporting systems are developed with security in mind from the start. Engineers commonly rely on freely available open source libraries developed by the software community rather than coding every aspect of a program from scratch. These libraries are often written by software engineers, mathematicians, or academics who may not be well versed in secure coding practices. “The people and skills needed to develop high-performance or cutting-edge ML software may not always intersect with security-focused software development,” adds Hanif.
According to Rollins, this highlights the importance of vetting and sanitizing the open source code libraries used to build ML models. Developers should adopt confidentiality, integrity, and availability as a framework to guide information security policy. Confidentiality means that data assets are protected from unauthorized access; integrity refers to the quality and safeguarding of the data; and availability ensures that the right authorized users can easily access the data they need for the job at hand.
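As a rough illustration of that framework, the sketch below gates access to a dataset with all three checks. The role names, dataset bytes, and checksum are hypothetical, assumed for the example; this is not a real policy engine.

```python
import hashlib

# Illustrative stand-ins: a dataset blob, its expected checksum, and the
# roles allowed to read it (all hypothetical).
DATASET = b"training-records-v1"
EXPECTED_SHA256 = hashlib.sha256(DATASET).hexdigest()
AUTHORIZED_ROLES = {"ml-engineer", "data-steward"}

def fetch_dataset(role):
    # Confidentiality: reject unauthorized access outright.
    if role not in AUTHORIZED_ROLES:
        raise PermissionError(f"role {role!r} may not read this dataset")
    # Integrity: verify the data has not been tampered with.
    if hashlib.sha256(DATASET).hexdigest() != EXPECTED_SHA256:
        raise ValueError("dataset checksum mismatch")
    # Availability: authorized users get the data without extra friction.
    return DATASET
```

In practice the same three checks map onto access-control lists, signed artifacts, and redundancy guarantees rather than a single function.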
Additionally, ML input data can be manipulated to compromise a model. One risk is inference manipulation, which is essentially altering input data to fool a deployed model. Because ML models interpret data differently than the human brain does, inputs can be manipulated in ways that are imperceptible to humans but nevertheless change the results. For example, compromising a computer vision model may take nothing more than changing a pixel or two in an image of a stop sign. The human eye would still see a stop sign, but the ML model might no longer categorize it as one. Alternatively, an attacker could probe a model by sending a series of varied inputs and observing how it responds. By watching how inputs affect the system, Hanif says, outside actors could figure out how to disguise a malicious file so that it escapes detection.
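The stop-sign example can be sketched with a toy linear classifier. The weights, pixel values, and perturbation size below are illustrative assumptions, not a real vision model, but the mechanics mirror the attack: a small per-pixel shift in the direction that lowers the score flips the prediction.

```python
import numpy as np

# Toy linear "stop sign" scorer over a flattened 2x2 image
# (illustrative only; real attacks target deep vision models).
weights = np.array([0.9, -0.6, 0.4, -0.7])

def is_stop_sign(image):
    """Positive score means the model labels the image 'stop sign'."""
    return float(image @ weights) > 0.0

clean = np.array([0.6, 0.5, 0.5, 0.4])
assert is_stop_sign(clean)  # score = 0.16 > 0

# FGSM-style perturbation: shift each pixel a small epsilon in the
# direction that lowers the score (for a linear model, the gradient
# with respect to the input is just `weights`).
epsilon = 0.1
adversarial = clean - epsilon * np.sign(weights)

# Each pixel moved by only 0.1, yet the classification flips.
assert not is_stop_sign(adversarial)  # score = -0.10 < 0
```

In a real image the per-pixel change would be far below what the eye can notice, which is exactly why such attacks are hard to spot by inspection.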
Another risk vector is the data used to train the system. A third party could “poison” the training data so that the model learns something incorrect. The trained model will then make errors, such as automatically identifying all stop signs as yield signs.
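A minimal sketch of that kind of training-data poisoning, assuming a toy one-dimensional threshold classifier (an illustrative stand-in, not any production model): injecting a few mislabeled extreme points drags the learned decision boundary enough to misclassify legitimate inputs.

```python
import numpy as np

def train_threshold(xs, ys):
    """Learn a decision threshold as the midpoint between class means."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys)
    return (xs[ys == 0].mean() + xs[ys == 1].mean()) / 2.0

def predict(threshold, x):
    return int(x > threshold)

# Clean data: class 0 clusters near 1.0, class 1 near 3.0.
xs = [0.8, 1.0, 1.2, 2.8, 3.0, 3.2]
ys = [0, 0, 0, 1, 1, 1]
clean_t = train_threshold(xs, ys)      # threshold lands at 2.0
assert predict(clean_t, 2.9) == 1      # a class-1 input, correctly labeled

# Poisoning: the attacker injects extreme points mislabeled as class 0,
# pulling the class-0 mean (and the threshold) upward.
xs_poisoned = xs + [9.0, 10.0]
ys_poisoned = ys + [0, 0]
poisoned_t = train_threshold(xs_poisoned, ys_poisoned)  # threshold at 3.7
assert predict(poisoned_t, 2.9) == 0   # the same input is now misclassified
```

Two mislabeled points out of eight are enough here; real poisoning attacks exploit the same sensitivity of training statistics to corrupted examples.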