Microsoft has released a new system that it says can distinguish between security and non-security software bugs about 99% of the time.
To build the process and the machine learning model behind it, the tech giant used a data set of 13 million work items and bugs from 47,000 of its developers, stored across Azure DevOps and GitHub repositories.
The system can also accurately identify critical, high-priority security bugs about 97% of the time on average.
The plan is to open-source the methodology on GitHub in the next few months, along with example models and other resources, so that the system can support human experts.
While the model was being developed, security experts approved the training data, and statistical sampling was used to give them a manageable amount of data to review. The data was then encoded into representations called feature vectors, and researchers at Microsoft used a two-step process to design the system.
The model first learned to classify bugs as security or non-security, and then learned to apply severity labels (critical, important or low-impact) to the security bugs.
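The two-step flow described above can be sketched roughly as follows. This is only an illustration of the control flow, not Microsoft's code: the function `triage` and the two keyword-based stand-in classifiers are hypothetical, and a real system would plug trained models in their place.

```python
# Severity labels named in the article.
SEVERITIES = ("critical", "important", "low-impact")


def triage(title, is_security, rate_severity):
    """Step 1: security or not. Step 2: severity, for security bugs only."""
    if not is_security(title):
        return ("non-security", None)
    severity = rate_severity(title)
    if severity not in SEVERITIES:
        raise ValueError(f"unexpected severity label: {severity!r}")
    return ("security", severity)


# Hypothetical stand-ins for trained models (simple keyword rules).
def toy_is_security(title):
    return any(word in title.lower() for word in ("overflow", "injection"))


def toy_rate_severity(title):
    return "critical" if "remote" in title.lower() else "important"


print(triage("Remote buffer overflow in parser", toy_is_security, toy_rate_severity))
print(triage("Typo on settings page", toy_is_security, toy_rate_severity))
```

Passing the two classifiers in as arguments keeps the two stages separate, so either one could be retrained or swapped without touching the other.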
Microsoft’s model uses two techniques to make its bug predictions. The first is an information retrieval approach called term frequency-inverse document frequency (TF-IDF), which counts how often a word appears in a document and weighs that against how common the word is across the whole collection, in this case a collection of bug titles. Bug titles are normally short, at about 10 words.
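To make the idea concrete, here is a minimal sketch of TF-IDF over a handful of made-up bug titles. It uses the plain textbook formula (term frequency times the log of inverse document frequency); the exact weighting Microsoft uses is not stated, and the function name `tf_idf` and the sample titles are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(titles):
    """Return one {term: weight} dict per title (unsmoothed textbook TF-IDF)."""
    docs = [title.lower().split() for title in titles]
    n_docs = len(docs)
    # Document frequency: the number of titles in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up bug titles, for illustration only.
titles = [
    "buffer overflow in login parser",
    "crash in login page",
    "typo in settings page",
]
vectors = tf_idf(titles)
for term, weight in sorted(vectors[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

A word such as "in", which appears in every title, gets a weight of zero, while a word that appears in only one title, such as "overflow", scores highest. That is what lets rare, telling words stand out in short titles.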
The second technique the software giant uses is a logistic regression model, which uses a logistic function to model the probability that a specific class or event applies.
In the blog post that announced the new system, Microsoft described how it had used machine learning models and security experts together to help find security bugs:
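As a rough illustration, the sketch below trains a tiny logistic regression with stochastic gradient descent on toy data. The two features (imagined TF-IDF weights for the words "overflow" and "typo"), the labels, and the learning rate and epoch count are all assumptions chosen for the example, not details from Microsoft's system.

```python
import math


def sigmoid(z):
    """The logistic function: maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Estimated probability that a bug belongs to the positive class."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(samples, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict_proba(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Toy features: [weight of "overflow", weight of "typo"]; label 1 = security bug.
samples = [[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 0.7]]
labels = [1, 1, 0, 0]
weights, bias = train(samples, labels)

print(predict_proba(weights, bias, [0.9, 0.0]))  # title dominated by "overflow"
print(predict_proba(weights, bias, [0.0, 0.9]))  # title dominated by "typo"
```

The output is a probability rather than a hard yes or no, so a threshold such as 0.5 is what turns it into a security or non-security decision.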
“Every day, software developers stare down a long list of features and bugs that need to be addressed. Security professionals try to help by using automated tools to prioritize security bugs, but too often, engineers waste time on false positives or miss a critical security vulnerability that has been misclassified. To tackle this problem data science and security teams came together to explore how machine learning could help. We discovered that by pairing machine learning models with security experts, we can significantly improve the identification and classification of security bugs.”
The new system has already been deployed in production internally and is continually retrained with data approved by the company’s security experts, who monitor the number of bugs generated during software development.