It is difficult to eliminate bias from many of the algorithms that are increasingly becoming part of our daily lives, from search engines to facial recognition systems. Here are four of the most common kinds of bias in machine learning systems.
Human bias

Algorithms are only as good as the people who develop them. As New Scientist reports, machine learning is prone to amplify sexist and racist bias from the real world. We have seen this, for example, in image recognition software that fails to identify non-white faces correctly. Similarly, biased data samples can teach machines that women shop and cook, while men work in offices and factories. This kind of problem usually occurs when the scientists who assemble the training data unwittingly introduce their own prejudices into their work.
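One way to make this concrete is to inspect training annotations before any model sees them. The following is a minimal sketch in Python, using an invented list of caption annotations: it simply counts how often each context co-occurs with each person label, and a strong skew in these counts is exactly the kind of association a model would go on to learn and amplify.

```python
from collections import Counter

# Invented caption annotations from a hypothetical image dataset.
captions = [
    {"person": "woman", "context": "kitchen"},
    {"person": "woman", "context": "kitchen"},
    {"person": "woman", "context": "office"},
    {"person": "man", "context": "office"},
    {"person": "man", "context": "factory"},
    {"person": "man", "context": "kitchen"},
]

# Count how often each person label co-occurs with each context.
pairs = Counter((c["person"], c["context"]) for c in captions)

# A strong skew here is what the trained model will reproduce,
# regardless of whether the association holds in the real world.
for (person, context), n in sorted(pairs.items()):
    print(f"{person} + {context}: {n}")
```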
Sample bias

Sample bias occurs when data is collected in such a way that some members of the intended statistical population are less likely to be included than others. In other words, the data used to train a model does not accurately reflect the environment in which it will operate.
Sample bias could be introduced, for instance, if an algorithm used for medical diagnosis is trained only on data from one population. Similarly, if an algorithm meant to operate self-driving vehicles all year round is trained only on data from the summer months, falling snowflakes might confuse the system.
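As a rough illustration of the self-driving example, the sketch below uses invented numbers to simulate the mismatch: a naive anomaly detector is fitted only on "summer" sensor readings and then confronted with "winter" readings drawn from a different distribution.

```python
import random
import statistics

random.seed(0)

# Invented road-surface brightness readings (arbitrary units).
# Training data was collected only in summer; the system must run all year.
summer_train = [random.gauss(0.30, 0.05) for _ in range(1000)]
winter_live = [random.gauss(0.75, 0.10) for _ in range(1000)]

# A naive detector fitted on the biased sample: anything far from the
# summer mean is flagged as an obstacle.
mean = statistics.mean(summer_train)
threshold = 3 * statistics.stdev(summer_train)

false_alarms = sum(abs(x - mean) > threshold for x in winter_live)
print(f"Winter readings flagged as anomalous: {false_alarms / len(winter_live):.0%}")
# Prints close to 100%: the training sample never represented the
# environment in which the system actually operates.
```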
Systematic value distortion
Systematic value distortion occurs when the true value of a measurement is systematically overstated or understated. This kind of error usually occurs when there is a problem with the device or process used to make the measurements.
On a relatively simple level, measurement errors might occur if training data is captured on a camera that filters out some colours.
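To see why this matters, the following sketch simulates a miscalibrated sensor with an invented constant offset. Unlike random noise, the systematic error does not average out, however much data is collected, so a model trained on these readings inherits the distortion.

```python
import random
import statistics

random.seed(1)

TRUE_TEMPERATURE = 20.0  # ground truth, in degrees Celsius (invented)
SENSOR_OFFSET = -1.5     # miscalibrated device: every reading is 1.5 degrees low

def read_sensor():
    # Random noise averages out over many readings; the offset does not.
    return TRUE_TEMPERATURE + SENSOR_OFFSET + random.gauss(0, 0.5)

readings = [read_sensor() for _ in range(10_000)]
print(f"Mean of 10,000 readings: {statistics.mean(readings):.2f}")
# Prints roughly 18.50: more data only sharpens the estimate of the
# wrong value.
```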
Often, though, the problem is more complex. In health care, for instance, it is difficult to implement a uniform process for measuring patient data from electronic records, and even superficially similar records may be hard to compare. A diagnosis usually requires interpreting test results and making several judgements at different stages in the progress of a disease, and the timing of the first decision depends on when a patient initially felt sick enough to see a doctor. An algorithm must be able to take all of these variables into account in order to make an accurate prognosis.
Algorithmic bias

Algorithmic bias is what happens when a machine learning system reflects the values of the people who developed or trained it. Confirmation bias, for example, may be built into an algorithm if the intentional or unintentional aim is to prove an assumption or opinion. This might happen in a business, journalistic, or political environment.
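There is no single test for algorithmic bias, but comparing outcomes across groups is a common first check. The sketch below uses invented loan decision records to compute group-wise approval rates; a large gap does not prove bias on its own, but it flags where a system's built-in values deserve scrutiny.

```python
from collections import defaultdict

# Invented decisions produced by a hypothetical lending model.
decisions = [
    {"group": "A", "approved": True},  {"group": "A", "approved": True},
    {"group": "A", "approved": True},  {"group": "A", "approved": False},
    {"group": "B", "approved": True},  {"group": "B", "approved": False},
    {"group": "B", "approved": False}, {"group": "B", "approved": False},
]

totals = defaultdict(int)
approvals = defaultdict(int)
for d in decisions:
    totals[d["group"]] += 1
    approvals[d["group"]] += d["approved"]

# Group-wise approval rates surface values that were built into the
# system, whether intentionally or not.
for group in sorted(totals):
    print(f"Group {group}: approval rate {approvals[group] / totals[group]:.0%}")
```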
Download the IEC White Paper: Artificial intelligence across industries