To recognize and address these situations, you have to make sure that you test the algorithm in a regime that is similar to how it will be used in the real world. So, if your machine-learning algorithm is one that is trained on the data from a given set of hospitals, and you will only use it in those same set of hospitals, then latching onto which hospital did the scan could well be a reasonable approach. It’s effectively letting the algorithm incorporate prior knowledge about the patient population in different hospitals. The problem really arises if you’re going to use that algorithm in the context of another hospital that wasn’t in your data set to begin with. Then, you’re asking the algorithm to use these biases that it learned on the hospitals that it trained on, on a hospital where the biases might be completely wrong.
Example of biases and recognition of them.