Tech-makers who assume their reality accurately represents the world create many different kinds of problems. The training data for ChatGPT is believed to include most or all of Wikipedia, pages linked from Reddit, and a billion words grabbed off the internet.
LLMs as a model of reality, but not reality
There are limits to any model; in this case, the limit is the training data. What biases are implicit in that model, based on how the data was selected and what it contained?
The paragraph goes on to list some biases: race, wealth, and “vast swamps.”