During the interview on Artificial Intelligence with Ben Shneiderman, he begins by asking the undergraduates a few questions about the different methods that can be used to make predictions, specifically the difference between machine learning and statistical methods. While he goes on to ask the undergraduates several questions about their thoughts on those two methods, I would be very interested to ask what method he believes would allow us to ensure that the data we use adds to the accuracy of our models, or whether it is even possible to create such a method. I would also like to ask his thoughts on the importance of "good data" to the accuracy of our predictions relative to the other elements of our models. This is especially relevant to the discussions we've been having in class recently about simulations and statistical methods, and it would help us understand how those differences play a role in machine learning and artificial intelligence.
You make a great point, Julie! Judging whether data is "good" seems to require examining the use case of the model and the pre-existing circumstances that may influence decision making, both by the model and by the humans interpreting its output. Model accuracy is definitely an important goal to optimize for, but we may also want our model to satisfy some metric of fairness, perhaps based on race and gender. Does the model treat individuals of different races or genders equally? Can we say something about the distribution of results being independent of race or gender? This notion of statistical independence may be helpful for defining whether the model is fair. Because the model depends on the training data, "good" data can be defined as data that leads to a "fair" model. Perhaps we can think of "good" as meaning that the data is representative of the population that the model will be used on. For example, is the racial and gender breakdown of people in the dataset similar to that of the overall population? If not, the model may be biased, as Vox media producer Joss Fong illustrates in her video "Are We Automating Racism?".
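To make the statistical-independence idea a bit more concrete, here is a minimal Python sketch (my own illustration, not something from the interview or the video): if a model's positive prediction rate is roughly the same across racial or gender groups, its outputs are approximately independent of group membership, which is the demographic-parity notion of fairness. The group labels and predictions below are made up purely for illustration.

```python
# Minimal sketch of checking demographic parity (statistical independence
# of predictions from group membership). All data here is hypothetical.
from collections import defaultdict

def positive_rate_by_group(groups, predictions):
    """Return the fraction of positive predictions for each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for g, p in zip(groups, predictions):
        counts[g][0] += int(p == 1)
        counts[g][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}

def demographic_parity_gap(groups, predictions):
    """Largest difference in positive prediction rates between any two groups."""
    rates = positive_rate_by_group(groups, predictions)
    return max(rates.values()) - min(rates.values())

# Hypothetical group memberships and a model's binary predictions.
groups      = ["A", "A", "A", "B", "B", "B", "B"]
predictions = [1, 1, 0, 1, 0, 0, 0]

print(positive_rate_by_group(groups, predictions))   # {'A': 0.67, 'B': 0.25}
print(demographic_parity_gap(groups, predictions))   # ~0.42 -> far from independent
```

A gap near zero would suggest the predictions are close to independent of the group variable; a large gap, as in this toy example, is one signal that the training data or the model may be treating groups differently. Of course, demographic parity is only one of several possible fairness metrics, and which one is appropriate depends on the use case.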