Issue 90: When Machine Learning Goes Wrong
The people of Ukraine are not forgotten. The Tufts University newspaper published an article this week about a multinational effort to preserve the digital and digitized cultural heritage of the country. On the other side of the war, Russian citizens are downloading Wikipedia out of fear of more drastic network filtering or the collapse of Russia's connections to the global internet.
Eleven years ago this week, the judge overseeing the Google Book Search case (Authors Guild v. Google) ruled that the proposed settlement was "not fair, adequate, and reasonable." As you might recall, the proposal was for a grand vision of a book author rights clearinghouse, not unlike what is in place for the music industry. I had a Thursday Threads entry that covered the initial reactions from the litigants, legal observers, and the library community.
In writing this week's article, I learned that machine learning is a subset of the artificial intelligence field: while the terms are often used interchangeably, machine learning is just one part of artificial intelligence. As the Columbia University Engineering Department describes it, "put in context, artificial intelligence refers to the general ability of computers to emulate human thought and perform tasks in real-world environments, while machine learning refers to the technologies and algorithms that enable systems to identify patterns, make decisions, and improve themselves through experience and data." With that definition in mind, this week's threads are on challenges with machine learning:
- Flip the Switch on Your Drug Synthesizing Tool and Chemical Weapons Come Out
- With Machine Learning, Garbage In/Garbage Out
- Five Reasons Why Applying Machine Learning to Medical Records is Hard
Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to DLTJ's Thursday Threads, visit the sign-up page. If you would like a more raw and immediate version of these types of stories, follow me on Mastodon where I post the bookmarks I save. Comments and tips, as always, are welcome.
Flip the Switch on Your Drug Synthesizing Tool and Chemical Weapons Come Out
By changing the parameters of a machine learning model, researchers dramatically changed what it produced. The model is trained to look for promising compounds that could be turned into pharmaceuticals. As part of that process, it tests candidate compounds for toxicity and eliminates those that would likely be harmful to humans. What if, rather than eliminating toxic compounds, the model preferred them? You get a known chemical warfare agent and what looks like many more compounds that could be turned into chemical agents.
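To make that flipped switch concrete, here is a minimal sketch, assuming a simplified scoring function of my own invention (the published model is far more sophisticated): changing the sign on a single toxicity term turns a search for safe drug candidates into a search for poisons.

```python
# A minimal sketch with a hypothetical scoring function (not the actual
# model's code): flipping the sign of one term inverts the whole search.

def score_candidate(bioactivity: float, toxicity: float,
                    penalize_toxicity: bool = True) -> float:
    """Score a candidate compound for a generative search.

    bioactivity: predicted therapeutic activity (higher is better)
    toxicity:    predicted harm to humans (higher is worse)
    """
    sign = -1.0 if penalize_toxicity else 1.0
    # Drug discovery subtracts toxicity from the score; flipping the
    # switch adds it, so the search now prefers the most toxic compounds.
    return bioactivity + sign * toxicity

# Drug-discovery mode: a toxic candidate scores poorly.
print(score_candidate(bioactivity=0.75, toxicity=0.5))  # 0.25
# Inverted mode: the same candidate now scores well.
print(score_candidate(bioactivity=0.75, toxicity=0.5,
                      penalize_toxicity=False))  # 1.25
```

The point of the sketch is how small the change is: one boolean parameter, and the same machinery that screens out harmful compounds starts selecting for them.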
In a later commentary published through the American Association for the Advancement of Science (AAAS), a researcher said: "Now, keep in mind that we can't deliberately design our way to drugs so easily, so we won't be able to design horrible compounds in one shot, either. Just as there are considerations in drug discovery that narrow down these sorts of lead-generation efforts, there are such factors for chemical weapons: stability on storage, volatility (or lack of it), persistence in the environment, manufacturing concerns, etc."
Also of note, the human-in-the-loop was a critical breakpoint between the model's findings as concepts and the physical instantiation of those findings. As the journal article goes on to say, unwanted outcomes can come both from taking the human out of the loop and from replacing that human with someone guided by a different moral or ethical compass.
So it may be of some comfort that there is more standing between a machine learning model and a weapon. But even with those extra steps, how is something like this regulated? Will working with machine learning algorithms become the type of job that requires a psychological evaluation? Would that even matter with open source tools and open datasets? The tool is neither good nor evil; what matters is how the tool is used or misused.
With Machine Learning, Garbage In/Garbage Out
It isn't going unnoticed in the computing profession that there need to be ways to quantify problems with machine learning models. You've probably read stories of how facial recognition models trained on picture datasets consisting primarily of white male faces had difficulty recognizing anyone who wasn't a white male. This article describes the difficulty of spotting biases in training data and of quantifying a model's accuracy across the populations it will serve.
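One common way to surface this kind of bias is to report an accuracy metric per demographic subgroup rather than as a single aggregate number. Here is a minimal sketch; the records, group labels, and numbers are invented for illustration.

```python
# A minimal sketch of disaggregated accuracy; the data below is invented.
from collections import defaultdict

def accuracy_by_group(records):
    """Compute per-group accuracy from (group, predicted, actual) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        if predicted == actual:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}

# Toy results: 75% accurate overall, but the aggregate hides a wide gap.
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 0, 1), ("group_b", 1, 0), ("group_b", 1, 1), ("group_b", 1, 1),
]
print(accuracy_by_group(records))  # {'group_a': 1.0, 'group_b': 0.5}
```

An aggregate accuracy of 75% sounds serviceable; the breakdown shows the model is perfect for one group and a coin flip for the other.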
Five Reasons Why Applying Machine Learning to Medical Records is Hard
This article describes the difficulties of using machine learning algorithms to synthesize knowledge from medical records. It is also an indictment of the extent to which the requirements of insurance companies (and the workarounds medical providers adopt in response) have mucked up the practice of medicine.
Spring in the Northern Hemisphere Makes Cats Happy
It warmed up to 70°F/21°C earlier this week, and that means the cats want to go outside. For their own protection, we don't let them wander the neighborhood by themselves. Each has a harness and about 50 feet (15 meters) of cord to roam the backyard. It is funny how just a little bit of sun can cheer up a cat.
On the other hand, it has turned cool and rainy for the remainder of the week, so the cat minder (me) is not all that interested in going outside. Once they have had a taste of the outdoors, it becomes tough to put up with their constant meowing and pawing at the glass. Just a little bit longer, Mittens and Alan...just a little bit longer.