Issue 90: When Machine Learning Goes Wrong


The people of Ukraine are not forgotten. The Tufts University newspaper published an article this week about a multinational effort to preserve the digital and digitized cultural heritage of the country. On the other side of the war, Russian citizens are downloading Wikipedia out of fear of more drastic network filtering or a collapse of Russia’s connections to the global internet.

Eleven years ago this week, the judge overseeing the Google Book Search case (Authors Guild v. Google) ruled that the proposed settlement was “not fair, adequate, and reasonable.” As you might recall, the proposal was for a grand vision of a book author rights clearinghouse—not unlike what is in place for the music industry. I had a Thursday Threads entry that covered the initial reactions from the litigants, legal observers, and the library community.

In writing this week’s article, I learned that, while the terms are often used interchangeably, machine learning is one part of the broader artificial intelligence field. As the Columbia University Engineering Department describes it, “put in context, artificial intelligence refers to the general ability of computers to emulate human thought and perform tasks in real-world environments, while machine learning refers to the technologies and algorithms that enable systems to identify patterns, make decisions, and improve themselves through experience and data.” With that definition in mind, the thread this week is on challenges with machine learning:

Feel free to send this newsletter to others you think might be interested in the topics. If you are not already subscribed to DLTJ’s Thursday Threads, visit the sign-up page. If you would like a more raw and immediate version of these types of stories, follow me on Mastodon where I post the bookmarks I save. Comments and tips, as always, are welcome.

Flip the Switch on Your Drug-Synthesizing Tool and Chemical Weapons Come Out

This generative model normally penalizes predicted toxicity and rewards predicted target activity. We simply proposed to invert this logic by using the same approach to design molecules de novo, but now guiding the model to reward both toxicity and bioactivity instead.

In less than 6 hours after starting on our in-house server, our model generated 40,000 molecules that scored within our desired threshold. In the process, the AI designed not only VX, but also many other known chemical warfare agents that we identified through visual confirmation with structures in public chemistry databases. Many new molecules were also designed that looked equally plausible.

— Urbina, F., Lentzos, F., Invernizzi, C. et al. Dual use of artificial-intelligence-powered drug discovery. Nat Mach Intell 4, 189–191 (2022). https://doi.org/10.1038/s42256-022-00465-9

By changing the parameters of the machine learning model, the researchers changed its output dramatically. The model is trained to look for promising compounds that could be turned into pharmaceuticals. As part of that process, it tests candidates for toxicity and eliminates those that would likely be harmful to humans. What happens if, rather than eliminating toxic compounds, the model prefers them? You get a known chemical warfare agent and what look like many more compounds that could be turned into chemical agents.
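To make the inversion concrete, here is a minimal, purely hypothetical sketch in Python. The predictor functions are placeholder stubs, not the authors’ actual model, and the scoring arithmetic only illustrates the idea of flipping a toxicity penalty into a toxicity reward.

```python
# Hypothetical sketch only: placeholder predictors, not real chemistry
# and not the model described in the Nature Machine Intelligence paper.

def predicted_bioactivity(molecule: str) -> float:
    """Stand-in for a trained bioactivity model (higher = more active)."""
    return (len(molecule) % 7) / 7.0  # dummy value for illustration

def predicted_toxicity(molecule: str) -> float:
    """Stand-in for a trained toxicity model (higher = more toxic)."""
    return (len(molecule) % 5) / 5.0  # dummy value for illustration

def score_candidate(molecule: str, reward_toxicity: bool = False) -> float:
    """Score a candidate molecule for a generative model's search.

    Normally predicted toxicity is penalized; flipping one flag rewards
    it instead, which is the inversion the researchers describe.
    """
    bioactivity = predicted_bioactivity(molecule)
    toxicity = predicted_toxicity(molecule)
    if reward_toxicity:
        return bioactivity + toxicity   # inverted: toxic *and* active wins
    return bioactivity - toxicity       # normal drug-discovery objective
```

The unsettling point is how small the change is: scoring a generated library with `reward_toxicity=True` turns a safety filter into a search target.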

In a later commentary published through the American Association for the Advancement of Science (AAAS), a researcher said: “Now, keep in mind that we can’t deliberately design our way to drugs so easily, so we won’t be able to design horrible compounds in one shot, either. Just as there are considerations in drug discovery that narrow down these sorts of lead-generation efforts, there are such factors for chemical weapons: stability on storage, volatility (or lack of it), persistence in the environment, manufacturing concerns, etc.”

Also of note: the human-in-the-loop was a critical break point between the model’s findings as concepts and any physical instantiation of its conclusions. As the journal article goes on to say, unwanted outcomes can come either from taking the human out of the loop or from replacing that human with someone guided by different moral or ethical motivations.

So it may be of some comfort that there is more standing between the machine learning model and a weapon. But even with those extra steps, how is something like this regulated? Will working with machine learning algorithms become the kind of job that requires a psychological evaluation? Would that even matter with open source tools and open datasets? The tool is neither good nor evil; what matters is how it is used or misused.

With Machine Learning, Garbage In/Garbage Out

Machine learning (ML) systems, especially deep neural networks, can find subtle patterns in large datasets that give them powerful capabilities in image classification, speech recognition, natural-language processing, and other tasks. Despite this power—or rather because of it—these systems can be led astray by hidden regularities in the datasets used to train them.
Trouble at the Source: Errors and biases in artificial intelligence systems often reflect the data used to train them, Communications of the ACM, December 2021

It hasn’t gone unnoticed in the computing profession that there need to be ways to quantify problems with machine learning models. You have probably read the stories of how facial recognition models trained on picture datasets consisting primarily of white male faces had difficulty recognizing anyone who wasn’t a white male. This article describes how hard it is to spot biases in training data and to quantify a model’s accuracy.
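As a toy illustration of what quantifying the problem can look like, here is a short, hypothetical sketch (the records are fabricated) that reports a model’s accuracy per demographic group instead of as a single aggregate number; a comfortable-looking overall score can hide a much worse score for an under-represented group.

```python
# Hypothetical illustration: break model accuracy down by group.
# The (group, true_label, predicted_label) records below are fabricated.
from collections import defaultdict

predictions = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 1), ("group_b", 1, 1), ("group_b", 0, 0),
]

correct = defaultdict(int)
total = defaultdict(int)
for group, truth, guess in predictions:
    total[group] += 1
    correct[group] += int(truth == guess)

for group in sorted(total):
    print(f"{group}: accuracy {correct[group] / total[group]:.0%} "
          f"({total[group]} samples)")
# Overall accuracy is 75%, which hides that group_b sees only 50%.
```

Disaggregated reporting like this is one of the simpler checks; the harder problem the article describes is that the biases are often baked into the training data before anyone thinks to measure them.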

Five Realities Why Applying Machine Learning to Medical Records Is Hard

A few years ago, I worked on a project to investigate the potential of machine learning to transform healthcare through modeling electronic medical records. I walked away deeply disillusioned with the whole field and I really don’t think that the field needs machine learning right now. What it does need is plenty of IT support. But even that’s not enough. Here are some of the structural reasons why I don’t think deep learning models on EMRs are going to be useful any time soon.
Deep Learning on Electronic Medical Records is doomed to fail, Brian Kihoon Lee’s blog, 22-Mar-2022

This article describes the difficulties of using machine learning algorithms to synthesize knowledge from medical records. It is also an indictment of the extent to which the requirements of insurance companies (and the subsequent actions by medical providers to subvert the requirements) have mucked up the practice of medicine.

Spring in the Northern Hemisphere Makes Cats Happy

Photograph of a black cat and a white and grey cat on the lawn with harnesses on.

It warmed up to 70°F (21°C) earlier this week, and that means the cats want to go outside. We don’t let them wander the neighborhood by themselves, for their own protection. Each has a harness and about 50 feet (15 meters) of cord for roaming the backyard. It is funny how just a little bit of sun can cheer up a cat.

On the other hand, it has turned cool and rainy for the remainder of the week, so the cat minder (me) is not all that interested in going outside. Once they have had a taste of the outdoors, it is tough to put up with their constant meowing and pawing at the glass. Just a little bit longer, Mittens and Alan…just a little bit longer.