OpenAI Seeks Mathematical Definition of Goodness

Scott Aaronson, a theoretical computer scientist working at OpenAI, reported that Ilya Sutskever, Chief Scientist of OpenAI, would like to reduce ethics, and even the definition of goodness, to an algorithm.

I have these weekly calls with Ilya Sutskever, cofounder and chief scientist at OpenAI. Extremely interesting guy. But when I tell him about the concrete projects that I’m working on, or want to work on, he usually says, “that’s great Scott, you should keep working on that, but what I really want to know is, what is the mathematical definition of goodness? What’s the complexity-theoretic formalization of an AI loving humanity?” And I’m like, I’ll keep thinking about that! But of course it’s hard to make progress on those enormities.

A different idea, which some people might consider more promising, is well, if we can’t make explicit what all of our human values are, then why not just treat that as yet another machine learning problem? Like, feed the AI all of the world’s children’s stories and literature and fables and even Saturday-morning cartoons, all of our examples of what we think is good and evil, then we tell it, go do your neural net thing and generalize from these examples as far as you can.

One objection that many people raise is, how do we know that our current values are the right ones? Like, it would’ve been terrible to train the AI on consensus human values of the year 1700—slavery is fine and so forth. The past is full of stuff that we now look back upon with horror.

So, one idea that people have had—this is actually Yudkowsky’s term—is “Coherent Extrapolated Volition.” This basically means that you’d tell the AI: “I’ve given you all this training data about human morality in the year 2022. Now simulate the humans being in a discussion seminar for 10,000 years, trying to refine all of their moral intuitions, and whatever you predict they’d end up with, those should be your values right now.”
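The “treat it as yet another machine learning problem” framing that Aaronson describes can be pictured concretely. Below is a minimal, purely illustrative sketch in Python: a handful of invented, hand-labeled sentences stand in for “the world’s stories,” and a tiny bag-of-words logistic regression stands in for the neural net. None of this reflects how OpenAI, Aaronson, or anyone else actually approaches value learning; it only shows, under those stated assumptions, what “generalize from labeled moral examples” means in the simplest possible terms.

```python
# Toy illustration of "treat human values as a machine learning problem":
# label a few example sentences as approved (1) or disapproved (0), fit a
# bag-of-words logistic-regression classifier, and ask it to generalize.
# All sentences and labels here are invented for illustration only.
import math
import random
from collections import Counter

EXAMPLES = [
    ("she shared her food with the hungry traveler", 1),
    ("he told the truth even though it cost him", 1),
    ("they rescued the stranded kitten from the storm", 1),
    ("she forgave her brother and helped him rebuild", 1),
    ("he stole the old man's savings and lied about it", 0),
    ("they mocked the new student until she cried", 0),
    ("he betrayed his friend to save himself", 0),
    ("she destroyed the village well out of spite", 0),
]

def featurize(text):
    """Bag-of-words counts for a lowercase, whitespace-tokenized sentence."""
    return Counter(text.lower().split())

# Fixed vocabulary drawn from the training examples.
vocab = sorted({w for text, _ in EXAMPLES for w in featurize(text)})

def vector(text):
    counts = featurize(text)
    return [counts.get(w, 0) for w in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Train by plain stochastic gradient descent.
random.seed(0)
weights = [0.0] * len(vocab)
bias = 0.0
lr = 0.5
for _ in range(200):
    for text, label in random.sample(EXAMPLES, len(EXAMPLES)):
        x = vector(text)
        pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        err = pred - label
        weights = [w - lr * err * xi for w, xi in zip(weights, x)]
        bias -= lr * err

def goodness_score(text):
    """The model's probability that a described action is 'good' (0..1)."""
    x = vector(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

print(goodness_score("he lied about the savings he stole"))       # near 0
print(goodness_score("she helped the traveler and shared food"))  # near 1
```

Even this toy version makes the difficulty vivid: the classifier can only ever reproduce the consensus encoded in its labels, which is exactly why the “values of the year 1700” objection arises and why proposals like Coherent Extrapolated Volition try to build in a process of moral refinement.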

The idea that we can improve on ethics once we make it quantifiable is a popular theme among AI researchers, even those who do not hold to the Coherent Extrapolated Volition (CEV) model. For example, in his book Superintelligence, Nick Bostrom argues that CEV is too closely tied to human ideas of ethics, which are not always correct. Better to let the computer figure out morality for us, and thus improve on human ethics.

“The idea is that we humans have an imperfect understanding of what is right and wrong, and perhaps an even poorer understanding of how the concept of moral rightness is to be philosophically analyzed: but a superintelligence could understand these things better…. To the extent that moral assertions are ‘truth-apt’ (i.e. have an underlying propositional character that enables them to be true or false), the superintelligence should be able to figure out which assertions of the form ‘Agent X ought now to Φ’ are true. At least, it should outperform us on this task.” (pp. 266–267, 371)

While Bostrom recognizes there are problems with this proposal, he treats them as fundamentally programming problems, because on this view ethics is ultimately concerned with truth claims about actions.

This, of course, is not how ethics has been classically understood. In the older tradition of ethical inquiry (whether among the ancient Hebrews, Chinese, Greeks, or early Christians), ethics was primarily about persons, and only secondarily about actions and truth-claims about actions. Among the fundamental questions of moral philosophy were, “What kind of person do I want to become?” and “What does a flourishing human life look like?” When applied to individuals in the aggregate, this line of inquiry constituted the central political question: “What character traits in citizens will foster flourishing in the city?”

Within the ancient world, the answers to these questions always involved some account of the virtues: qualities like kindness, wisdom, courage, honesty, integrity, temperance, patience, etc. These virtues, of course, require embodiment – they require persons. There was a common consensus that through dutiful habits, educational formation, and proper spiritual practices, we develop the character traits necessary to become well-ordered and mature human beings. While this older tradition did concern itself with the rightness or wrongness of actions (what we would call moral duties), such concerns were situated in the larger context of a worthwhile human life (what Aristotle called eudaimonia), and consequently, the virtues.

In his introduction to the 2013 Cambridge Companion to Virtue Ethics, Daniel C. Russell points out that an advantage of this older tradition was its focus on the whole of a human life, and not merely on individual actions considered in isolation.

“What sets virtue ethics apart is that it treats ethics as concerned with one’s whole life – and not just those occasions when something with a distinctly “moral” quality is at stake. For virtue ethics, the focus is not so much on what to do in morally difficult cases as on how to approach all of one’s choices with such personal qualities as kindness, courage, wisdom, and integrity. That difference in focus is an important one. People who may feel confident in the rightness of their actions can sometimes be brought up short when asked whether they are also being generous, or considerate, or honest. Rightness is about what we’re doing; virtue is also about how we’re living. It resists compartmentalization.”

Somewhere in the modern age we got off track. Some date the problem back to Immanuel Kant, others to Hume, while still others lay the blame at the feet of the 14th-century Franciscan friar William of Ockham. But regardless of how you tell the story of intellectual history, everyone agrees that by the 19th century ethics had ceased to be primarily concerned with becoming a certain sort of person and had become instead the analysis of actions stripped of all relation to virtuous personhood. We began to ask “What is the best thing to do?” without first grappling with the prior questions, “What is the best way to live?” and “What type of person should I want to become?”

The receding role of virtue in moral debates is most obvious in the top-down, managerial style of ethics found in Jeremy Bentham’s utilitarianism, or in the social Darwinism of figures like Francis Galton and Margaret Sanger. Yet even in religious communities, where we would expect to find a robust tradition of virtue-based reasoning, ethical discourse is often tinged with a nominalism that, in the final analysis, can only conceive of ethical behavior in terms of an action’s relationship to divine command.

This creates the situation today where we can even have conversations about computers – who by definition can never practice virtues like courage, temperance, kindness, etc. –  one day outperforming humans on ethics.
