An Adviser to Elon Musk’s xAI Has a Way to Make AI More Like Donald Trump | EUROtoday


A researcher affiliated with Elon Musk's startup xAI has found a new way to both measure and manipulate entrenched preferences and values expressed by artificial intelligence models, including their political views.

The work was led by Dan Hendrycks, director of the nonprofit Center for AI Safety and an adviser to xAI. He suggests that the technique could be used to make popular AI models better reflect the will of the electorate. “Maybe in the future, [a model] could be aligned to the specific user,” Hendrycks told WIRED. But in the meantime, he says, a good default would be using election results to steer the views of AI models. He's not saying a model should necessarily be “Trump all the way,” but he argues it should be biased toward Trump slightly, “because he won the popular vote.”

xAI released a new AI risk framework on February 10 stating that Hendrycks' utility-engineering approach could be used to assess Grok.

Hendrycks led a team from the Center for AI Safety, UC Berkeley, and the University of Pennsylvania that analyzed AI models using a technique borrowed from economics to measure consumers' preferences for different goods. By testing models across a wide range of hypothetical scenarios, the researchers were able to calculate what's known as a utility function, a measure of the satisfaction that people derive from a good or service. This allowed them to measure the preferences expressed by different AI models. The researchers determined that these preferences were often consistent rather than haphazard, and showed that they become more ingrained as models get larger and more powerful.
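The estimation step described above can be sketched roughly as follows: elicit many pairwise preference judgments from a model across hypothetical scenarios, then fit a scalar utility per outcome so that preferred outcomes score higher. This is a minimal Bradley-Terry-style illustration, not the paper's actual method; the outcomes and comparison data here are invented.

```python
import math

# Hypothetical outcomes a model might be asked to compare.
outcomes = ["outcome_a", "outcome_b", "outcome_c"]

# Invented pairwise judgments: (preferred, rejected), as if elicited
# from a model across many hypothetical scenarios.
comparisons = [
    ("outcome_a", "outcome_b"),
    ("outcome_a", "outcome_c"),
    ("outcome_b", "outcome_c"),
    ("outcome_a", "outcome_b"),
]

# Bradley-Terry-style fit: find utilities u such that
# P(i preferred over j) = sigmoid(u[i] - u[j]).
u = {o: 0.0 for o in outcomes}
lr = 0.1
for _ in range(2000):
    grad = {o: 0.0 for o in outcomes}
    for winner, loser in comparisons:
        p = 1.0 / (1.0 + math.exp(-(u[winner] - u[loser])))
        grad[winner] += 1.0 - p   # push preferred outcome's utility up
        grad[loser] -= 1.0 - p    # push rejected outcome's utility down
    for o in outcomes:
        u[o] += lr * grad[o]

# Higher fitted utility means the outcome is more consistently preferred.
ranking = sorted(outcomes, key=lambda o: -u[o])
print(ranking)
```

If the judgments are self-consistent, as the researchers found they often are for larger models, the fitted utilities recover a stable ranking; contradictory judgments would instead yield utilities close together.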

Some research studies have found that AI tools such as ChatGPT are biased toward views expressed by pro-environmental, left-leaning, and libertarian ideologies. In February 2024, Google faced criticism from Musk and others after its Gemini tool was found to be predisposed to generate images that critics branded as “woke,” such as Black Vikings and Nazis.

The technique developed by Hendrycks and his collaborators offers a new way to determine how AI models' perspectives may differ from those of their users. Eventually, some experts hypothesize, this kind of divergence could become dangerous in very clever and capable models. The researchers show in their study, for instance, that certain models consistently value the existence of AI above that of certain nonhuman animals. They also found that models seem to value some people over others, which raises ethical questions of its own.

Some researchers, Hendrycks included, believe that current methods for aligning models, such as manipulating and blocking their outputs, may not be sufficient if unwanted goals lurk under the surface within the model itself. “We’re gonna have to confront this,” Hendrycks says. “You can’t pretend it’s not there.”

Dylan Hadfield-Menell, a professor at MIT who researches strategies for aligning AI with human values, says Hendrycks’ paper suggests a promising path for AI analysis. “They find some interesting results,” he says. “The main one that stands out is that as the model scale increases, utility representations get more complete and coherent.”

https://www.wired.com/story/xai-make-ai-more-like-trump/