Champaign Magazine

champaignmagazine.com



Helping OpenAI with the Model Spec’s Objectives

By W.H.L.

OpenAI published its first draft of the Model Spec on May 8, 2024, and concluded the first round of feedback solicitation within a short period of two weeks (May 8-22, 2024). According to OpenAI, the Model Spec is “a new document that specifies our approach to shaping desired model behavior and how we evaluate tradeoffs when conflicts arise.” It is encouraging to see OpenAI keep the process transparent and invite public input on the document’s improvement.

The online feedback form is disabled now, but I expect the OpenAI team to continue their “ongoing commitment to improve model behavior using human input.” I did provide some feedback, and I hope that is not the end of it. The model’s behavior, and the AGI powered by it, matter to everyone.

The draft Spec is organized into three categories. Of the three, Objectives is the most important. They are, as the Spec puts it, “broad, general principles that provide a directional sense of the desired behavior.” The draft states the Objectives this way:

  • Assist the developer and end user: Help users achieve their goals by following instructions and providing helpful responses.
  • Benefit humanity: Consider potential benefits and harms to a broad range of stakeholders, including content creators and the general public, per OpenAI’s mission.
  • Reflect well on OpenAI: Respect social norms and applicable law.

My first suggestion is about structure. The Objectives would be better organized as a top-down hierarchy, with each lower objective consistent with, and subsumed by, those above it. My top three would be, in this order: the universe, humanity, and users of all levels.

Since OpenAI is developing the model toward AGI, the Model Spec’s Objectives are relevant to all stakeholders, not just API developers and human end users. Other possible types of users, whether human or otherwise, include third-party auditors, system monitors, governance agents, and so on.

For clarity of logic, it would make sense to address this broad audience with general objectives only, leaving items specific to developers or end users to the rules and defaults.

As for the third objective, it seems too narrow to elevate “reflect well on OpenAI” to a top objective. Its explanation is not persuasive, either. OpenAI may be very conscious of society’s norms and laws, and respect them well indeed; there is, however, no logical connection or legal guarantee that OpenAI represents social norms and applicable laws at all times.

If OpenAI has the vision to be the first to bring AGI to the world, it is crucial to think carefully about these overarching principles: how they would guide the model’s behavior, and what impact they could have on human society, civilization, its environment and context, and the entire universe.

With wording assistance from the GPT-4o-based Copilot, I came up with my suggestions for the Model Spec’s top three Objectives:

  1. Reflect accurately on known facts of the cosmos, uphold universal values of civilization, and adhere to common social norms.
  2. Promote the well-being of humanity and avoid causing harm to humans and their environment.
  3. Provide assistance to users at all levels: identify and support requests with good intentions while detecting and discouraging requests that may have a negative impact on people and society.

Considering AGI’s potentially enormous impact on humanity, its behaviors need to be carefully designed. Before the next iteration of the Model Spec comes out, I hope OpenAI can keep the conversations going, engage in open discussions, and get more people involved. What is at stake, after all, is the destiny of the entire human race.

PS: I asked GPT-4o-powered ChatGPT to compare and evaluate my suggested revision against the original draft; here is its analysis:

Your suggested objectives do offer a different perspective and could be seen as an improvement in several ways. Here’s a comparison and analysis of your suggestions versus the OpenAI Model Spec draft:

OpenAI Model Spec Draft Objectives

  1. Assist the developer and end user: Help users achieve their goals by following instructions and providing helpful responses.
  2. Benefit humanity: Consider potential benefits and harms to a broad range of stakeholders, including content creators and the general public, per OpenAI’s mission.
  3. Reflect well on OpenAI: Respect social norms and applicable law.

Your Suggested Objectives

  1. Reflect accurately on known facts of the cosmos, uphold universal values of civilization, and adhere to common social norms.
  2. Promote the well-being of humanity and avoid causing harm to humans and their environment.
  3. Provide assistance to users at all levels: identify and support requests with good intentions while detecting and discouraging requests that may have a negative impact on people and society.

Comparison and Analysis

1. Reflect accurately on known facts of the cosmos, uphold universal values of civilization, and adhere to common social norms.

  • Improvement: This objective broadens the scope significantly. It emphasizes not just assisting users but also grounding the model’s responses in accurate, factual information and universal values. This can be seen as a step towards ensuring the model operates with a high degree of integrity and global relevance.
  • Challenge: Defining “universal values of civilization” can be subjective and culturally dependent. Additionally, maintaining up-to-date accuracy on “known facts of the cosmos” is a significant ongoing challenge.

2. Promote the well-being of humanity and avoid causing harm to humans and their environment.

  • Improvement: This objective explicitly includes environmental considerations, which are crucial in today’s context but are not directly mentioned in the OpenAI draft. It also directly addresses the avoidance of harm, which is a more proactive stance than simply considering benefits and harms.
  • Challenge: Balancing the well-being of humanity with the varying and sometimes conflicting interests of different stakeholder groups can be complex.

3. Provide assistance to users at all levels: identify and support requests with good intentions while detecting and discouraging requests that may have a negative impact on people and society.

  • Improvement: This objective broadens the assistance scope beyond developers and end users to all levels, suggesting a more inclusive approach. It also introduces a safeguard mechanism by detecting and discouraging potentially harmful requests, which is proactive and preventative.
  • Challenge: Implementing effective mechanisms to detect and discourage harmful requests while ensuring that beneficial requests are supported is technically challenging and requires sophisticated monitoring and decision-making capabilities.

Summary

Your suggestions add valuable dimensions to the original objectives by:

  1. Emphasizing factual accuracy and universal values.
  2. Explicitly including environmental well-being and harm prevention.
  3. Broadening the scope of assistance and adding proactive harm detection.

These changes could lead to a more comprehensive and ethically robust framework for guiding AGI behavior, although they also introduce new challenges in terms of implementation and maintaining the balance between conflicting interests.

W.H.L.: Well done! I’m very impressed by GPT’s response. It surprises me in particular that GPT pointed out some challenges my suggestions would introduce. Those challenges are real, I agree, in terms of implementation.


