ChatGPT is an artificial intelligence (AI) system that can produce sophisticated writing and dialogue after learning from vast amounts of linguistic data. The success of ChatGPT is phenomenal. AI-based human-machine language interaction has been at the center of AI competition in recent years. The major players in this game have been Google, Meta, and OpenAI. Google was in the best position from the outset, given its invention of the Transformer (the cornerstone of all cutting-edge language models) and its significant edge in reinforcement learning. Yet Google's efforts in this area were rather diffuse: it kept generating language-model variants with incremental innovations but failed to reach the next level. Meta has a strong AI team that includes many of the world's top AI researchers. Nevertheless, its faith in self-supervised learning as the route to human-machine interaction did not deliver a high-impact success. In contrast, OpenAI, with a small team, stayed focused on a single product line (GPT, including its latest release, GPT-4). It moved in the right direction by using human input to "align" the language model through the Reinforcement Learning from Human Feedback (RLHF) approach. That OpenAI ultimately prevailed in this game shows that aligning the model to human labeling through supervised and reinforcement learning is critical for human-machine interaction.

However, a chatbot's behavior relies heavily on the cues (prompts) provided by human operators. To properly exploit ChatGPT's capabilities, the prompts that instruct or mentor the chatbot must be carefully designed to elicit valuable, valid, and robust responses. This becomes another "alignment" problem: using prompt engineering to probe ChatGPT's knowledge graph so that it best serves users' needs.
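To make the prompt-engineering point concrete, the sketch below shows one way a carefully structured prompt (explicit role, task, constraints, and output format) might be sent to a GPT model through the openai Python SDK. The model name, prompt wording, and decoding settings are illustrative assumptions for this example, not recommendations from this article.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A deliberately structured prompt: the system message fixes the chatbot's role
# and constraints, and the user message states the task and the expected output
# format, rather than leaving them implicit.
messages = [
    {
        "role": "system",
        "content": (
            "You are a careful scientific editor. Answer only from the text "
            "provided by the user. If the text does not contain the answer, "
            "say so explicitly instead of guessing."
        ),
    },
    {
        "role": "user",
        "content": (
            "Summarize the key claim of the following abstract in two "
            "sentences, then list any limitations the authors acknowledge:\n\n"
            "<paste abstract here>"
        ),
    },
]

response = client.chat.completions.create(
    model="gpt-4",    # any chat-capable model works; gpt-4 is used as an example
    messages=messages,
    temperature=0.2,  # a low temperature favors focused, reproducible answers
)

print(response.choices[0].message.content)
```

In this sketch, the "alignment" happens on the user's side: the system and user messages constrain what the model may draw on and how it must respond, which is the essence of designing prompts to elicit valid and robust answers.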