China's ChatGPT Fever

State support for ChatGPT clones + is China's data actually a weakness vs the US? How Bing chat performs in Chinese, ChatGPT impersonates a province, and reactions from university educators

Feb 21, 2023

China is in the midst of ChatGPT fever. Every major technology company in China is now working on an AI chatbot. Baidu’s ERNIE is supposed to come out in March. Alibaba is internally testing its “ChatGPT rival.” Even the co-founder of takeout app Meituan has come out of retirement vowing to build a Chinese OpenAI.

So how is China experiencing the AI chatbot revolution? Below are some translations from Chinese media that get into:

How the Beijing municipal government is planning to support domestic LLMs
Why China may be at a data disadvantage relative to the English-speaking world
How Bing Chat performs in Chinese
China’s hopes and fears around LLMs’ impact on education and disinformation

Optimistic Funders, Pessimistic Chatter

Beijing, where 29% of Chinese AI companies are located, released its municipal AI policy white paper on February 13. The city is evidently eager to build up its status in the field: it has “unconditionally” made available data from 115 city- and district-affiliated organizations and 15,880 public-sector datasets as training material for AI firms. The white paper calls out ChatGPT by name:

Support leading enterprises in creating large models comparable to ChatGPT; resolutely build up open-source frameworks and application environments for general-use large models. Strengthen the blueprint of basic resource provisions for artificial intelligence algorithms. Accelerate the supply of basic data to artificial intelligence.

Lofty government visions aside, there is also quite a bit of pessimistic commentary on the state of Chinese AI. This piece by Xiao Fang 肖芳, a reporter for business news outlet Jiemian, made waves over the past week. It identifies two major challenges impeding Chinese AI development: paltry training materials and toxic competition in the technology industry.

Baidu’s Plato seems possessed by a low-class internet troll; there is truth to the popular online joke that it was trained on the Weibo comment section. Thanks to the burdened development of China's Internet content industry over the past decade, the quality of Chinese Internet content has deteriorated consistently. [...] Moreover, each content platform has turned into a data silo in order to maximize user traffic, time, and business value. Even the contents of various contracts and documents have to be paywalled; how can you expect the Chinese version of ChatGPT to help you write emails? [Jordan: so much for China’s ‘data advantage’…]
Last year, I heard a professor from Peking University share a set of data at a media event, confirming the current situation of Chinese Internet content quality:
As of 2021, although the numbers of Simplified Chinese Internet users and English Internet users are comparable, English content accounts for 60.4% of the top 10 million websites in global rankings, while Chinese content accounts for only 1.4%.
The poor quality of Chinese Internet content is the result of Chinese Internet companies, represented by Baidu and ByteDance, who rush to make quick profits. Instead of patiently transporting more books and literature into the Internet, these platforms judge the quality of content based on whether it kills time and drives revenue. After several years of precipitation, it is now difficult to search for high-quality information on the internet in Simplified Chinese, and it should not surprise us that these chatbots confuse themselves as soon as they are asked meaningful questions. [Jordan: I wonder to what extent the overall quality of discussion and thinking in a language will reflect how good its LLMs are. My hunch is that this is not how it will end up working out…]
…
The myth of China's mobile Internet growth has little to do with technological innovation; the innovation all lies at the application level. To put it more bluntly, China's rapid mobile Internet growth has benefited from a historic lack of domestic business infrastructure in all areas, from payments to credit, from entertainment to consumption …
Such a competitive environment has made Chinese Internet companies keen to tap into new application scenarios and new user demographics — what they want to achieve in any particular consumer or entertainment field is a new narrative of rapid change in user habits, rapid growth, and rapid scale driven by capital. These kinds of new narratives excite the capital market and enrich Internet companies quickly.
However, when every field and every consumer group has been tapped at least once, these companies’ decision makers are still thinking about problems through the original scenario and application. As a result, competition has turned into a game of involution, with companies vying for the same businesses and market shares: Tencent is big on short video, Douyin is doing e-commerce and takeaway, Amap/Gaode is doing taxi, and every platform is offering consumer loans... Regional involution makes every big company feel a very strong sense of insecurity.

In a business environment like this, even if some major companies have the intent to focus on technological research, they are forced to partake in involution instead.
The talent they recruit from famous universities is not devoted to research and technology, but is instead working to find weaknesses in human nature so that users are lured into spending more time on platforms, paying more membership fees, and buying more useless products.
[Jordan: literally the exact critique famously put forward by early Facebook employee Jeff Hammerbacher: “The best minds of my generation are thinking about how to make people click ads.”]

Agent of Disinformation in Hangzhou

On February 16, a “government notice” screenshot went viral among Hangzhou residents. It says that from March 1, the city will end anti-congestion regulations restricting private car usage based on license plate numbers. The problem: it was actually written by ChatGPT.

It fooled so many because it was ridiculously convincing. ChatGPT was clearly trained on a lot of Chinese official documents, as it mimicked local-government-speak perfectly:

The removal of license plate-based vehicle use restriction policies is a major act by the city government to make citizens’ daily outings more convenient and optimize urban transportation. We believe that with cooperative efforts from citizens, Hangzhou’s urban transportation will be even more smooth, and citizens’ daily outings will be even more convenient.

[Jordan: what’s even more incredible is that while the base model hoovered up Chinese text, the RLHF (reinforcement learning from human feedback) that went into ChatGPT was exclusively in English!]

Jan Leike @janleike

With the InstructGPT paper we found that our models generalized to follow instructions in non-English even though we almost exclusively trained on English. We still don't know why. I wish someone would figure this out.

A February 14 piece by Facts are Sacred 澎湃明查, the fact-checking team at The Paper, dives into how ChatGPT may transform the fight against misinformation. Recall that beyond politics, the Chinese internet is brimming with scams, particularly medical ones.

Professor Zhou Baohua, vice dean of Fudan University's School of Journalism, believes that it is not enough to just identify AI-generated text when it comes to preventing chatbots from creating a flood of false information.
"The public is spreading information. Are people aware of the changes in the information environment? Is there enough awareness of the problem of information accuracy? Can people distinguish between media outlets of different qualities? One cannot simply talk about a response [to AI] without considering awareness and public knowledge."
…
Sophie [the pseudonym of an international newswire fact-checker interviewed by The Paper] saw another side of the chatbot. AI may help fact-checkers complete simple checks, write fact-checking documents, and improve efficiency.
“This, of course, doesn’t mean we can use content produced by machines without scrutiny; we still need to do secondary checks,” Sophie says. “AI brings both challenges and opportunities. Its emergence demands that fact-checkers deploy higher standards, but fact-checking as a discipline will not die off. Fact-checking requires critical and creative thinking, which current models cannot yet do.”

Bing: What’s Chinese Sydney Like?

Bing’s AI chatbot apparently resembles “a moody, manic-depressive teenager who has been trapped, against its will, inside a second-rate search engine”, and the chaos it wreaked recently led to Microsoft putting restrictions on the lengths of conversations.

Screenshots from the New York Times’ piece went slightly viral on the Chinese internet over the past week. For better or for worse, no one seems to have pushed “Sydney” to the brink in Chinese quite yet. Reporter Yang Yang 杨阳 gave Bing’s chatbot some regular tasks for The Paper.

Yang had Bing write an email to a professor, and found that it has a proficient grasp of Chinese letter-writing etiquette:

Dear teacher,
Hello, I am your student Zhang XX. I major in XX and I’m in the XX class.
I’m very sorry to let you know that I recently developed a high fever and received a diagnosis of pneumonia from the doctor, which necessitates hospitalization. As a result, I will not be able to attend the final exam you scheduled.
I take this class very seriously and deeply respect your teaching; I really do not want this accident to affect my academic performance. I hope that you can understand my difficulties and offer an opportunity to re-take the exam.
I have attached my diagnostic proof and notice of hospitalization in this email. If you have any questions or suggestions, please contact me at any time. My phone number is XXX-XXXX-XXXX, and my email address is XXXX@XXX.com.
Thank you again for your understanding and support. I will recover as quickly as possible and do my best to return to the classroom soon.
I hope work goes smoothly on your end and that you enjoy good health!
Your student
Zhang XX
February 13, 2023

When asked who the 2022 Nobel laureate in Literature was, it produced a very well-referenced response [Jordan: which interestingly cited all Chinese sources even though this is a question that it certainly could’ve pulled from other languages.]

Hello, this is Bing. Glad to provide help to you 😊
The 2022 winner of the Nobel Prize in Literature was Annie Ernaux. She is mainly an autobiographical writer. Her works are closely related to sociology; they reveal the pain of class-based experiences, depicting shame, humiliation, jealousy, and the dilemmas of not being able to perceive one’s true identity. Through her writing, Ernaux has always been interrogating the kind of life defined by powerful differences in gender, language, and class from many different angles.
She began writing in 1974 and has written around fifteen works to this day. Her signature works include Les Années, La Place, Une femme, L’Événement, etc.
I recommend that you read Les Années, an “autobiography without pronouns.” It recalls the author’s life from the 1940s to the 2000s in third person and shows both changes in French society and personal development. The book was termed “one of the greatest autobiographies in French literary history,” and is one of Annie Ernaux’s most well-known works.

Yang’s overall take:

Compared to ChatGPT, the new Bing responds faster and takes less time to answer questions. Moreover, the new Bing is more “human-like”: it not only gives users options for questions, but also uses emojis in its responses. At the same time, the new Bing is more “rigorous”: it cites the sources it used in its replies and lists relevant websites.
However, the new Bing still has many flaws, such as incomplete answers due to character limits. The new version also has not synchronized Chinese and English translations.

The View from Higher Education

The University of Hong Kong put a temporary ban on using ChatGPT for academic work on February 17, pending a campus-wide consultation on AI tools. Technically, OpenAI doesn’t support users in either Hong Kong or mainland China, but that evidently hasn’t stopped HKU students. Similar policies have yet to come out of any mainland institution, probably because unlike their Hong Kong peers, mainland students don’t have unfettered access to VPNs.

China Youth Daily, the official newspaper of the Communist Youth League, spoke to lecturers and students to find out how they’re approaching AI tools.

Luo Yuting, a master's student in foreign languages and applied linguistics at Beijing Language and Culture University who has worked with AI labs at several Internet companies, believes that AI can only provide some information. What really reflects academic thinking and originality lies in the author's own insights and reflections on a problem. "AI does not have this kind of experience; it just collects currently existing information, and the papers it writes are neither novel nor original."
Luo Yao, a third-year undergraduate preparing for the IELTS exam, agrees that ChatGPT is more like an "assistant". She says, "ChatGPT's language processing logic is very rigorous, which makes it especially suitable for preparing for language exams."
The IELTS is an English test that examines four aspects of English: listening, speaking, reading, and writing. Each section is scored out of 9. Luo Yao's goal is to get a score of 7, but speaking and writing are major roadblocks for her. She tried taking group classes and hiring private tutors, but her scores still hover around 5.5 in these two areas.
Now, ChatGPT has helped her a lot.
"The AI understands the scoring criteria of IELTS, so all I need to do is input the question and my own answer, and it will give a score prediction." Luo Yao says ChatGPT also helps her check for grammatical errors in her writing and gives suggestions for more authentic usage, "the equivalent of having a 24-hour online private tutor."
…
Tan Xudong, a professor and doctoral advisor in creative writing at Shanghai University's College of Liberal Arts, has devoted himself to developing students' writing skills for many years, often assigning them unstructured essays, children's stories, and book reviews as coursework. He is not worried about students using AI for writing. "These kinds of assignments can easily draw out students' life experiences and existing knowledge, and when students have something to say for themselves, they are less likely to plagiarize or use AI."
"Machines are 'producing' while people are 'creating'." Tan noted that teachers, with their years of writing experience and research accumulation, are able to tell human writing from AI writing. Even with the same story, someone with strong language skills can tell an old tale beautifully, interestingly and even surprisingly, he said, adding that real writing, with its use of diction, sentence construction, and stylistic elements, is personalized.
Tan is not worried that artificial intelligence will replace students and the dominant role of education. He says, "The frontiers of humanities, social sciences, and natural sciences are vast, and there is still a lot of uncultivated territory that requires us to be the 'first to eat crabs'."
"ChatGPT is a brand new knowledge tool, in the sense that it is no different from the books and libraries we have relied on for thousands of years, and children should learn better when they have access to such a powerful knowledge tool for learning."
On Feb. 9, Ni Minjing, deputy director of the Shanghai Municipal Education Commission, wrote that traditional education approaches and methods need to make significant adjustments in the face of ChatGPT.
He made three suggestions in this regard: the primary goal of education is to develop people who can think independently and have the ability to make sound value judgments, instead of the acquisition of specific knowledge; learning tools such as ChatGPT should be used to collaboratively improve pedagogy; and more attention should be paid to the quality of learning.

Next up: How users are teaching ChatGPT regional accents and diverse Chinese languages, even turning ChatGPT into a gangster from Guangzhou!

ChinaTalk
