Computer programs used to detect essays, job applications and other work generated by artificial intelligence can discriminate against non-native English speakers, researchers say.
Tests of seven popular AI text detectors found that essays written by people who did not speak English as a first language were often wrongly flagged as AI-generated, a bias that could have serious consequences for students, academics and job applicants.
With the rise of ChatGPT, a generative AI program that can write essays, solve problems and create computer code, many teachers now consider AI detection a “critical countermeasure to deter a 21st-century form of cheating”, the researchers say. But they warn that the 99% accuracy claimed by some detectors is “misleading at best”.
Scientists led by James Zou, an assistant professor of biomedical data science at Stanford University, ran 91 English essays written by non-native English speakers through seven popular GPT detectors to see how well the programs performed.
More than half of the essays, which were written for a widely recognised English proficiency test known as the Test of English as a Foreign Language, or TOEFL, were flagged as AI-generated, with one program flagging 98% of the essays as composed by AI. When essays written by native English-speaking eighth graders in the US were run through the programs, the same AI detectors classed more than 90% as human-generated.