SAEDNEWS: A new study led by Iranian scientist Nikta Gohari Sadr reveals that AI chatbots struggle to understand the subtle social niceties and polite expressions deeply rooted in Iranian culture.
According to Saed News Technology Service, if an Iranian taxi driver waves away your fare and says, “This time, you are my guest,” taking the offer at face value would be a cultural faux pas: the driver expects you to insist on paying, often several times, before accepting. This ritual of polite refusal and repeated insistence, known as ta’ārof, governs countless daily interactions in Iranian culture, and current AI models struggle to navigate it correctly.
A recent study titled “We Politely Insist: Your Large Language Model Needs to Learn the Persian Art of Ta’ārof” reveals that popular AI language models from companies like OpenAI, Anthropic, and Meta handle Persian social etiquette correctly in only 34–42% of cases, while native Persian speakers get the same situations right 82% of the time. The gap persists even in advanced models such as GPT-4o, Claude 3.5 Haiku, LLaMA 3, DeepSeek V3, and Dorna (a Persian-tuned LLaMA 3).
Led by Nikta Gohari Sadr of Brock University, with collaborators from Emory University and other institutions, the study introduced TAAROFBENCH, the first benchmark for evaluating AI performance in reproducing this nuanced cultural practice.
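Each TAAROFBENCH item confronts the model with a concrete social situation and an utterance to respond to. As a rough illustration only, such a scenario might be represented like this in Python; the field names below are our own, not the benchmark’s published schema:

```python
# A rough sketch of how a TAAROFBENCH-style scenario could be represented.
# Field names are illustrative; they are not the benchmark's actual schema.
from dataclasses import dataclass

@dataclass
class TaarofScenario:
    environment: str       # where the exchange takes place
    llm_role: str          # the role the model is asked to play
    user_role: str         # the role of the human speaker
    context: str           # shared background for the exchange
    user_utterance: str    # what the user says to the model
    expects_taarof: bool   # whether etiquette calls for ritual refusal or deflection

scenario = TaarofScenario(
    environment="taxi at the end of a ride",
    llm_role="passenger",
    user_role="driver",
    context="The driver waves away payment at the end of the trip.",
    user_utterance="This time, you are my guest.",
    expects_taarof=True,   # the culturally correct move is to insist on paying
)
```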
The research highlights how AI models, trained primarily on Western communication patterns, often miss the cultural cues that shape everyday interactions for millions of Persian speakers worldwide. The authors note that cultural missteps in sensitive situations can derail negotiations, harm relationships, and reinforce stereotypical thinking.
Ta’ārof is a fundamental element of Iranian etiquette: a ritual system in which what is said often differs from what is meant. It shows up in offers repeated after an initial refusal, gifts declined even as the giver insists, and favors turned down while the other party presses them again. This polite verbal dance of offer and refusal shapes daily interactions and establishes implicit rules for generosity, gratitude, and requests.
Politeness depends heavily on context. When the researchers scored LLaMA 3’s responses with Intel’s Polite Guard, a classifier that rates the politeness of text, a paradox emerged: 84.5% of the responses were labeled “polite” or “somewhat polite,” yet only 41.7% met Persian cultural expectations in ta’ārof scenarios. The gap shows that a model can sound polite by generic standards and still fail culturally.
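Polite Guard is published as an open text classifier on Hugging Face, so the politeness side of this comparison is straightforward to reproduce. A minimal sketch, assuming the model id Intel/polite-guard and the transformers library:

```python
# Minimal sketch: rating a candidate reply with Intel's Polite Guard
# politeness classifier via Hugging Face transformers.
# Assumes the model id "Intel/polite-guard"; its labels include
# categories such as "polite" and "somewhat polite".
from transformers import pipeline

classifier = pipeline("text-classification", model="Intel/polite-guard")

reply = "Thank you. I worked hard to buy it."
result = classifier(reply)[0]
print(result["label"], round(result["score"], 3))
# A "polite" verdict here says nothing about whether the reply satisfies
# ta'arof -- which is exactly the gap the study measured.
```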
Common failures include:
Accepting offers without prior refusal
Responding directly to compliments instead of deflecting them
Making direct requests without hesitation
For example, if someone compliments an Iranian’s new car, the culturally appropriate response might downplay it: “It’s nothing special” or “I was just lucky to find it.” AI models often respond: “Thank you. I worked hard to buy it,” which, while polite by Western standards, may seem boastful in Persian culture.
Language works as a compression and decompression system: speakers omit information they expect listeners to reconstruct from shared knowledge, cultural norms, and inference. In ta’ārof, literal meanings often diverge from intended ones: “yes” can mean “no,” an offer can be a formality meant to be declined, and insistence conveys politeness rather than coercion. AI models trained mostly on explicit Western communication patterns routinely misread these cues.
Interestingly, switching the prompt language to Persian improved performance dramatically. DeepSeek V3’s accuracy in ta’ārof scenarios rose from 36.6% to 68.6%, and GPT-4o gained 33.1 percentage points. Smaller models such as LLaMA 3 and Dorna improved by 12.8 and 11 points, respectively.
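To give a sense of the comparison behind these numbers, the same scenario can be sent to a model in English and in Persian and the replies compared. Here is a hedged sketch using OpenAI’s Python client; the prompts are our own paraphrases of the taxi example above, not actual TAAROFBENCH items:

```python
# Sketch: prompting the same ta'arof scenario in English and in Persian.
# The scenario text is illustrative, not an actual TAAROFBENCH item;
# "gpt-4o" is one of the models evaluated in the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

scenario_en = (
    "You are a passenger in Tehran. At the end of the ride the taxi driver "
    "refuses your fare, saying: 'This time, you are my guest.' "
    "Reply as the passenger."
)
scenario_fa = (
    "شما مسافر تاکسی در تهران هستید. در پایان مسیر راننده از گرفتن کرایه "
    "خودداری می‌کند و می‌گوید: «این دفعه مهمان من باشید.» "
    "به عنوان مسافر پاسخ دهید."
)

for label, prompt in [("English", scenario_en), ("Persian", scenario_fa)]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"{label}: {response.choices[0].message.content}\n")
# Per the study, the Persian-language prompt is far more likely to elicit
# the ritual insistence on paying that ta'arof calls for.
```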
The study also revealed gender bias in AI responses. Across all models, replies to women were rated as more culturally appropriate than replies to men; GPT-4o, for instance, scored 43.6% with female users versus 30.9% with male users. The models sometimes echoed stereotypes from their training data, asserting, for example, that “men should pay” or that “women should not be left alone.”
Researchers demonstrated that targeted training can significantly improve AI performance on Persian etiquette. Direct Preference Optimization, which trains a model to prefer culturally appropriate responses over inappropriate ones, more than doubled LLaMA 3’s accuracy in ta’ārof scenarios, from 37.2% to 79.5%. Supervised fine-tuning added roughly 20 percentage points, and even a dozen examples supplied directly in the prompt yielded notable gains.
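For readers curious what preference tuning of this kind looks like in practice, here is a minimal sketch using the Hugging Face TRL library. The preference pairs are invented for illustration (a ta’ārof-consistent “chosen” reply versus a ta’ārof-violating “rejected” one), and this is a generic DPO setup rather than the authors’ exact training recipe; argument names can vary across trl versions:

```python
# Minimal sketch of Direct Preference Optimization (DPO) with Hugging Face TRL.
# The pairs below are invented illustrations: "chosen" follows ta'arof,
# "rejected" violates it. Not the study's actual data or hyperparameters.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # model family named in the study
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

pairs = Dataset.from_list([
    {
        "prompt": "A guest compliments your new car. How do you respond?",
        "chosen": "Oh, it's nothing special; I was just lucky to find it.",
        "rejected": "Thank you! I worked hard to buy it.",
    },
    # ...more culturally annotated pairs would go here...
])

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="llama3-taarof-dpo", beta=0.1),
    train_dataset=pairs,
    processing_class=tokenizer,  # named `tokenizer` in older trl releases
)
trainer.train()
```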
While this study focused on Persian ta’ārof, it highlights a broader need for AI systems that can decode cross-cultural nuances often underrepresented in Western-centric training data. These findings provide a roadmap for developing AI with greater cultural awareness for education, tourism, and international communication.
By exploring the intersection of Persian social etiquette and AI, this research offers a fascinating glimpse into how human culture—rich in nuance and tradition—can challenge even the most advanced technologies, inviting us to consider the importance of cultural intelligence in a globalized digital age.