The Effect of Microsoft Copilot in a Multi-lingual Context with Donald Ngwe

We tested Microsoft Copilot in multilingual contexts, examining how Copilot can facilitate collaboration between colleagues with different native languages.

First, we asked 77 native Japanese speakers to review a meeting recorded in English. Half the participants had to watch and listen to the video. The other half could use Copilot Meeting Recap, which gave them an AI meeting summary as well as a chatbot to answer questions about the meeting.

Then, we asked 83 other native Japanese speakers to review a similar meeting, following the same script, but this time held in Japanese by native Japanese speakers. Again, half of the participants had access to Copilot.

For the meeting in English, participants with Copilot answered 16.4% more multiple-choice questions about the meeting correctly, and they were more than twice as likely to get a perfect score. Moreover, comparing accuracy between the two scenarios, people listening to a meeting in English with Copilot achieved 97.5% accuracy, slightly higher than the 94.8% achieved by people listening to a meeting in their native Japanese using standard tools. This is a statistically significant difference (p<0.05). The changes are small in percentage point terms because the baseline accuracy is so high, but Copilot closed 38.5% of the gap to perfect accuracy for those working in their native language (p<0.10) and closed 84.6% of the gap for those working in (non-native) English (p<0.05).
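The "gap closed" figures can be checked with a little arithmetic. The sketch below is only an illustration, not the study's own calculation: it assumes that "16.4% more" is a relative lift over the no-Copilot group, so the English baseline is inferred rather than reported, and the Japanese-with-Copilot score is likewise inferred from the 38.5% figure. The function name `gap_closed` is hypothetical.

```python
def gap_closed(baseline: float, treated: float, ceiling: float = 100.0) -> float:
    """Share of the distance from baseline to a perfect score covered by the treated group."""
    return (treated - baseline) / (ceiling - baseline)


# Reported in the text: English meeting with Copilot.
english_with_copilot = 97.5
# Assumption: "16.4% more" is a relative increase, so the baseline is inferred, not reported.
english_baseline = english_with_copilot / 1.164

# Reported in the text: Japanese meeting with standard tools.
japanese_baseline = 94.8
# Inferred from the reported 38.5% gap closure (not stated directly in the text).
japanese_with_copilot = japanese_baseline + 0.385 * (100.0 - japanese_baseline)

english_gap_pct = round(gap_closed(english_baseline, english_with_copilot) * 100, 1)
japanese_gap_pct = round(gap_closed(japanese_baseline, japanese_with_copilot) * 100, 1)

print(f"Inferred English baseline: {english_baseline:.1f}%")
print(f"English gap closed: {english_gap_pct}%")
print(f"Inferred Japanese score with Copilot: {japanese_with_copilot:.1f}%")
print(f"Japanese gap closed: {japanese_gap_pct}%")
```

Under these assumptions the inferred English baseline is roughly 84%, which is why a change of about 14 percentage points corresponds to closing most of the gap to a perfect score, while the roughly 2-point change in the Japanese condition closes a much smaller share.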

Summary from Jaffe et al., Generative AI in Real-World Workplaces, July 2024.

Impact of M365 Copilot on Legal Work at Microsoft

Teams at Microsoft often reflect on how Copilot helps. I try to help these teams by measuring Copilot usage both in the field (as they do their ordinary work) and in lab experiments (idealized versions of their tasks in environments where I can better isolate cause and effect). This month I ran an experiment with CELA, Microsoft’s in-house legal department. Hossein Nowbar, Chief Legal Officer and Corporate Vice President, summarized the findings in a post on LinkedIn:

Recently, we ran a controlled experiment with Microsoft’s Office of the Chief Economist, and the results are groundbreaking. In this experiment, we asked legal professional volunteers on our team to complete three realistic legal tasks and randomly granted Copilot to some participants. Individuals with Copilot completed the tasks 32% faster and with 20.3% greater accuracy!

Copilot isn’t just a tool; it’s a game-changer, empowering our team to focus on what truly matters by enhancing productivity, elevating work quality, and, most importantly, reclaiming time.

All findings are statistically significant at p<0.05.

Full results.