Ben Edelman
https://www.benedelman.org

American Airlines – “first checked bag free” credit card complaint
https://www.benedelman.org/aa-first-checked-bag-free-cc/
Tue, 24 Sep 2024 13:00:13 +0000

Complaint.

Status: Briefing underway.

Summary: In prominent marketing offers, including onboard napkins and large airport displays, AA promises “first checked bag free” if customers get certain AA-partner credit cards.  But AA denies that benefit on itineraries that are completely or partially international — a restriction nowhere mentioned in initial marketing offers.

The Effect of Microsoft Copilot in a Multi-lingual Context
https://www.benedelman.org/the-effect-of-copilot-in-a-multi-lingual-context/
Thu, 01 Aug 2024 04:32:36 +0000

We tested Microsoft Copilot in multilingual contexts, examining how Copilot can facilitate collaboration between colleagues with different native languages.

First, we asked 77 native Japanese speakers to review a meeting recorded in English. Half the participants had to watch and listen to the video. The other half could use Copilot Meeting Recap, which gave them an AI meeting summary as well as a chatbot to answer questions about the meeting.

Then, we asked 83 other native Japanese speakers to review a similar meeting, following the same script, but this time held in Japanese by native Japanese speakers. Again, half of the participants had access to Copilot.

For the meeting in English, participants with Copilot answered 16.4% more multiple-choice questions about the meeting correctly, and they were more than twice as likely to get a perfect score. Moreover, comparing accuracy across the two scenarios, people listening to a meeting in English with Copilot achieved 97.5% accuracy, slightly higher than people listening to a meeting in their native Japanese using standard tools (94.8%). This difference is statistically significant (p<0.05). The changes are small in percentage-point terms because baseline accuracy is so high, but Copilot closed 38.5% of the gap to perfect accuracy for those working in their native language (p<0.10) and 84.6% of the gap for those working in (non-native) English (p<0.05).

Summary from Jaffe et al., Generative AI in Real-World Workplaces, July 2024.
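The “gap to perfect accuracy” figures can be made concrete with a short formula. The definition below is an assumption, since the summary does not spell it out: write $a_0$ for accuracy with standard tools and $a_C$ for accuracy with Copilot, both as fractions. Reading “16.4% more” as a relative increase, the English-meeting numbers reproduce the reported 84.6%; the implied baseline of roughly 83.8% is derived here under that assumption, not reported in the summary. This also shows why a gain of a few percentage points can close most of the remaining gap when accuracy is already high.

$$
G = \frac{a_C - a_0}{1 - a_0}
$$

$$
a_C = 1.164\,a_0 = 0.975 \;\Rightarrow\; a_0 \approx 0.838,
\qquad G \approx \frac{0.975 - 0.838}{1 - 0.838} \approx 0.846
$$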

Impact of M365 Copilot on Legal Work at Microsoft
https://www.benedelman.org/impact-of-m365-copilot-on-legal/
Fri, 24 May 2024 13:00:41 +0000

Teams at Microsoft often reflect on how Copilot helps. I try to help these teams both by measuring Copilot usage in the field (as they do their ordinary work) and by running lab experiments (idealized versions of their tasks in environments where I can better isolate cause and effect). This month I ran an experiment with CELA, Microsoft’s in-house legal department. Hossein Nowbar, Chief Legal Officer and Corporate Vice President, summarized the findings in a post on LinkedIn:

Recently, we ran a controlled experiment with Microsoft’s Office of the Chief Economist, and the results are groundbreaking. In this experiment, we asked legal professional volunteers on our team to complete three realistic legal tasks and randomly granted Copilot to some participants. Individuals with Copilot completed the tasks 32% faster and with 20.3% greater accuracy!

Copilot isn’t just a tool; it’s a game-changer, empowering our team to focus on what truly matters by enhancing productivity, elevating work quality, and, most importantly, reclaiming time.

All findings statistically significant at P<0.05.

Full results.

American Airlines – defective delay notification, limited rebooking contrary to tariff
https://www.benedelman.org/american-airlines-defective-delay-notification/
Wed, 24 Apr 2024 13:00:12 +0000

Complaint. Answer.

Status: Pending.

Summary: AA delayed an intercontinental flight by 27 hours, but its email notification didn’t say which flight was delayed or by how long. AA’s staff offered contradictory statements of passenger rights and rebooking options, including multiple supposed rules untethered to the Tariff.

Early LLM-based Tools for Enterprise Information Workers Likely Provide Meaningful Boosts to Productivity
https://www.benedelman.org/early-llm-based-tools-for-enterprise-information-workers-likely-provide-meaningful-boosts-to-productivity/
Tue, 05 Dec 2023 16:45:42 +0000

Early LLM-based Tools for Enterprise Information Workers Likely Provide Meaningful Boosts to Productivity. Microsoft Research Report – AI and Productivity Team. With Alexia Cambon, Brent Hecht, Donald Ngwe, Sonia Jaffe, Amy Heger, Mihaela Vorvoreanu, Sida Peng, Jake Hofman, Alex Farach, Margarita Bermejo-Cano, Eric Knudsen, James Bono, Hardik Sanghavi, Sofia Spatharioti, David Rothschild, Daniel G. Goldstein, Eirini Kalliamvakou, Peter Cihon, Mert Demirer, Michael Schwarz, and Jaime Teevan.

This report presents the initial findings of Microsoft’s research initiative on “AI and Productivity”, which seeks to measure and accelerate the productivity gains created by LLM-powered productivity tools like Microsoft’s Copilot. The many studies summarized in this report, the initiative’s first, focus on common enterprise information worker tasks for which LLMs are most likely to provide significant value. Results from the studies support the hypothesis that the first versions of Copilot tools substantially increase productivity on these tasks. This productivity boost usually appeared in the studies as a meaningful increase in speed of execution without a significant decrease in quality. Furthermore, we observed that the willingness-to-pay for LLM-based tools is higher for people who have used the tools than those who have not, suggesting that the tools provide value above initial expectations. The report also highlights future directions for the AI and Productivity initiative, including an emphasis on approaches that capture a wider range of tasks and roles.

Studies I led that are included within this report:

Randomized Controlled Trials for Microsoft Copilot for Security
https://www.benedelman.org/randomized-controlled-trial-for-microsoft-security-copilot/
Tue, 05 Dec 2023 16:38:41 +0000

Randomized Controlled Trials for Microsoft Copilot for Security. SSRN Working Paper 4648700. With James Bono, Sida Peng, Roberto Rodriguez, and Sandra Ho.

We conducted randomized controlled trials (RCTs) to measure the efficiency gains from using Security Copilot, including speed and quality improvements. External experimental subjects logged into an M365 Defender instance created for this experiment and performed four tasks: Incident Summarization, Script Analyzer, Incident Report, and Guided Response. We found that Security Copilot delivered large improvements in both speed and accuracy, for novices and security professionals alike.

(Also summarized in What Can Copilot’s Earliest Users Teach Us About Generative AI at Work? under “Role-specific pain points and opportunities: Security,” and in the AI and Productivity Report under “M365 Defender Security Copilot study.”)

Sound Like Me: Findings from a Randomized Experiment
https://www.benedelman.org/sound-like-me-findings-from-a-randomized-experiment/
Tue, 05 Dec 2023 16:37:01 +0000

Sound Like Me: Findings from a Randomized Experiment. SSRN Working Paper 4648689. With Donald Ngwe.

A new version of Copilot for Microsoft 365 includes a feature that lets Outlook draft messages that “Sound Like Me” (SLM), based on training from messages in a user’s Sent Items folder. We sought to evaluate whether SLM lives up to its name. We find that it does, and more. Users widely and systematically praise SLM-generated messages as clearer, more concise, and more “couldn’t have said it better myself”. When presented with a human-written message versus an SLM rewrite, users say they’d rather receive the SLM rewrite. All these findings are statistically significant. Furthermore, when presented with human and SLM messages, users struggle to tell the difference, in one specification doing worse than random.

(Also summarized in What Can Copilot’s Earliest Users Teach Us About Generative AI at Work? under “Email effectiveness,” and in the AI and Productivity Report under “Outlook Email Study.”)

Measuring the Impact of AI on Information Worker Productivity
https://www.benedelman.org/measuring-the-impact-of-ai-on-information-worker-productivity/
Tue, 05 Dec 2023 16:34:25 +0000

Measuring the Impact of AI on Information Worker Productivity. SSRN Working Paper 4648686. With Donald Ngwe and Sida Peng.

This paper reports the results of two randomized controlled trials evaluating the performance and user satisfaction of a new AI product in the context of common information worker tasks. We designed workplace scenarios to test these tasks: retrieving information from files, emails, and calendar; catching up after a missed online meeting; and drafting prose. We assigned the tasks to 310 subjects, who had to find relevant information, answer multiple-choice questions about what they found, and write marketing content. In both studies, users with the AI tool were statistically significantly faster, a difference that holds both on its own and when controlling for accuracy/quality. Furthermore, users who tried the AI tool reported higher willingness to pay than users who merely heard about it but didn’t get to try it, indicating that the product exceeded expectations.

(Also summarized in What Can Copilot’s Earliest Users Teach Us About Generative AI at Work? under “A day in the life” and “The strain of searching,” and in the AI and Productivity Report under “Copilot Common Tasks Study” and “Copilot Information Retrieval Study.”)

Edelman v. Harvard
https://www.benedelman.org/edelman-v-harvard-filed/
Tue, 14 Feb 2023 21:59:02 +0000

For 11 years, I was a faculty member at Harvard Business School. I met and exceeded the school’s high standards for research and teaching. I loved my work and won high praise from colleagues and students for my contributions to multiple academic fields, for my teaching, and for my service to the school. I looked forward to continuing my work at HBS for the foreseeable future.

My promotion to tenure was derailed by improper disciplinary proceedings — a kangaroo court that ran roughshod over the governing rules. For example, the rules require the disciplinary committee to share with me (and readers of its report) “the evidence gathered.” Far from providing evidence, the 2017 report attached zero emails, zero transcripts of interviews, and zero other documents. Instead of providing evidence, the committee offered mere summaries of twelve anonymous interview remarks, with both names and contexts intentionally removed, in brazen violation of the requirement to provide “the evidence gathered.” That’s not justice, and it’s clearly not permitted under the applicable rules.

I’m suing Harvard to insist that these proceedings be corrected, in conformance with the rules. I’m not perfect — who is? — but if the proceedings follow the rules, I will clear my name of the incorrect charges, and my candidacy can then be evaluated on its merits.

Case web site including complaint and other documents

Southwest Airlines – “class waiver”
https://www.benedelman.org/southwest-airlines-class-waiver/
Mon, 09 Jan 2023 17:00:41 +0000

Complaint. Answer.

Status: Pending.

Summary: Federal regulation favors private resolution of disputes between passengers and airlines. But Southwest’s “class waiver” bars passengers from joining together with a single set of lawyers and experts for efficient, cost-effective group resolution of their complaints.
