Designing interpretable machine learning outputs for customer service agents
I explored the relationship between Einstein, the machine learning layer of Salesforce Service Cloud, and customer service agents. I created design variations of two recommendation features in the console and tested them with users, revealing insights into user behaviour.
I was a product design intern on the Einstein for Service Cloud UX team, and I took part in this project's design process from start to finish. I worked with other product designers and user researchers, and consulted with product managers and data scientists.
52 participants completed the usability study for my designs. I compiled the insights and provided recommendations for the project's next steps. The lead researcher is preparing a detailed report for future reference.
Machine learning and artificial intelligence are used in Salesforce Service Cloud to improve and streamline a business's customer service operations. This includes features for customer service agents, administrators, and supervisors that can save time and increase productivity. For a customer service agent, saving time means serving more customers and providing better customer service.
When I arrived at Salesforce, I was tasked with looking into how we could improve the interpretability of machine learning outputs in Service Cloud for customer service agents.
The first thing to do was to understand Salesforce, my users and my problem space more by speaking to team members and doing secondary research.
By looking into past research studies, I gained a comprehensive view of a customer service agent's day and the struggles they face day-to-day.
Understanding the user's perspective helped me understand their relationship with Salesforce Service Cloud and its machine learning capabilities. However, there was still a lot that I didn't know about my problem space.
With limited machine learning and artificial intelligence knowledge, I sought to understand what interpretability meant for our users. I talked to data scientists, senior designers, and researchers at Salesforce to develop my own definition. Understanding how interpretability fit into the product, and the scope of my work, would bring direction to the project.
Here's a diagram of what's currently happening from a customer service agent's point of view:
When a customer reaches out to an agent via messaging, their conversation can be opened up in the console. The agent formulates and writes a response to the customer's inquiry. With Einstein, the agent also sees recommended articles and replies generated by machine learning models. For article recommendations, for example, this is done by matching keywords and drawing on past cases.
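To make the idea of keyword-based article recommendation concrete, here is a minimal sketch. This is an illustrative assumption, not Einstein's actual algorithm (which is a trained model over past cases); the function names and threshold are hypothetical:

```python
# Illustrative sketch only: scores knowledge articles against a customer's
# message by keyword overlap. Real recommendation systems use trained models.

def relevance_score(case_text: str, article_keywords: set) -> float:
    """Score an article by the fraction of its keywords found in the case."""
    case_words = {w.strip(".,?!").lower() for w in case_text.split()}
    if not article_keywords:
        return 0.0
    matched = case_words & {k.lower() for k in article_keywords}
    return len(matched) / len(article_keywords)

def recommend(case_text, articles, threshold=0.5):
    """Return (title, score) pairs at or above the threshold, best first."""
    scored = [(title, relevance_score(case_text, keywords))
              for title, keywords in articles.items()]
    return sorted([s for s in scored if s[1] >= threshold],
                  key=lambda s: -s[1])

articles = {
    "Refund policy": {"refund", "return"},
    "Password reset": {"password", "reset"},
}
print(recommend("My package never arrived, can I get a refund?", articles))
```

A score like the one above is what competitors surface directly as a "relevance" percentage; the interpretability question is whether agents should also see why the score is high (e.g. which keywords matched).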
Interpretability, to a service agent, means understanding why a recommendation was given. An agent should be able to answer: "why did I get this recommendation?"
With the user goals clear and an understanding of interpretability, I performed competitive research, looked into best practices, and held a workshop before starting visual design.
First, I performed a product audit. I looked into products within Service Cloud and across other clouds as well. Then, I looked into competitors like Zendesk, ServiceNow, and Agent AI, among others, to see how they displayed the outputs of their machine learning features. I noticed that many competitors showed a relevance score (e.g. 90%) alongside suggestions but offered little to no interpretability.
This made me question the role of interpretability and how important it really was for an agent. I decided that this was something I wanted to test.
For my workshop, I decided to pull back and focus on a broader topic to generate discussion and out-of-the-box ideas. I brought together designers, researchers, and product managers for a one-hour workshop. Instead of focusing on interpretability, I wanted to understand the nature of the relationship between Einstein and customer service agents. I gathered ideas and grouped them by theme, shown below:
From my workshop and the research I did, I created a set of principles to guide my designs.
I wanted my designs to answer two questions:
How might we communicate interpretable machine learning outputs in a transparent and clear manner?
How might we facilitate respectful, supportive and humanized collaboration?
I drew some sketches in my notebook and then used Sketch to create low-fidelity wireframes. My early sketches explored two main directions: ways to incorporate interpretability into existing features, and ways to incorporate those features within the console.
Through many iterations and design critiques, I narrowed my scope to creating design variations for article and reply recommendations that I could test. For each feature, the variations differed in their level of interpretability. For example, the relevance option (variation 1) is the least interpretable: it shows relevance through a coloured bar but provides no insight into why the recommendation is highly relevant. In comparison, the social proof option (variation 2) shows agents where the recommendation came from.
Eventually, I ended up with 3 designs for article recommendations and 4 designs for reply recommendations. I wanted to test these designs to see which ones agents preferred, and what they valued in a design.
We decided to test these design variations by running an unmoderated user study using UserZoom. For each design variation, we created 3 different scenarios a user might encounter:
The user's task was to go through each scenario and choose the right course of action: select the right recommendation, or choose none if the recommendations were irrelevant. At the end, they answered questions about each design.
Altogether, 27 screens were created, and the study was sent to as many Service Cloud agents as possible.
We received 52 responses for this study (which was an awesome number!)
Details, numbers, and graphs are not shown in this case study due to confidentiality concerns. However, there were three main insights from the study:
And thus, we chose the relevance bar design as the "winning design."
Here is what some of the agents said:
From the user testing results, I presented my findings to product managers and my team, along with three recommendations for designing for interpretability:
Interestingly, although my project focused on interpretability, my users turned out to care about it less than other factors such as relevance and readability.
During my last two weeks at the internship, I explored how motion could work in the console alongside the new designs I created.
Working on this project strengthened my product thinking and challenged me to define my own problem scope and space. I was given a good amount of freedom to carve out the details of the work, and as a result I developed my interpersonal and organizational skills and learned what it means to be a member of a design team.