Salesforce unveils first LLM benchmark for CRM industry

Wed, 3rd Jul 2024

Salesforce has introduced what it describes as the world's first large language model (LLM) benchmark for customer relationship management (CRM) systems. The benchmark is designed to help businesses evaluate the growing number of LLMs available for use in their CRM operations.

The benchmark is the result of work by Salesforce AI Research and functions as a comprehensive evaluation framework. It measures LLM performance based on four key criteria: accuracy, cost, speed, and trust and safety. The benchmark is crafted to assess common sales and service use cases, including prospecting, lead nurturing, and summarising sales opportunities and service cases.

Additionally, it features a public leaderboard to guide CRM professionals in selecting the most suitable LLM for their needs. Salesforce plans to continue incorporating new use case scenarios and aims to include fine-tuned LLMs in future evaluations.

Silvio Savarese, Executive Vice President and Chief Scientist at Salesforce AI Research, commented, "As AI continues to evolve, enterprise leaders are saying it's important to find the right mix of performance, accuracy, responsibility and cost to unlock the full potential of generative AI to drive business growth."

"Salesforce's new LLM Benchmark for CRM is a significant step forward in the way businesses assess their AI strategy within the industry. It not only provides clarity on next-generation AI deployment but also can accelerate time to value for CRM-specific use cases. Our commitment is to continuously evolve this benchmark to keep pace with technological advancements, ensuring it remains relevant and valuable."

Existing LLM benchmarks have been predominantly limited to academic and consumer use cases, often falling short on business-relevant metrics such as accuracy, speed, cost, and trust. Many of these benchmarks also lack thorough expert human evaluations. This has left CRM customers without a reliable means of assessing generative AI-powered CRM solutions, and so without a sound basis for informed decisions.

Developed by Salesforce AI Research, the new benchmark leverages real-world CRM data and expert human evaluations by practitioners. This enables businesses to make strategic decisions about incorporating generative AI into their CRM systems, focusing on four primary metrics:

1. Accuracy: This includes subcategories such as factuality, completeness, conciseness, and instruction-following. Accurate predictions and recommendations are crucial for generating valuable results, leading to improved customer experiences. Techniques such as prompt engineering and fine-tuning can enhance a model's accuracy if needed.

2. Cost: Costs are categorised as high, medium, or low, based on percentiles. This metric helps customers evaluate the cost-effectiveness of different LLMs, ensuring alignment with their budget and resource allocation strategies.

3. Speed: Speed assesses the responsiveness and efficiency of LLMs in processing and delivering information. Faster response times can enhance the user experience, reduce customer wait times, and enable sales and service teams to promptly address inquiries and issues.

4. Trust and Safety: This measures an LLM's ability to protect sensitive customer data, comply with data privacy regulations, secure information, and avoid bias and toxicity. Assessing reliability in these areas provides organisations with transparency regarding trust and safety.
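
The percentile-based cost categories described under the Cost metric can be illustrated with a short sketch. The article does not say which percentile cut-offs Salesforce uses, so the tercile split, the function name cost_tiers, and the model names and prices below are all assumptions made for illustration only, not the benchmark's actual method.

```python
from statistics import quantiles


def cost_tiers(costs: dict[str, float]) -> dict[str, str]:
    """Label each model 'low', 'medium', or 'high' by cost percentile.

    Assumption: tiers are split at the 33rd and 67th percentiles
    (terciles). The real benchmark's cut-offs are not stated in the article.
    """
    # quantiles(..., n=3) returns the two cut points dividing the data into thirds.
    low_cut, high_cut = quantiles(costs.values(), n=3, method="inclusive")

    tiers = {}
    for model, cost in costs.items():
        if cost <= low_cut:
            tiers[model] = "low"
        elif cost <= high_cut:
            tiers[model] = "medium"
        else:
            tiers[model] = "high"
    return tiers


# Hypothetical models and per-request costs, invented for this example.
example_costs = {
    "model_a": 1.0,
    "model_b": 2.0,
    "model_c": 4.0,
    "model_d": 8.0,
    "model_e": 16.0,
    "model_f": 32.0,
}

for model, tier in cost_tiers(example_costs).items():
    print(model, tier)
```

Because the tiers are relative, a model's label depends on the other models in the comparison: adding a much more expensive model to the set can move an existing model from "high" to "medium" without its price changing.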

Organisations can utilise this benchmark to compare LLMs, identify the best solutions, and make informed decisions to enhance customer success and drive business growth. The Salesforce Einstein 1 Platform allows customers to select from existing LLMs or use their own models to meet unique business needs. By evaluating models using the benchmark, businesses can deploy more effective and efficient generative AI solutions.

Clara Shih, CEO of Salesforce AI, noted, "Business organisations are looking to utilise AI to drive growth, cut costs, and deliver personalised customer experiences, not to plan a kid's birthday party or summarise Othello. Our customers have been asking for a purpose-built way to evaluate and select from among the proliferation of new AI models, and we are thrilled to introduce the world's first LLM benchmark for CRM to help them navigate the complex landscape of models."

"This benchmark is not just a measure; it's a comprehensive, dynamically evolving framework that empowers companies to make informed decisions, balancing accuracy, cost, speed, and trust."
