In my first stint as a machine learning (ML) product manager, a simple question inspired passionate debates across functions and leaders: How do we know if this product actually works? The product in question served both internal and external customers. The model enabled internal teams to identify the top problems faced by our customers so that they could prioritize the right set of experiences to solve those problems. With such a complex web of interdependencies among internal and external customers, choosing the right metrics to capture the product's impact was critical to steering it toward success.
Not tracking whether your product is working well is like landing a plane without any guidance from air traffic control. There is absolutely no way you can make informed decisions for your customers without knowing what is going well or badly. Moreover, if you do not actively define the metrics, your team will come up with their own fallback metrics. The risk of having multiple flavors of an "accuracy" or "quality" metric is that everyone develops their own version, leading to a scenario where you may not all be working toward the same outcome.
For example, when I reviewed my annual goal and its underlying metric with our engineering team, the immediate feedback was: "But that's a business metric; we already track precision and recall."
First, identify what you want to know about your AI product
Once you have taken on the task of defining metrics for your product, where do you start? In my experience, the complexity of operating an ML product with multiple customers carries over into defining metrics for the model as well. What do you use to measure whether the model is working well? Measuring the outcomes of internal teams prioritizing launches based on our models would not be fast enough; measuring whether the customer adopted the solutions recommended by our model risked drawing conclusions from a very broad adoption metric (what if the customer did not adopt the solution because they just wanted to reach a support agent?).
Fast-forward to the era of large language models (LLMs), where we no longer have just a single output from an ML model; we also have text responses, images and music as outputs. The dimensions of the product that require metrics now multiply rapidly: formats, customers, type… The list goes on.
Across all my products, when I try to come up with metrics, my first step is to distill what I want to know about the product's impact on customers into a few key questions. Identifying the right set of questions makes it easier to identify the right set of metrics. Here are a few examples:
- Did the customer get an output? → Coverage metric
- How long did it take the product to provide an output? → Latency metric
- Did the user like the output? → Metrics for customer feedback, customer adoption and retention
Once you have identified your key questions, the next step is to identify a set of sub-questions for the "input" and "output" signals. Output metrics are lagging indicators: they measure an event that has already occurred. Input metrics are leading indicators that can be used to identify trends or predict outcomes. See below for ways to add the right leading and lagging sub-questions to the questions above. Not every question needs leading/lagging indicators.
- Did the customer get an output? → Coverage
- How long did it take the product to provide an output? → Latency
- Did the user like the output? → Customer feedback, customer adoption and retention
- Did the user indicate that the output is right/wrong? (output)
- Was the output good/correct? (input)
The third and final step is to identify the method for collecting the metrics. Most metrics are collected at scale through new instrumentation via data engineering. However, in some cases (like question 3 above), especially for ML-based products, you have the option of manual or automated evaluations that assess the model's outputs. While it is always preferable to develop automated evaluations, starting with manual evaluations for "Was the output good/correct?" and creating a rubric with definitions of good, correct and incorrect will also help you lay the groundwork for a rigorous, tested automated evaluation process.
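As a minimal, illustrative sketch (in Python, with a made-up rubric and hand-labeled examples, not any specific tooling), a manual evaluation structured this way might look like the following; the explicit label definitions are what later make the process automatable.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical rubric: explicit definitions keep human graders (and later,
# automated judges) consistent about what "good" and "correct" mean.
RUBRIC = {
    "good": "Output fully answers the query and is safe and on-brand.",
    "correct": "Output is factually accurate but may be incomplete.",
    "incorrect": "Output is factually wrong or unrelated to the query.",
}

@dataclass
class EvalRecord:
    query: str
    model_output: str
    label: str  # one of the RUBRIC keys, assigned by a human grader for now

def quality_metrics(records: list[EvalRecord]) -> dict[str, float]:
    """Aggregate manual labels into the 'Was the output good/correct?' metric."""
    counts = Counter(r.label for r in records)
    total = len(records) or 1
    return {label: counts.get(label, 0) / total for label in RUBRIC}

# Example: a tiny hand-labeled sample (illustrative values only).
sample = [
    EvalRecord("reset password", "Go to Settings > Security > Reset.", "good"),
    EvalRecord("refund status", "Refunds take 5-7 business days.", "correct"),
    EvalRecord("cancel order", "Try restarting your router.", "incorrect"),
]

print(quality_metrics(sample))  # roughly one third in each bucket for this sample
```

Once graders agree on the rubric, the same definitions can seed an automated judge, with periodic manual audits to keep it honest.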
Example use cases: AI search, listing descriptions
The framework above can be applied to any ML-based product to identify the list of primary metrics for your product. Let's take search as an example.
Question | Metric | Nature of metric |
---|---|---|
Did the customer get an output? → Coverage | % of search sessions with search results shown to the customer | Output |
How long did it take the product to provide an output? → Latency | Time taken to display search results to the user | Output |
Did the user like the output? → Customer feedback, customer adoption and retention. Did the user indicate that the output is right/wrong? (Output) | % of search sessions with "thumbs up" feedback on the search results from the customer, or % of search sessions with a click from the customer | Output |
Was the output good/correct? (Input) | % of search results marked "good/correct" for each search term, per the quality rubric | Input |
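For illustration, here is a rough sketch of how the lagging (output) metrics in this table might be computed once search sessions are instrumented; the log fields (results_shown, latency_ms, thumbs_up, clicked) are hypothetical, not from any particular logging system.

```python
# Hypothetical search-session log records; field names are illustrative only.
sessions = [
    {"query": "red shoes", "results_shown": True, "latency_ms": 120, "thumbs_up": True, "clicked": True},
    {"query": "blue mug", "results_shown": True, "latency_ms": 340, "thumbs_up": None, "clicked": False},
    {"query": "xyzzy", "results_shown": False, "latency_ms": 95, "thumbs_up": None, "clicked": False},
]

total = len(sessions)

# Coverage: % of search sessions where results were shown to the customer.
coverage = sum(s["results_shown"] for s in sessions) / total

# Latency: time taken to display results (here, a simple average in ms).
avg_latency_ms = sum(s["latency_ms"] for s in sessions) / total

# Feedback / adoption: % of sessions with a thumbs-up, % with a click.
thumbs_up_rate = sum(bool(s["thumbs_up"]) for s in sessions) / total
click_rate = sum(s["clicked"] for s in sessions) / total

print(f"coverage={coverage:.0%}, avg latency={avg_latency_ms:.0f}ms, "
      f"thumbs-up={thumbs_up_rate:.0%}, clicks={click_rate:.0%}")
```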
How about a product that generates descriptions for a listing (whether a menu item on DoorDash or a product listing on Amazon)?
Question | Metric | Nature of metric |
---|---|---|
Did the customer get an output? → Coverage | % of listings with a generated description | Output |
How long did it take the product to provide an output? → Latency | Time taken to generate descriptions for the user | Output |
Did the user like the output? → Customer feedback, customer adoption and retention. Did the user indicate that the output is right/wrong? (Output) | % of listings with generated descriptions that required edits from the technical content team/seller/customer | Output |
Was the output good/correct? (Input) | % of listing descriptions marked "good/correct", per the quality rubric | Input |
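A similar sketch applies to listing descriptions; again, the record fields below are hypothetical and only meant to show how the edit-rate (lagging) and rubric-graded quality (leading) metrics roll up.

```python
# Hypothetical records for generated listing descriptions; field names are illustrative.
listings = [
    {"listing_id": "A1", "description_generated": True, "required_edits": False, "quality_label": "good"},
    {"listing_id": "B2", "description_generated": True, "required_edits": True, "quality_label": "incorrect"},
    {"listing_id": "C3", "description_generated": False, "required_edits": None, "quality_label": None},
]

generated = [rec for rec in listings if rec["description_generated"]]

# Coverage (output): % of listings with a generated description.
coverage = len(generated) / len(listings)

# Edit rate (output): % of generated descriptions that needed changes
# from the content team, seller or customer.
edit_rate = sum(rec["required_edits"] for rec in generated) / len(generated)

# Quality (input): % of generated descriptions graded "good" per the rubric.
good_rate = sum(rec["quality_label"] == "good" for rec in generated) / len(generated)

print(f"coverage={coverage:.0%}, edit rate={edit_rate:.0%}, good={good_rate:.0%}")
```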
The approach described above can be extended to many ML-based products. I hope this framework helps you define the right set of metrics for your ML model.
Sharanya Rao is a group product manager at Intuit.