Imagine pausing your favorite TV show and seeing an ad for a product that looks exactly like the item in the scene. That’s a contextual product pause ad, a type of shoppable TV ad being developed by my company, Contxtual™.
Shoppable TV (also referred to as “t-commerce” or “interactive TV”) is a concept that consumers and video providers alike have long envisioned. With shoppable TV, viewers could easily purchase items they discover while watching TV, and video providers could benefit from promising new revenue streams.
The influence of television on consumer purchasing behavior is well-documented, and recent examples further illustrate this trend. The Netflix series Bridgerton, set in the early 19th century, was followed by a 43% increase in sales of empire-line dresses and a 27% rise in puff-sleeve clothing, despite its historical setting. This highlights the ability of television to shape consumer preferences across eras. Another instance of this phenomenon came from the popular series Euphoria, where an episode led to a 900% spike in web searches for “black cut-out dress” after a character was depicted wearing one.
In light of television’s impact on purchasing decisions, broadcasters and streaming companies have been exploring ways to effectively integrate shopping into their programming.
Paramount+ recently launched “Shop The Scenes” for its hit TV show, Yellowstone, which proved to be wildly popular with viewers of all ages, though the service quickly sold out of many products. The success of Yellowstone’s Shop The Scenes underscores the demand for shoppable TV, but also highlights the innate scalability challenges that have always limited shoppable TV initiatives to niche offerings.
Hidden Profit Potential
As new technologies overcome these scalability issues, the revenue potential of shoppable TV becomes notable. For example, let’s examine Contxtual’s approach where product ads are displayed on pause and the ads are contextually matched to items in the paused scene. Today, a typical viewer clicks pause 24 times per month. Assuming current CPM rates and 2-3 ads displayed for each pause, Average Revenue per User (ARPU) for the streamer would increase by over $2.50. For a company like Netflix, this translates to over $500 million per month in new ad revenue, and $6 billion per year.
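The arithmetic behind these figures can be sketched in a few lines of Python. The article does not state the CPM or the subscriber count it assumes, so the $35 CPM and 200 million subscribers below are illustrative assumptions only, chosen because they reproduce the stated totals at 3 ads per pause. The function names are hypothetical.

```python
def pause_ad_arpu(pauses_per_month: int, ads_per_pause: int, cpm_usd: float) -> float:
    """Monthly ad revenue per user from pause ads, in USD.

    CPM is the price per 1,000 impressions, so revenue is
    impressions * CPM / 1000.
    """
    impressions = pauses_per_month * ads_per_pause
    return impressions * cpm_usd / 1000.0


def monthly_revenue(arpu_usd: float, subscribers: int) -> float:
    """Total monthly revenue across all subscribers, in USD."""
    return arpu_usd * subscribers


# From the article: 24 pauses per month, 2-3 ads per pause.
# ASSUMPTIONS (not from the article): $35 CPM, 200 million subscribers.
ASSUMED_CPM_USD = 35.0
ASSUMED_SUBSCRIBERS = 200_000_000

arpu = pause_ad_arpu(pauses_per_month=24, ads_per_pause=3, cpm_usd=ASSUMED_CPM_USD)
per_month = monthly_revenue(arpu, ASSUMED_SUBSCRIBERS)
per_year = per_month * 12

print(f"ARPU uplift: ${arpu:.2f} per month")
print(f"Monthly revenue: ${per_month / 1e6:.0f} million")
print(f"Annual revenue: ${per_year / 1e9:.2f} billion")
```

Under these assumptions the uplift is about $2.52 per user, roughly $504 million per month and just over $6 billion per year. At 2 ads per pause and the same assumed CPM, the uplift drops to $1.68, so the headline figures sit at the upper end of the 2-3 ad range.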
Additionally, this model not only includes ad revenue, but also the potential for clicks, retargeting, and personalized shopping services optimized on first-party intent data. With the average person streaming TV for over 3 hours per day, the revenue potential of shoppable TV can have a major impact on the $500+ billion digital ad market. Fully executed, streaming services that incorporate contextually-matched product ads on pause could add $5-$20 ARPU to their bottom lines.
Traditionally, video has been under-optimized for advertising. Shoppable TV, however, allows for non-interruptive ad serving, making it possible to serve unlimited ads within every show. This presents a rare opportunity for broadcasters and streaming companies to truly disrupt the digital ad market.
Shoppable TV – Still A Niche Concept
Even with all its potential, shoppable TV has remained a niche concept, and there are several factors that have prevented streamers from fully realizing its potential. One major issue is the labor-intensive, bespoke product curation and fulfillment infrastructure that has been used for shoppable TV initiatives in the past. This makes it difficult to scale up the technology, as we have seen with the example of Yellowstone.
Solving For Scalability
To solve this scalability problem, Contxtual built an AI system, similar to the technology behind OpenAI’s ChatGPT, that programmatically serves contextual product ads, eliminating bespoke curation and manual updating. To achieve this, however, several technical hurdles had to be overcome.
One such hurdle is identifying products in video scenes with SKU-level accuracy. Today’s best AI computer vision technologies, such as Google Vision, Amazon Rekognition, and IBM’s Watson, currently lack the precision to do this for a number of reasons. Products in scenes may be obscured, making them difficult for an AI to identify. Variations in lighting, shadows, and color grading can also make it difficult for an AI to identify products with high precision. The industry’s top AIs can only identify products by general categories, such as “shirt,” “chair,” or “car.” With millions of products in the world that fall under those categories, an ad system based on that limited data simply cannot “find” product ads similar enough to the items in the scene to be considered “contextual.”
Once products can be identified, the next hurdle is dealing with constant changes in product availability. Retail products often sell out quickly; therefore, a scalable programmatic ad system must be able to track the availability of e-commerce products and match similar alternatives as items become unavailable. This requires sophisticated ad tracking and replacement systems.
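As a rough illustration of availability-aware replacement, the sketch below picks the most similar in-stock product for an item detected in a scene. It is not Contxtual’s method: the `Product` type, the attribute sets, the Jaccard overlap used as a similarity score, and the 0.5 threshold are all hypothetical stand-ins for whatever matching a real system would use.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Product:
    sku: str
    attributes: FrozenSet[str]  # hypothetical normalized attributes
    in_stock: bool


def similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard overlap between two attribute sets, from 0.0 to 1.0."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pick_ad(
    scene_attributes: FrozenSet[str],
    catalog: Iterable[Product],
    min_similarity: float = 0.5,  # assumed cutoff for "similar enough"
) -> Optional[Product]:
    """Return the most similar in-stock product, or None if none is close enough."""
    in_stock = [p for p in catalog if p.in_stock]
    best = max(
        in_stock,
        key=lambda p: similarity(scene_attributes, p.attributes),
        default=None,
    )
    if best is None or similarity(scene_attributes, best.attributes) < min_similarity:
        return None
    return best


catalog = [
    # Exact match for the scene item, but sold out.
    Product("A1", frozenset({"dress", "black", "cut-out"}), in_stock=False),
    # Close alternative that is still available.
    Product("B2", frozenset({"dress", "black", "cut-out", "midi"}), in_stock=True),
    # Same category only; too different to count as contextual.
    Product("C3", frozenset({"dress", "red"}), in_stock=True),
]

scene_item = frozenset({"dress", "black", "cut-out"})
chosen = pick_ad(scene_item, catalog)
print(chosen.sku if chosen else "no suitable ad")
```

Here the sold-out exact match is skipped and the closest available alternative is served instead. Returning `None` when nothing clears the threshold reflects the point that a category-level match alone is not similar enough to be considered contextual.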
Because past shoppable TV initiatives have relied on labor-intensive manual curation and updating of e-commerce products, they have been impossible to implement across large video catalogs. One workaround that has been tried is using product metadata to dynamically match alternative ads after a product sells out. However, this approach ultimately falls short because retail metadata is largely unstructured and lacks the attribute standardization necessary to perform accurate matches. Systems have struggled to identify the differences in style and the subtle color differences that are critical for shoppers. To be successful, shoppable ads must have a high degree of similarity to the product portrayed in the scene.
In order to meet the visual requirements of shoppable TV, metadata must have much more detail than the basic product categories identified by today’s AIs. It must take into consideration minute details such as style, fabric, texture, shape, and other visible characteristics. Currently, retail metadata is too unstructured to provide accurate ad-matching across multiple advertisers, compounding the challenges of creating a scalable programmatic shoppable TV system.
Streamers Could Challenge Ad Media Giants
As the technological obstacles that have impeded the scalability of shoppable TV are overcome, it is important to consider the potential profitability of this advertising medium. Given the vast audience and engagement of streaming television, it could be highly lucrative for streaming services to place e-commerce ads on every frame of their videos, accessible with a simple tap of the pause button. In fact, once in place, the key performance metrics of shoppable TV could rival those of current ad media giants like Meta, Google, and Amazon.
Major Announcement Coming Soon
After eight years of research and development, Contxtual™ is preparing to announce a significant breakthrough that makes programmatic contextual product ads on pause a fully scalable and deployable reality for major streaming services. Our technology will provide streaming services with a new and unique ad unit on every frame of video and a new revenue stream that can allow them to disrupt the $567 billion digital ad market. At the same time, Contxtual™ aims to revolutionize the way people watch television by allowing viewers to discover and purchase products as they enjoy their favorite shows, making the TV viewing experience more interactive, engaging, and rewarding.