Retail AI Platform Powered by Foundation Models, Image Recognition & Demand Intelligence
Trained on 1B+ shelf images, 1M+ SKUs, and 10M+ planograms, Vision Group's retail foundation models power a unified AI engine that understands the shelf, predicts true demand, and drives agentic execution across every store.
The OpenAI analogy for retail.
OpenAI trained a language foundation model on text from the internet. Vision Group trained retail foundation models on data from the shop floor. The analogy is exact — and the moat is the same. The dataset took 11 years to build. It cannot be bought.
Gartner predicts 50%+ of enterprise AI models will be domain-specific by 2027. Vision Group's retail foundation models already fit that description, and they are already compounding.
AI Platform Layer
- 3D digital twin generation
- Automated AI training pipeline
- Real-time agentic rules engine
- Multi-model pipeline orchestration
Multimodal AI Models
- Retail image recognition engine (1B+ training images)
- 3D scene reconstruction
- Vision-language model (price tags, menus)
- Shelf sales volume & behavior model
Strategy AI Models
- Natural language data intelligence
- AI assortment optimization (Assortment.AI)
- AI demand transfer model
- AI consumer decision trees
- Retail demand foundation model (Demand.AI)
The data flywheel — why models improve continuously
Every shelf image improves the image recognition foundation model, for every customer, not just the one whose shelf was photographed.
Every transaction improves the consumer decision tree models, building a more precise map of substitution behavior across every market.
Every assortment decision improves the demand forecasting models, so the gap between forecast and actual narrows with every cycle.
Three integrated sub-systems. One retail AI engine.
The AI Engine is not a single model. It is three purpose-built sub-systems — AI Platform, Multimodal AI, and Strategy AI — working together across all five intelligence layers.
AI Platform Layer
The foundation of the system — combining 3D digital twin generation, automated AI training pipelines, a real-time agentic rules engine, and multi-model orchestration to continuously train, manage, and deploy intelligence at scale.
Multimodal AI Models
At the core is a retail image recognition engine trained on 1B+ real-world images, enhanced by 3D scene reconstruction, vision-language models that interpret price tags and menus, and behavioural models that connect what happens on shelf to what sells.
Strategy AI Models
On top sits a layer of decision intelligence — natural language data analysis, AI-driven assortment optimisation, demand transfer modelling, and consumer decision trees — culminating in a retail demand foundation model that predicts true demand, not just observed sales.
AI Platform Layer
3D Digital Twin Generation
Generates 3D models and complete digital twins of retail products from six-sided images or volumetric captures — creating the product content foundation that powers planogram generation, shelf recognition, and space planning across all five intelligence layers.
POWERS:
Product.AI Space.AI
Automated AI Training Pipeline
Automates data cleaning, annotation, and synthetic dataset generation to continuously train and improve computer vision models — removing manual data labelling and enabling AI accuracy to improve at scale without proportional human effort.
POWERS:
All Five Layers
Real-Time Agentic Rules Engine
A real-time AI rules engine that continuously checks store data against Perfect Store targets and planogram standards, triggering corrective actions instantly without waiting for human review. Gartner predicts that 33% of enterprise software applications will include agentic AI by 2028. Vision Group has it live today.
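The check-and-trigger loop described above can be pictured with a minimal Python sketch. It is illustrative only and is not Vision Group's implementation: the `ShelfObservation` and `PlanogramTarget` types, the `check_compliance` function, and the price tolerance are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ShelfObservation:
    store_id: str
    sku: str
    facings: int   # facings detected on the shelf
    price: float   # price read from the shelf tag


@dataclass
class PlanogramTarget:
    sku: str
    min_facings: int
    expected_price: float


def check_compliance(obs, target, price_tolerance=0.01):
    """Return corrective actions for one observation (empty list if compliant)."""
    actions = []
    if obs.facings == 0:
        actions.append(f"restock {obs.sku} at store {obs.store_id}")
    elif obs.facings < target.min_facings:
        missing = target.min_facings - obs.facings
        actions.append(f"add {missing} facings of {obs.sku} at store {obs.store_id}")
    if abs(obs.price - target.expected_price) > price_tolerance:
        actions.append(f"correct price tag for {obs.sku} at store {obs.store_id}")
    return actions


# Hypothetical data: two facings short and a mispriced tag.
target = PlanogramTarget(sku="SKU-001", min_facings=4, expected_price=2.49)
obs = ShelfObservation(store_id="S42", sku="SKU-001", facings=2, price=2.99)
actions = check_compliance(obs, target)
```

In this sketch each rule appends a task to a list. A production engine would presumably route those tasks to field teams or store systems, which the sketch does not attempt.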
POWERS:
Execution.AI
Multi-Model AI Pipeline Orchestration
Manages complex, multi-model AI pipelines — coordinating different models simultaneously and adapting to category, regional, and retailer nuances for scalable execution across 340+ customers and 75+ countries, without custom engineering for each deployment.
POWERS:
All Five Layers
Multimodal AI Models
Retail Image Recognition Engine
Converts unstructured shelf image and video data into structured retail intelligence — identifying SKUs, counting facings, detecting gaps, measuring share of shelf, and flagging compliance failures in real time. Trained on 1B+ retail images across 11+ years — the most retail-specific image recognition training dataset in the market.
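As a rough illustration of one of these metrics, share of shelf can be computed from per-SKU facing counts once recognition has run. The sketch below assumes a facing-based definition of share of shelf; the function name `share_of_shelf`, the SKU codes, and the brand mapping are hypothetical.

```python
from collections import Counter


def share_of_shelf(facings_by_sku, brand_of):
    """Return each brand's fraction of total facings on a fixture."""
    totals = Counter()
    for sku, facings in facings_by_sku.items():
        totals[brand_of[sku]] += facings
    all_facings = sum(totals.values())
    if all_facings == 0:
        return {}  # empty fixture: no share to report
    return {brand: n / all_facings for brand, n in totals.items()}


# Hypothetical recognition output for one shelf.
facings = {"A1": 3, "A2": 1, "B1": 4}
brands = {"A1": "BrandA", "A2": "BrandA", "B1": "BrandB"}
shares = share_of_shelf(facings, brands)
```

Other definitions, such as share by linear shelf width, would need product dimensions as an extra input.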
POWERS:
Execution.AI Product.AI
3D Scene Reconstruction Model
Performs 3D reconstruction from multi-view shelf and store images — enabling spatial localisation, product deduplication, and precise positioning within a retail fixture. Generates planogram-accurate 3D representations of real shelves from standard field team photography, without specialist equipment.
POWERS:
Space.AI Product.AI
Shelf Sales Volume & Behavior Model
Analyses shelf images at different time points to measure sales volume from shelf change — comparing before and after states to infer sell-through rates and identify which planogram layouts drive highest removal rates. Also processes shopping behaviour video to understand consumer movement, dwell time, and product interaction at the fixture level.
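The before-and-after comparison can be sketched in a few lines of Python. This is a simplified illustration, not the production model: it assumes per-SKU unit counts have already been extracted from two shelf images, and it treats any increase between snapshots as a restock rather than a sale. The function names are hypothetical.

```python
def units_removed(before, after):
    """Per-SKU units removed between two shelf snapshots (dicts of sku -> count).

    Increases (restocks) are clamped to zero rather than counted as negative sales.
    """
    return {sku: max(count - after.get(sku, 0), 0) for sku, count in before.items()}


def removal_rate(before, after, hours):
    """Units removed per hour for each SKU over the interval between snapshots."""
    return {sku: n / hours for sku, n in units_removed(before, after).items()}


# Hypothetical counts from two images taken six hours apart.
before = {"SKU-001": 12, "SKU-002": 8, "SKU-003": 5}
after = {"SKU-001": 9, "SKU-002": 8, "SKU-003": 7}
rates = removal_rate(before, after, hours=6)
```

Here SKU-001 shows a removal rate of 0.5 units per hour, while SKU-003 was restocked between snapshots and therefore reports zero.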
POWERS:
Demand.AI Assortment.AI
Strategy AI Models
Natural Language Data Intelligence
An LLM-based automated data analysis system that answers natural-language questions by querying all platform data in real time. Category managers can ask "which SKUs are driving the most category exits in the South East?" and receive a data-backed answer in seconds, without a data analyst or SQL query.
POWERS:
All Five Layers
AI Assortment Optimization Model
Recommends the optimal product assortment for each store cluster rather than relying on banner-wide averages. Inputs include demand signals, consumer decision trees, store demographics, product attributes, and category strategy constraints. Outputs a ranked assortment recommendation with expected revenue lift per change.
POWERS:
Assortment.AI
AI Demand Transfer Model
Predicts where and how demand shifts when products are added or removed from the range — quantifying transfer to adjacent SKUs, competitor brands, and category exit at store-cluster level. Built from POS and loyalty data across 340+ customers. Commercially live — drives Assortment.AI simulations today.
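To make the idea of demand transfer concrete, here is a minimal arithmetic sketch in Python. It assumes the transfer shares are already known, whereas the model described above estimates them from POS and loyalty data. The function name, SKU codes, and share values are hypothetical.

```python
def transfer_demand(delisted_units, transfer_shares):
    """Split a delisted SKU's units across destinations.

    transfer_shares maps a destination (another SKU, a competitor brand, or
    category exit) to its share of the delisted volume; shares must sum to 1.
    """
    total = sum(transfer_shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"transfer shares sum to {total}, expected 1.0")
    return {dest: delisted_units * share for dest, share in transfer_shares.items()}


# Hypothetical scenario: a SKU selling 200 units a week is removed from the range.
shares = {"SKU-002": 0.5, "competitor_brand": 0.25, "category_exit": 0.25}
result = transfer_demand(200, shares)
```

In this example a quarter of the delisted volume leaves the category altogether, which is the loss an assortment simulation would weigh against the benefit of the change.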
POWERS:
Assortment.AI Demand.AI
AI Consumer Decision Tree Model
Maps the hierarchy of purchase decisions consumers make at the fixture — brand first, category first, price tier first, or occasion first — segmented by store type, channel, and shopper cohort. Built from real observed POS and loyalty data. Drives assortment decisions that reflect how shoppers actually think, not how planners assume they think.
POWERS:
Assortment.AI Demand.AI
Retail Demand Foundation Model
Predicts sales volume based on in-store display conditions, planogram compliance, promotional execution, and external factors including seasonality, events, and weather. Unlike standard forecasting models that use only historical POS data, Vision Group's model incorporates real shelf execution state — knowing whether a product is actually on shelf, correctly placed, and correctly priced — producing forecasts that reflect true demand rather than observed sales constrained by execution failures.
→ Standard models forecast from observed sales — constrained by stockouts and execution failures
→ This model incorporates real shelf execution state from Execution.AI before forecasting
→ Result: forecasts reflect true demand — not what sold, but what consumers wanted to buy
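The difference between observed sales and true demand can be illustrated with a deliberately simple Python sketch. It assumes sales scale linearly with on-shelf availability, which is a textbook simplification and not a description of how the Retail Demand Foundation Model works. The function name and the `min_fraction` cutoff are hypothetical.

```python
def unconstrained_demand(observed_units, on_shelf_fraction, min_fraction=0.1):
    """Estimate demand had the product been available for the whole period.

    Assumes sales scale linearly with availability. Returns None when
    availability is too low to extrapolate from.
    """
    if not 0.0 <= on_shelf_fraction <= 1.0:
        raise ValueError("on_shelf_fraction must be between 0 and 1")
    if on_shelf_fraction < min_fraction:
        return None
    return observed_units / on_shelf_fraction


# Hypothetical week: 60 units sold while the product was on shelf 75% of the time.
observed = 60
estimate = unconstrained_demand(observed, on_shelf_fraction=0.75)
lost_sales = estimate - observed
```

Under that assumption the estimated demand is 80 units, so a forecast trained on the 60 observed units would understate demand by 20.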
POWERS:
Assortment.AI Demand.AI
No retail AI vendor has a model stack of this depth and breadth.
The Gartner predictions about domain-specific, agentic, and multimodal AI describe what Vision Group has already built — and has had live in production for years.
See the AI engine at work.
Demo includes a live walkthrough of the foundation models and how they power each intelligence layer.