16 May 2026
Scaling Operational Intelligence: Ideas from the Morrisons Gemini Implementation
A deep dive into how Morrisons leverages Vertex AI and Gemini to bridge the gap between big data and store-level execution, providing a blueprint for enterprise AI deployment.
For most large-scale enterprises, the primary challenge is no longer gathering data. It is the latency between a data signal and a human decision. In the context of Morrisons, one of the UK’s largest grocery retailers, this gap manifests as store managers spending hours parsing inventory reports, supply chain logs, and customer feedback instead of managing their teams or improving the customer experience.
This article examines the technical and strategic framework Morrisons used to implement Google Cloud’s Gemini models through the Vertex AI platform. For CTOs and engineering leaders, this case study is more than a success story; it is a reference architecture for moving Generative AI (GenAI) from the lab into core operations. We will analyze the specific decisions regarding model selection, grounding strategies, and the shift from predictive ML to generative reasoning.
By the end of this analysis, you will understand the trade-offs between different Gemini model tiers, how to architect a RAG (Retrieval-Augmented Generation) system for operational staff, and how to validate AI reliability in high-stakes environments like retail logistics.
Core Mechanics: Moving Beyond the Chatbot
Morrisons did not build a generic chatbot. They built an operational layer. The technical core relies on three primary components within the Google Cloud ecosystem: Vertex AI, Gemini 1.5 Pro and Flash, and BigQuery.
Multi-modal Reasoning with Gemini
In a retail environment, data is rarely just text. It is stock levels (structured data), delivery notes (unstructured PDFs), and shelf photos (images). The Gemini 1.5 series is natively multimodal, meaning it processes these diverse inputs without requiring separate vision or OCR models to act as pre-processors. This reduces the complexity of the inference pipeline and lowers the risk of losing information when data is handed from one specialized model to another.
The Grounding Framework
The most critical technical decision in the Morrisons implementation is the use of "grounding." An LLM’s internal knowledge is static, fixed at training time. By grounding Gemini in Morrisons’ own internal data (specifically documentation, pricing engines, and real-time stock levels stored in BigQuery), the system bases its responses on retrieved, current facts rather than on what the model happened to learn during training. Vertex AI Search and Conversation facilitate this by indexing internal corpora and placing the most relevant documents in the model’s context window before it generates a response.
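As a rough illustration of the grounding step, the sketch below assembles a prompt that restricts the model to retrieved snippets. The function name, the instruction wording, and the sample snippets are assumptions made for this example; the article does not describe Morrisons' actual prompt format, and the model call itself is left out.

```python
def build_grounded_prompt(question: str, snippets: list[str]) -> str:
    """Wrap retrieved snippets and a question into a single grounded prompt."""
    # Number each snippet so the answer can point back to its source.
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return (
        "Answer using only the numbered context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}"
    )


# Invented placeholder snippets; in practice these would come from a search index.
prompt = build_grounded_prompt(
    "How many units of Item X are in stock?",
    ["Item X: 42 units on hand as of the latest stock refresh.",
     "Item X is on promotion this week."],
)
print(prompt)
```

The explicit "say so" instruction is one common way to discourage the model from falling back on its pre-trained knowledge when retrieval comes back empty.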
Architecture and Patterns for Scalability
Scaling AI across hundreds of locations and thousands of employees requires a move away from monolithic prompt engineering toward a modular agentic architecture.
The Latency vs. Reasoning Trade-off
Morrisons’ developers had to choose which tasks belonged to which model. In a production environment, you do not use the most powerful model for every request.
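1. Gemini 1.5 Flash: Used for high-speed, high-volume tasks such as summarizing simple store alerts or identifying keywords in customer feedback. Its low latency is essential for mobile-first store staff.
2. Gemini 1.5 Pro: Used for tasks that need deeper reasoning and can tolerate more latency, such as batch jobs like overnight inventory analysis.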
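One way to express this kind of tiering is a small routing function. The sketch below is a minimal example under stated assumptions: the task names, the tier labels, and the routing rule are hypothetical and would need to reflect your own workloads and model IDs.

```python
# Hypothetical tier labels; map them to the model IDs your deployment uses.
FAST_TIER = "flash"
REASONING_TIER = "pro"

# Hypothetical names for tasks that are simple and high-volume.
SIMPLE_TASKS = {"summarize_alert", "extract_keywords"}


def choose_tier(task: str, interactive: bool) -> str:
    """Pick a model tier from the task type and whether a person is waiting."""
    # Simple tasks, and anything a staff member is waiting on, go to the fast tier.
    if task in SIMPLE_TASKS or interactive:
        return FAST_TIER
    # Everything else (for example batch analysis) can afford the slower tier.
    return REASONING_TIER


print(choose_tier("summarize_alert", interactive=True))
print(choose_tier("overnight_inventory_analysis", interactive=False))
```

Keeping the rule in one function makes it easy to audit and to change as model pricing or latency shifts.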
RAG at Scale
To prevent the "context window" from becoming a bottleneck, the architecture employs a vector-based search pattern. When a store manager asks a question, the system does not feed the entire company handbook to the LLM. It performs a semantic search to find the most relevant paragraphs, then sends only those to the Gemini model. This keeps token costs down and accuracy high.
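The retrieval step described above can be sketched with plain cosine similarity. This is a toy example, not the production design: the three-dimensional vectors and the policy sentences are invented, and a real system would use an embedding model and a managed vector index instead of a Python list.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k passages whose vectors are most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Invented passages with made-up 3-dimensional "embeddings".
INDEX = [
    ("Hazardous waste: seal the container and log it in the waste register.", [0.9, 0.1, 0.0]),
    ("Promotion shelf layout: place offers at eye level.", [0.1, 0.9, 0.0]),
    ("Holiday rota: requests are submitted four weeks ahead.", [0.0, 0.1, 0.9]),
]

query_vec = [0.8, 0.2, 0.0]  # stands in for the embedding of a waste-policy question
results = top_k(query_vec, INDEX, k=1)
print(results)
```

Only the passages returned by `top_k` would be sent to the model, which is what keeps the prompt short.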
Use Cases: Bridging the Digital and Physical
While the technology is sophisticated, the value is in the application. Morrisons’ focus on practical implementation provides several high-value use cases that translate well to other industries.
1. Store Manager Copilot
Store managers deal with a massive volume of procedural documentation. Searching for a specific policy on hazardous waste or a new promotion's shelf layout used to take minutes. With Gemini, this becomes a natural language query. The model doesn't just find the document; it synthesizes the steps into a checklist.
2. Supply Chain Anomaly Detection
Standard ML is excellent at identifying patterns. Gemini is excellent at explaining the meaning of an anomaly. By feeding BigQuery output (e.g., "Stock for Item X is 20% lower than predicted") into Gemini along with logistics notes (e.g., "Weather delay at port Y"), the system can generate a plain-English explanation for the store manager, suggesting an alternative order.
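The hand-off from a numeric signal to a generative explanation might look like the sketch below. It assumes the predicted and actual stock figures have already been read from BigQuery; the function name, the 10% threshold, and the prompt wording are illustrative assumptions, and the call to the model is omitted.

```python
from typing import Optional


def describe_anomaly(item: str, actual: float, predicted: float,
                     notes: list[str], threshold_pct: float = 10.0) -> Optional[str]:
    """Turn a stock deviation plus logistics notes into a prompt, or None if minor."""
    pct = (actual - predicted) * 100.0 / predicted
    if abs(pct) < threshold_pct:
        return None  # Within tolerance: nothing to explain.
    direction = "lower" if pct < 0 else "higher"
    signal = f"Stock for {item} is {abs(pct):.0f}% {direction} than predicted."
    context = "\n".join(f"- {n}" for n in notes) or "- (no logistics notes available)"
    return (
        f"{signal}\nLogistics notes:\n{context}\n"
        "Explain the likely cause in plain English for a store manager "
        "and suggest an alternative order if appropriate."
    )


prompt = describe_anomaly("Item X", actual=80, predicted=100,
                          notes=["Weather delay at port Y"])
print(prompt)
```

Filtering on a threshold before calling the model keeps the generative step reserved for deviations that actually need an explanation.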
3. Customer Sentiment Aggregation
Retailers receive thousands of data points daily from social media, emails, and in-store surveys. Gemini’s ability to categorize this data and analyze its sentiment in real time allows regional managers to identify systemic issues, like a recurring checkout software bug, hours before they would have appeared in a weekly report.
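Once each piece of feedback carries a category label, spotting a systemic issue is a simple aggregation. The sketch below assumes the labels have already been produced by the model; the store IDs, category names, and the three-store threshold are hypothetical.

```python
from collections import defaultdict


def systemic_issues(labelled: list[tuple[str, str]], min_stores: int = 3) -> list[str]:
    """Return categories reported by at least `min_stores` distinct stores."""
    stores_by_category: dict[str, set[str]] = defaultdict(set)
    for store_id, category in labelled:
        stores_by_category[category].add(store_id)
    return sorted(c for c, stores in stores_by_category.items()
                  if len(stores) >= min_stores)


# Invented (store, category) pairs standing in for model-labelled feedback.
feedback = [
    ("S1", "checkout_software"),
    ("S2", "checkout_software"),
    ("S3", "checkout_software"),
    ("S1", "availability"),
    ("S1", "availability"),
]
flagged = systemic_issues(feedback)
print(flagged)
```

Counting distinct stores instead of raw mentions avoids flagging a single noisy location as a regional problem.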
Trade-offs, Risks, and Constraints
No architecture is without friction. When deploying Gemini via Vertex AI, leadership must weigh several constraints.
Token Costs and Budgeting
Unlike traditional SaaS pricing, LLM costs vary with the number of tokens processed (units of text, with images also counted in tokens). For a retailer with 500 stores, a poorly optimized prompt that includes too much irrelevant data can lead to significant cost overruns. Monitoring and "prompt auditing" become essential engineering tasks.
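A back-of-the-envelope estimate makes the effect of prompt size concrete. Every number below is a placeholder chosen for easy arithmetic, not real pricing or real Morrisons traffic: the per-million-token prices, the query volume, and the token counts are all assumptions.

```python
def monthly_cost(stores: int, queries_per_store_per_day: int,
                 input_tokens: int, output_tokens: int,
                 price_in_per_million: float, price_out_per_million: float,
                 days: int = 30) -> float:
    """Estimate monthly spend from query volume and per-query token counts."""
    queries = stores * queries_per_store_per_day * days
    cost_in = queries * input_tokens / 1_000_000 * price_in_per_million
    cost_out = queries * output_tokens / 1_000_000 * price_out_per_million
    return cost_in + cost_out


# Placeholder prices (currency units per million tokens) and volumes.
lean = monthly_cost(500, 200, input_tokens=1_500, output_tokens=300,
                    price_in_per_million=1.0, price_out_per_million=3.0)
bloated = monthly_cost(500, 200, input_tokens=12_000, output_tokens=300,
                       price_in_per_million=1.0, price_out_per_million=3.0)
print(lean, bloated)  # 7200.0 38700.0
```

With these placeholder figures, the same workload costs more than five times as much when each prompt carries 12,000 input tokens instead of 1,500, which is why prompt auditing pays for itself.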
The Reliability Gap
LLMs are stochastic, not deterministic. In retail, where a 1% error in pricing can lead to thousands of pounds in lost revenue, the model cannot be given autonomous control over transactional systems. Morrisons maintains a "human-in-the-loop" model where Gemini suggests actions, but a human confirms them.
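A human-in-the-loop gate can be enforced in code so that nothing reaches a transactional system without an explicit approval. The class and function names below are hypothetical; this is a minimal sketch of the pattern, not a description of Morrisons' approval workflow.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Suggestion:
    action: str             # what the model proposes, e.g. "reorder Item X"
    rationale: str          # the model's explanation, shown to the reviewer
    approved: bool = False  # set to True only by a human reviewer


def apply_suggestion(suggestion: Suggestion, execute: Callable[[str], str]) -> str:
    """Run the action only if a human has approved it."""
    if not suggestion.approved:
        raise PermissionError("Suggestion has not been approved by a human.")
    return execute(suggestion.action)


s = Suggestion(action="reorder Item X", rationale="Stock is below forecast.")
s.approved = True  # in a real system this would come from a reviewer's confirmation
result = apply_suggestion(s, execute=lambda action: f"executed: {action}")
print(result)  # executed: reorder Item X
```

Raising an error on unapproved suggestions, instead of silently skipping them, makes any attempt to bypass the review step visible in logs.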
Data Privacy and Sovereignty
Using Vertex AI ensures that the data Morrisons feeds into Gemini is not used to train Google’s public models. However, managing the lifecycle of that data—ensuring PII (Personally Identifiable Information) is scrubbed before it reaches the model—is a significant architectural burden that teams often underestimate.
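To give a sense of what scrubbing involves, the sketch below redacts email addresses and UK-style phone numbers with regular expressions before text is sent to a model. The patterns are deliberately simple assumptions and will miss many forms of PII (names, addresses, loyalty numbers); a production pipeline would more likely rely on a dedicated data inspection service.

```python
import re

# Simplified patterns for illustration only; real PII detection needs far more coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"(?<!\w)(?:\+44\s?|0)\d[\d\s-]{7,11}\d")


def scrub(text: str) -> str:
    """Replace emails and UK-style phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


sample = "Customer jane.doe@example.com called from 07700 900123 about a refund."
cleaned = scrub(sample)
print(cleaned)  # Customer [EMAIL] called from [PHONE] about a refund.
```

Running the scrubber at the boundary where data leaves your systems, and not inside each feature, keeps the redaction rules in one auditable place.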
Decision Criteria for CTOs
When evaluating whether to follow the Morrisons blueprint, use the following criteria to judge fit:
- Data Complexity: If your data is strictly structured (SQL tables only), traditional ML or BI may be more efficient. If your data is a mix of voice, image, and text, a multi-modal LLM like Gemini is likely necessary.
- Latency Requirements: Does the response need to be sub-second? If so, the architecture must rely heavily on Gemini Flash or cached embeddings. If the task is batch processing (e.g., overnight inventory analysis), Pro is the better choice.
- Integration Level: Do you need a standalone assistant or an integrated feature? Vertex AI excels when you are already on Google Cloud, as the integration with BigQuery and Cloud Storage is native and low-latency.
Common Pitfalls and How to Avoid Them
Senior engineering teams often fall into the trap of "over-prompting." They try to solve every accuracy problem with a longer prompt. This is a mistake.
- The Problem: Long prompts lead to "lost in the middle" syndrome, where the model ignores instructions in the center of the text.
- The Solution: Use few-shot prompting (providing examples) or fine-tuning for specific domain language. Morrisons benefits from Google’s managed fine-tuning capabilities within Vertex AI, which allow the model to learn the specific terminology of UK retail without needing a massive new training dataset.
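Few-shot prompting can be as simple as prepending a handful of labelled examples to the new input. The sketch below shows one possible layout for a feedback-classification task; the instruction text, the category names, and the example feedback are invented for illustration.

```python
def build_few_shot_prompt(instruction: str,
                          examples: list[tuple[str, str]],
                          new_input: str) -> str:
    """Build a prompt with labelled examples followed by the unlabelled input."""
    parts = [instruction, ""]
    for text, label in examples:
        parts.append(f"Feedback: {text}")
        parts.append(f"Category: {label}")
        parts.append("")
    # End on an open "Category:" line so the model completes the label.
    parts.append(f"Feedback: {new_input}")
    parts.append("Category:")
    return "\n".join(parts)


# Invented examples; real ones would be drawn from reviewed, labelled feedback.
examples = [
    ("The self-checkout froze twice.", "checkout_software"),
    ("The bread aisle was empty at 5pm.", "availability"),
]
prompt = build_few_shot_prompt(
    "Classify each piece of customer feedback into one category.",
    examples,
    "The card reader kept rebooting.",
)
print(prompt)
```

Two or three well-chosen examples often do more for consistency than several extra paragraphs of instructions, and they keep the prompt short.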
Another pitfall is ignoring the "Cold Start" for staff. AI tools fail if they don't fit into existing workflows. Morrisons integrated these tools into the mobile devices store staff already carry, rather than asking them to sit at a terminal. This operational integration is as important as the code itself.
Takeaways
- Grounding is mandatory: Do not rely on an LLM's pre-trained knowledge for operational tasks. Use Vertex AI Search to connect models to your live data.
- Optimize for cost and speed: Use a tiered model approach, with Flash for high-volume tasks and Pro for high-reasoning tasks.
- Focus on the last mile: The value of AI in retail (and most industries) is in the physical world. Ensure your AI output is actionable for non-technical staff.
- Maintain human oversight: Use GenAI to synthesize and suggest, but keep human-in-the-loop for any action that affects pricing, stock, or customer safety.
- Leverage platform native tools: If using Google Cloud, the BigQuery-to-Vertex pipeline is the most efficient path to production, reducing the need for custom data pipelines.