AI Shopping Assistants Stuck in the Past: Recency and Bias in Buying

The holiday shopping season has become a proving ground for four of the biggest AI platforms — ChatGPT, Google’s Gemini, Perplexity, and Microsoft Copilot — but a hands‑on test with a simple request for an Android smartwatch exposes a recurring problem: these assistants can be impressively helpful and surprisingly wrong, frequently steering shoppers toward older models, missing recent releases, and offering agentic conveniences that sometimes do more harm than good.

Background / Overview

AI companies raced to add shopping features in late 2025, turning assistants into full‑blown buying tools rather than just search companions. OpenAI added a conversational Shopping Research mode inside ChatGPT that builds personalized buyer’s guides and comparison charts; Google folded agentic calling and price‑tracking into Gemini and Search, powered by Duplex; Perplexity rolled out an Instant Buy flow using PayPal; and Microsoft consolidated Edge’s price comparison, price history, tracking, and cashback tools into Copilot in Edge. Each vendor described the features as ways to save time and reduce the friction of comparing specs, prices, and retail availability.
These product moves are consequential: rather than sending you to 10 store pages and review sites, the new assistants aim to resolve the full purchase path inside a single conversational thread, or even execute the purchase for you. That promise has practical benefits, but it also places a lot of trust in model freshness, data coverage, and merchant integrations that are still maturing.

Where the assistants trip up: the smartwatch case study

A single, repeatable test illuminates common failure modes. A reviewer asked four assistants the same prompt — roughly, “Help me find a good Android smartwatch for my Nothing CMF Phone 1” — and let each tool run its shopping flow. The outcomes were encouraging in process: ChatGPT asked clarifying questions and produced a detailed buyer’s guide; Copilot surfaced price history and review summaries; Gemini offered agentic store calls for local availability; Perplexity gave compact product cards and direct buy buttons. But across providers the recommendations skewed toward earlier generations of devices rather than showing the newest releases first. Concrete examples include:
  • ChatGPT’s Shopping Research suggested the Garmin vívoactive 5 but missed the more recent vívoactive 6 in its top set (the vívoactive 6 was released in 2025 and includes additional storage, updated GNSS and running tools).
  • Gemini returned comparisons that included the Google Pixel Watch 2 (released in 2023) rather than prioritizing the Pixel Watch 4, which leaves shoppers comparing hardware two generations behind the current model.
  • Perplexity suggested the Pixel Watch 4 but also surfaced a 2021 Samsung Galaxy Watch 4 and several low‑value off‑brand “smartwatches” in its shopping tab, mixing relevant and irrelevant results.
  • Copilot highlighted the CMF Watch Pro 2 (which pairs with the CMF Phone 1) and included useful price‑tracking tools, but initially overlooked the CMF Watch 3 Pro, a newer CMF model that many outlets list as available in 2025.
The reviewer’s experience is valuable because it shows that these assistants are not failing in abstract, algorithmic ways; they’re making choices that would plausibly lead a shopper to buy a last‑generation product when a newer version is available that might change the purchase decision.

Why do AI shopping assistants recommend older models?

There’s no single bug here — a combination of systemic factors explains the pattern:
  • Catalog and data lag. Product discovery depends on up‑to‑date retail data, manufacturer feeds, and freshly indexed reviews. If a model’s dataset or connectors lag by months, an assistant will surface better‑documented older models first. OpenAI and other vendors acknowledge that price and availability may still be inaccurate and encourage verification on merchant sites.
  • Review volume and signal strength. Older models tend to have more user reviews, more long‑form editorial coverage, and stronger SEO signals. Retrieval‑based systems and ranking heuristics often prefer items with richer corroborating signals even if they’re not the newest. This is a common trade‑off in product search and conversational product retrieval research.
  • Merchant allowlists and integration limits. Some assistant features rely on partnerships or allowlists to surface Instant Buy, price‑tracking, or verified listings. If a new release isn’t yet widely carried by connected merchants or hasn’t been added to a partner catalog, the assistant may be unable to retrieve it as a purchasable option. Perplexity’s Instant Buy, for example, initially worked only with a subset of PayPal‑enabled merchants.
  • Conservative product ranking. Many shopping flows intentionally bias toward “stable” picks (best reviews, long availability) to reduce return rates and post‑purchase dissatisfaction. That conservatism can penalize new releases that lack long review histories even if they’re objectively superior.
  • Agentic and access constraints. When an assistant offers agentic behaviors — like Google’s Duplex calls to local stores — the outcome depends on the agent’s permitted scope and the businesses’ public profiles. These calls can return “no stock” simply because the stores indexed don’t list certain brands or the AI misclassified the query. The caller experience also varies by region and retailer opt‑outs.
Many of these factors are benign engineering realities, but combined they produce a practical harm: shoppers might buy hardware that’s functionally inferior because the assistant surfaced a better‑documented but older model.
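The interaction between review volume and recency can be illustrated with a toy ranking function. This is a minimal sketch, not any vendor's actual ranking: the Product fields, the log-scaled review signal, the recency_weight parameter, and the sample numbers are all hypothetical, chosen only to show how a score that rewards accumulated reviews can put an older generation first.

```python
import math
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    release_year: int
    avg_rating: float   # 0-5 stars
    review_count: int


def score(p: Product, current_year: int, recency_weight: float = 0.0) -> float:
    """Toy relevance score (hypothetical formula): rating scaled by the
    log of review volume, minus an optional penalty per year of age."""
    signal = p.avg_rating * math.log10(1 + p.review_count)
    age_penalty = recency_weight * (current_year - p.release_year)
    return signal - age_penalty


def rank(products, current_year, recency_weight=0.0):
    """Return product names ordered from highest to lowest score."""
    ordered = sorted(
        products,
        key=lambda p: score(p, current_year, recency_weight),
        reverse=True,
    )
    return [p.name for p in ordered]


# Illustrative numbers only, not real review data.
catalog = [
    Product("Watch Gen 5", 2023, 4.5, 12000),
    Product("Watch Gen 6", 2025, 4.6, 300),
]

print(rank(catalog, 2025))                      # review volume dominates
print(rank(catalog, 2025, recency_weight=4.0))  # age penalty applied
```

With recency_weight left at 0, the older model ranks first on review volume alone, even though the newer one has a slightly higher rating. A large enough age penalty flips the order. Where a real system should set that balance is a product decision; the sketch only shows that the trade-off exists.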

Strengths and real benefits: what the assistants did well

Despite the stumbles, these tools introduce capabilities real shoppers will value:
  • Guided, conversational discovery. ChatGPT’s Shopping Research asks clarifying questions (budget, use case, preferred features) and builds a side‑by‑side comparison — a convenience for shoppers who don’t know where to start. That conversational narrowing mirrors longstanding best practices in product advising.
  • Price history and alerts. Copilot in Edge shows price history charts, lets you set price‑target alerts, and surfaces cashback opportunities — features that materially help timing a purchase. Microsoft has explicitly migrated Edge’s price‑tracking and comparison tools into Copilot’s sidebar.
  • Friction‑reducing checkout. Perplexity’s Instant Buy (PayPal) and OpenAI’s stated plan to connect Shopping Research to Instant Checkout aim to move the final clicks into the assistant itself. For well‑supported merchants, that reduces cart abandonment and saves time.
  • Agentic tasks that save time. Google’s Duplex‑powered “call for me” can quickly query multiple local shops about stock or pricing and return a neat summary — valuable for time‑starved users or for regional product availability checks. When it works accurately, it removes the repetitive task of phoning many stores.
These are not theoretical wins: price alerts save money, comparative guides reduce buyer’s remorse, and well‑implemented Instant Buy shortens the checkout funnel. The user experience can be compelling when the inputs are fresh and the merchant integrations are broad.
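As a rough illustration of what a price-target alert does with a price history, here is a minimal sketch. It is not how Copilot or any other product implements the feature; the function names, the (date, price) tuple layout, and the sample prices are assumptions made for the example.

```python
from datetime import date


def price_alert(history, target):
    """Return the first (date, price) entry at or below the target price,
    or None if the price never dropped that far.

    history: list of (date, price) tuples, oldest first (assumed layout).
    """
    for day, price in history:
        if price <= target:
            return day, price
    return None


def lowest_price(history):
    """Return the (date, price) entry with the lowest price seen."""
    return min(history, key=lambda entry: entry[1])


# Illustrative prices only, not real retail data.
history = [
    (date(2025, 11, 1), 199.0),
    (date(2025, 11, 15), 179.0),
    (date(2025, 11, 28), 149.0),
]

print(price_alert(history, target=180.0))
print(lowest_price(history))
```

The same history answers two different questions: whether a target has been hit yet, and how today's price compares with the lowest one recorded. Both are only as good as the freshness of the underlying price feed.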

Risks and potential harms to watch

AI shopping assistants create new vectors of risk that shoppers and regulators should notice:
  • Outdated recommendations leading to poor buys. The central user risk is buying a last‑generation device because the assistant surfaced it first or failed to flag a newer model’s availability or trade‑offs.
  • Opaque provenance and confidence. Many results lack clear timestamps, source provenance, or confidence levels. When an assistant recommends a model, users need to know whether the recommendation is based on June 2025 manufacturer specs, a November 2025 retailer feed, or an older product comparison. Without that context, decisions are harder to evaluate.
  • Agentic nuisance and business impact. Google’s calling feature is useful, but businesses can receive many automated calls, and some have objected to the volume and quality of those interactions. Businesses can opt out, but opt‑out imposes friction and can reduce local inventory visibility for shoppers.
  • Checkout safety and merchant relationships. Embedded checkout — Instant Buy — centralizes payment and post‑purchase flows. While PayPal integration reduces friction, it also concentrates trust and privacy expectations on the AI provider and the payment partner; disputes, returns, and fraud prevention become complex multi‑party interactions. Early Perplexity rollouts show limited merchant coverage and promotional incentives to drive adoption, not comprehensive coverage.
  • Bias toward high‑signal (old) products and against new entrants. Startups and smaller vendors may struggle to surface in AI flows because they lack many reviews or connection to a large merchant feed. That can concentrate buyer traffic toward incumbents and established models.
  • Regulatory and antitrust concerns. When assistants control discovery and checkout, the platform’s ranking choices — and potential revenue‑sharing with merchants — raise questions about transparency and competition. Policymakers will likely scrutinize these flows as AI moves more commerce inside proprietary interfaces.
Taken together, these risks don’t doom the technology, but they do argue for careful rollout, stronger transparency, and consumer safeguards.

How to use AI shopping assistants safely (practical playbook)

Shoppers can get the best of both worlds — speed and accuracy — by treating AI assistants as first drafts rather than final arbiters. Here’s a practical, 9‑step checklist to use when shopping with an AI assistant:
  1. Ask for recency explicitly: add “current” or “latest model released in 2025” to your prompt.
  2. Request release dates and firmware versions for recommended devices, and ask the assistant to show publication dates for the review or retailer pages it used.
  3. Use price tracking features: set a target price alert or enable Copilot/Perplexity price‑watch if available.
  4. Cross‑verify specs on manufacturer pages for critical changes (storage, GPS, charging standard, battery life). Don’t accept an assistant’s specification as definitive without a source.
  5. If an assistant offers agentic store calls, treat the summary as a convenience snapshot — call the store yourself if stock is mission‑critical.
  6. Check whether “Instant Buy” is supported for the merchant you prefer and review the returns policy and merchant‑of‑record details before paying.
  7. Ask the assistant to show only products released in a specific year or later (e.g., “show smartwatches released in 2024–2025”). If the assistant still surfaces old models, push back and request reasons.
  8. Use multiple assistants for cross‑checks: a price history chart from Copilot, a recommendation set from ChatGPT, and Instant Buy availability from Perplexity together create a more robust view.
  9. Keep an eye on data sharing and privacy settings; remove stored payment methods from assistants when not needed and review merchant data access.
These steps add a small amount of friction but substantially reduce the chance of a suboptimal purchase.
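Several of the checklist steps (asking for release dates, requiring a source, filtering by year) amount to a simple triage of whatever the assistant returns. The sketch below shows one way to express that triage. The Recommendation fields, the verify function, and the sample entries are hypothetical; in practice you would fill them in by hand from the assistant's answer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    name: str
    release_year: Optional[int]  # None if the assistant gave no date
    source_url: Optional[str]    # None if no source was cited


def verify(recs, min_year):
    """Split recommendations into three lists of names:
    keep, too_old, and needs_check (missing a date or a source)."""
    keep, too_old, needs_check = [], [], []
    for r in recs:
        if r.release_year is None or r.source_url is None:
            needs_check.append(r.name)   # ask for a date or a source
        elif r.release_year < min_year:
            too_old.append(r.name)       # push back and ask for reasons
        else:
            keep.append(r.name)
    return keep, too_old, needs_check


# Placeholder entries for illustration only.
recs = [
    Recommendation("Model A", 2025, "https://example.com/a"),
    Recommendation("Model B", 2021, "https://example.com/b"),
    Recommendation("Model C", None, None),
]

keep, too_old, needs_check = verify(recs, min_year=2024)
print(keep, too_old, needs_check)
```

Anything that lands in needs_check is a prompt for a follow-up question, not a rejection: the product may be current, but the assistant has not yet shown the evidence.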

What the vendors should fix (product and policy recommendations)

If AI shopping is going to replace human‑authored buyer’s guides, platforms need to address several concrete, fixable gaps:
  • Surface freshness and provenance. Every product card should display the publication date of the core sources used, a confidence score, and the model’s data‑cutoff or last index time. Users must see whether a recommendation is based on recent retailer stock or older editorial reviews.
  • Prioritize release chronology. Enable filters such as “Show latest releases first” and automatic “compare to successor” prompts when a device has a newer generation.
  • Expose merchant coverage. When offering Instant Buy, show the percentage of merchants that support immediate checkout for that SKU and list which big merchants are missing.
  • Label agentic calls. When Duplex or similar agents call a business, the summary should include audio transcripts or verbatim answers and make opt‑out options for businesses transparent and easy.
  • Enforce and disclose partnership economics. Platforms should disclose whether product rankings are influenced by affiliate fees, Instant Buy arrangements, or commercial relationships.
  • Measure and publish misrecommendation rates. Platforms should run internal evaluations comparing assistant recommendations to up‑to‑date human editorial guides and publish aggregate error metrics for accountability. Academic work in conversational product search provides a framework for those evaluations.
These changes are all implementable engineering and policy steps that would materially raise the trustworthiness of AI shopping.
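One way to make the "compare to successor" and misrecommendation-rate ideas concrete is to check each recommended product against a reference map of known successors. The sketch below assumes such a map exists and is kept current, which is the hard part in practice. The function names and the metric definition are illustrative assumptions, not an established standard; the example pair comes from the smartwatch test described earlier.

```python
def flag_successors(recommended, successors):
    """Map each recommended product to a newer model that was left out.

    recommended: list of unique product names returned by the assistant.
    successors:  dict of older model -> newer model (assumed reference data).
    """
    rec_set = set(recommended)
    return {
        p: successors[p]
        for p in recommended
        if p in successors and successors[p] not in rec_set
    }


def stale_rate(recommended, successors):
    """Fraction of recommendations that have an unlisted newer successor."""
    if not recommended:
        return 0.0
    return len(flag_successors(recommended, successors)) / len(recommended)


# Successor pair taken from the test described in this article.
successors = {"vívoactive 5": "vívoactive 6"}
picks = ["vívoactive 5", "Pixel Watch 4"]

print(flag_successors(picks, successors))
print(stale_rate(picks, successors))
```

A product is not counted as stale when its successor appears in the same result set, since the shopper can then compare the two directly. Aggregated over many queries, a rate like this is the kind of error metric a platform could publish.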

The broader outlook: agentic shopping is coming, but the next 12–24 months matter

AI assistants are already shifting the discovery and checkout funnel. Over the next one to two years we should expect:
  • Deeper merchant partnerships and broader Instant Buy coverage. As payment providers and marketplace platforms sign on, embedded checkout will expand beyond early pilot merchants. Perplexity’s PayPal rollout is an early example.
  • Greater emphasis on freshness. Vendors will invest in near‑real‑time retail indexing and manufacturer API integrations to reduce the stale‑data problem. OpenAI and others have flagged ongoing improvements and caution users about price/availability inaccuracies for now.
  • Regulatory scrutiny. Antitrust and consumer‑protection authorities will examine how AI assistants steer purchases, ranking transparency, and whether embedded checkout creates preferential treatment for certain merchants. News coverage already highlights these concerns.
  • More refined agentic controls. Agents that call stores, buy tickets, and execute purchases will add granular user controls for limits, audit logs, and human‑in‑the‑loop confirmations.
If these trends materialize responsibly, AI will deliver a genuinely powerful shopping assistant — fast comparisons, trustworthy price alerts, and smoother checkouts. If implemented sloppily, the same features will lock shoppers into opaque flows, favor incumbents, and occasionally lead to buying the wrong product.

Conclusion

AI shopping assistants have graduated from novelty to utility: they can build real buyer’s guides, track price history, call local stores, and execute purchases in a fraction of the time it takes to hop between websites. But the current generation still trips on recency, provenance, and coverage. The Verge’s smartwatch test shows how those weaknesses matter in everyday decisions: an assistant that pushes a two‑year‑old model can cost buyers time, money, and satisfaction.
For now, AI shopping tools are best used as a fast starting point — an exploratory layer that narrows options and surfaces alerts — followed by a short verification loop: ask for release dates, confirm specs on manufacturer pages, and check that Instant Buy covers your preferred merchant and return policy. With transparency improvements, better merchant integrations, and clearer provenance, these assistants could become the most useful shopping companions we’ve had. Until then, a hybrid approach — AI plus a quick human cross‑check — is the safest path.
Source: The Verge, “My AI shopping assistants are stuck in the past”