In the emerging landscape of artificial intelligence, conversational agents are more than just repositories of information—they are digital companions, virtual assistants, and sometimes, sources of genuine comfort. As voice-enabled AI interactions become the norm, the real test lies in how naturally, helpfully, and empathetically these bots can engage us in dialogue. While many tout knowledge and prompt response times as differentiators, the experiences within the conversational interface—tone, adaptability, emotional intelligence—are now equally, if not more, relevant. Recently, a real-world comparative field test by a ZDNET contributor, Lance Whitney, delved into the voice-enabled conversational abilities of five leading AI apps: ChatGPT, Microsoft Copilot, Google Gemini, Meta AI, and Grok, with a simple but telling scenario—helping an anxious pet cat named Mr. Giggles.

The New Era of Voice-Driven AI: More Than Just Answers

Voice mode is fast becoming standard in the major AI apps, transforming how users interact with technology on mobile. According to industry analysts, this move to voice is fueled by the desire for convenience, hands-free multitasking, and a closer simulation of human-human conversation. But with the proliferation of AI-enabled voice assistants, the quality of conversation—marked by empathy, nuance, and relatability—matters more than ever.
The premise for Whitney's experiment was deceptively simple: report to each AI bot that his cat, Mr. Giggles, was acting anxious after a vet visit, and observe not just what advice or information they provide, but how they provide it. This emphasis on "how"—the tone, personality, active listening, and subtle cues that create connection—offers a fresh lens through which the best AI conversationalists can be identified.

Setting the Stage: Five Major AI Bots Go Head-to-Head

Whitney chose five of the most prominent general-purpose AI apps on the market, available on both iOS and Android platforms:
  • ChatGPT by OpenAI
  • Microsoft Copilot
  • Google Gemini
  • Meta AI
  • Grok
Each platform enabled customized voice selection—a small but significant touch, given that cadence, intonation, and perceived personality influence the user experience. Most allowed the choice of accents and tones, with some (like Meta AI) even offering celebrity voice emulations.
The experiment was conducted as a genuine, ongoing conversation rather than a single prompt and response. Whitney started with the same statement about Mr. Giggles’ anxiety, explored further based on each AI’s responses, and specifically tracked each bot’s ability to listen, empathize, probe for details, and offer actionable advice in a natural rhythm.

ChatGPT: The Virtual Empath and 'Best Listener'

From the outset, ChatGPT distinguished itself not just through the content of its responses, but the gentle, inquisitive cadence with which it probed for more context. Whitney selected a bright, inquisitive female British voice—a choice that seemed to subtly reinforce the bot's warmth and empathetic presence.
ChatGPT immediately asked relevant follow-ups, querying any changes in the pet’s environment and the duration of symptoms. The bot didn’t rush to solutions but showed patience, encouraging the user to share more. When asked for concrete solutions, its recommendations included calming treats and supplements, pheromone diffusers, and collars—all widely endorsed by experts and verifiable through leading veterinary sources such as the American Veterinary Medical Association (AVMA) and well-established pet advice outlets.
Its real strength, however, was in delivering information conversationally: validating Whitney’s worries, echoing supportive phrases, and demonstrating active listening cues ("That's understandable" or "It's great that you're paying attention to Mr. Giggles' well-being"). This sort of emotional scaffolding echoes what researchers in human-computer interaction have found to be most critical for building trust in AI assistants.

Technical Observations

  • Voice Personalization: Multiple choices, natural-sounding synthesis, easy to configure within the app.
  • Conversational Intelligence: Recognizes context, asks follow-up questions, adapts its suggestions.
  • Advice Accuracy: Recommendations align with current veterinary best practices.

Critical Analysis

Strengths:
  • Offers a near-human level of understanding and support.
  • Proactively elicits additional user context for more tailored responses.
  • Tone is supportive and confidence-inspiring.
Potential Risks:
  • Possibility of over-reliance: Users may perceive emotional support as a substitute for direct veterinarian intervention.
  • While suggestions are responsible, there’s always a chance of outdated advice if the AI's data sources aren't continuously refreshed.

Microsoft Copilot: The Reassuring Analyst

Microsoft Copilot’s performance was emblematic of Microsoft's growing emphasis on responsible AI and proactive problem-solving. Setup was similarly straightforward: users could select among multiple voices (Whitney again chose a British-accented one, named 'Wave').
Copilot’s conversational style echoed ChatGPT's, with gentle follow-ups and a focus on user comfort ("We’ll figure this out"). It probed for environmental triggers and closely mirrored ChatGPT's advice on remedies—pheromone diffusers, calming treats, and supplements. Notably, Copilot seemed especially adept at providing reassurance and structuring the dialogue in a logical, step-wise way.

Technical Observations

  • Voice Personalization: Robust, with multiple, high-quality voice selections.
  • Conversational Depth: Good at probing for triggers and guiding toward logical next steps.
  • Advice Quality: Echoes mainstream veterinary recommendations, though with less nuance than ChatGPT.

Critical Analysis

Strengths:
  • Clear and compassionate, instilling trust.
  • Structured prompts make it easy to follow actionable steps.
Potential Risks:
  • Slightly less spontaneous than ChatGPT, with responses sometimes feeling scripted.
  • May miss nuances if the user does not explicitly volunteer detailed context.

Google Gemini: The Information Specialist, Struggling with Tone

Google’s Gemini, despite its technical prowess, revealed a limitation in replicating warmth and empathy in conversation. Whitney configured a voice called 'Capella,' described as soothing and slightly British.
Gemini's responses were accurate and grounded—citing environmental changes, suggesting familiar remedies like pheromone diffusers, calming collars, music, and toys. The advice matched that of ChatGPT and Copilot, and could be independently verified via pet health authorities and animal behaviorists.
The weakness: Gemini’s answers were concise to the point of being terse. There was little emotional validation, and its tendency toward brief, closed-ended replies made the conversation feel transactional. This observation is consistent with several independent reviews of Gemini’s consumer interface: while Google’s underlying models are powerful, the front-end conversation UX still lags behind the competition in emotional resonance.

Technical Observations

  • Voice Personalization: Satisfactory variety, clear audio.
  • Conversational Style: Brief, informative, sometimes requiring more user prompting to draw out detail.
  • Advice Accuracy: Consistently aligns with best practices for pet anxiety.

Critical Analysis

Strengths:
  • Fast, precise answers based on a deep knowledge base.
  • Minimal risk of hallucinated (fabricated) advice.
Potential Risks:
  • Lacks the human touch, with minimal sympathy or conversational engagement.
  • Users seeking comfort or detailed troubleshooting may find interactions unsatisfying or shallow.

Meta AI: The Practical Problem-Solver

Meta AI, available via a stand-alone app with multiple celebrity and neutral voice choices, showed a slightly different character—leaning towards efficient, practical engagement. Whitney selected a Kristen Bell-inspired voice, adding a distinct flourish.
Meta AI excelled in keeping the exchange brisk while providing concise yet detailed responses. Its conversational structure allowed for effective back-and-forth, and like Gemini, its advice was factually on point, suggesting specific calming techniques and environmental modifications.
Meta AI’s key differentiator was in the user interface: spoken words appeared on-screen in real-time, and users could seamlessly refer back to transcripts after the conversation. While slightly less overtly sympathetic than ChatGPT or Copilot, Meta AI's delivery was measured and professional—suited for users prioritizing actionable answers over emotional hand-holding.

Technical Observations

  • Voice Personalization: Unique celebrity mimicry options, adding novelty.
  • Conversational Agility: Balances brevity and detail, with accessible transcript features.
  • Advice Quality: Matches veterinary-advised interventions.

Critical Analysis

Strengths:
  • Efficiency and clarity in conversational pacing.
  • Excellent transcript and UI features.
Potential Risks:
  • Minimal empathy cues might alienate some users.
  • Use of celebrity voices could inadvertently trivialize sensitive conversations.

Grok: The Over-Eager Lecturer

Developed with a distinctly different philosophy, Grok positions itself as adaptable and customizable based on user preference. Unlike others, users can tweak not just voice but also the character of Grok’s responses (concise, formal, Socratic, or custom). Whitney asked Grok to be supportive and understanding.
During the cat-anxiety scenario, Grok delivered a barrage of advice in one extended stretch, resembling a lecture more than a typical conversation. While the suggestions (calming treats, environmental modifications, and behavioral tweaks) were research-backed and highly usable, the lack of natural conversational rhythm diminished the sense of true engagement. Nonetheless, transcripts and voice feedback made it easy to review the exchange afterward.
Grok’s overall style felt more ‘broadcast’ than interactive, which could be overwhelming for users looking for short, actionable insights within a dialogue.

Technical Observations

  • Voice Personalization: More limited than in the other apps.
  • Customization: High—users can define response style.
  • Advice Quality: Sound but occasionally dense, requiring user effort to parse.

Critical Analysis

Strengths:
  • Depth and comprehensiveness of advice.
  • Extensive customization potential.
Potential Risks:
  • May overwhelm with too much information at once.
  • Lacks the back-and-forth cadence critical for conversational satisfaction.

Synthesis and Implications for Next-Gen Conversational AI

Whitney’s experiment highlights a crucial evolution in AI: as these systems become embedded in daily life, users demand not just correct answers but relatable companionship. All five AI bots gave technically sound recommendations for pet anxiety, a scenario easily cross-verified with veterinary authorities, but they varied significantly in their human-centric qualities.
ChatGPT emerged as the most effective at blending intelligence with relatability, setting the current benchmark for conversational empathy. Microsoft Copilot provided a reassuring, analytical touch, while Meta AI balanced speed and practical content. Gemini proved reliable but emotionally muted, and Grok, despite its abundance of advice, struggled with pacing and engagement.

Comparative Table: Conversational AI Performance for the Mr. Giggles Test

| Bot | Empathy & Tone | Advice Accuracy | Personalization | UI Features | Pacing/Engagement | Best For |
|---|---|---|---|---|---|---|
| ChatGPT | ★★★★☆ | ★★★★★ | ★★★★★ | ★★★★☆ | ★★★★★ | Users seeking support & nuanced chat |
| Copilot | ★★★★☆ | ★★★★★ | ★★★★★ | ★★★★☆ | ★★★★☆ | Trust-building, stepwise guidance |
| Gemini | ★★☆☆☆ | ★★★★★ | ★★★★☆ | ★★★☆☆ | ★★☆☆☆ | Direct, fact-driven answers |
| Meta AI | ★★★☆☆ | ★★★★★ | ★★★★★ | ★★★★★ | ★★★☆☆ | Practical tasks, transcript review |
| Grok | ★★☆☆☆ | ★★★★★ | ★★★★☆ | ★★★★☆ | ★★☆☆☆ | Deep dives, custom response styles |
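
For readers who want to compare the star ratings numerically, here is a minimal Python sketch that counts the filled stars in each cell and ranks the bots by an unweighted average. The equal weighting of the five criteria is an assumption made purely for illustration; the source article does not define an overall score, and the names `STAR_ROWS`, `score_table`, and `rank_by_mean` are hypothetical.

```python
from statistics import mean

# Rating criteria, in the same order as the table columns.
CRITERIA = [
    "Empathy & Tone",
    "Advice Accuracy",
    "Personalization",
    "UI Features",
    "Pacing/Engagement",
]

# Star ratings transcribed from the table, one space-separated cell per criterion.
STAR_ROWS = {
    "ChatGPT": "★★★★☆ ★★★★★ ★★★★★ ★★★★☆ ★★★★★",
    "Copilot": "★★★★☆ ★★★★★ ★★★★★ ★★★★☆ ★★★★☆",
    "Gemini":  "★★☆☆☆ ★★★★★ ★★★★☆ ★★★☆☆ ★★☆☆☆",
    "Meta AI": "★★★☆☆ ★★★★★ ★★★★★ ★★★★★ ★★★☆☆",
    "Grok":    "★★☆☆☆ ★★★★★ ★★★★☆ ★★★★☆ ★★☆☆☆",
}


def score_table(rows):
    """Convert each row of star cells into a {criterion: filled-star count} dict."""
    scores = {}
    for bot, row in rows.items():
        cells = row.split()
        if len(cells) != len(CRITERIA):
            raise ValueError(f"{bot}: expected {len(CRITERIA)} cells, got {len(cells)}")
        scores[bot] = {name: cell.count("★") for name, cell in zip(CRITERIA, cells)}
    return scores


def rank_by_mean(scores):
    """Rank bots by the unweighted mean of their ratings (an illustrative assumption)."""
    averages = [(bot, mean(ratings.values())) for bot, ratings in scores.items()]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for bot, avg in rank_by_mean(score_table(STAR_ROWS)):
        print(f"{bot:8s} {avg:.1f}")
```

A reader who cares more about empathy than interface features could swap the plain mean for a weighted one, which may change the ordering of the lower-ranked bots.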

The Bigger Picture: Why Tone and Trust Are Now Table Stakes

While functionality and data integrity remain foundational, the next frontier for AI will be in the subtle art of conversation. This means not just knowledge, but warmth; not just advice, but encouragement. Researchers and developers will need to optimize for active listening, adaptive tone, and even cultural nuances—elements that will become decisive in whether users embrace or abandon AI aids long-term.
The risks of over-personalization or false security, however, cannot be ignored. No chatbot should substitute for medical, legal, or expert human intervention when serious decisions are at stake. Transparent disclosures, regular updating of information sources, and clear escalation prompts (“consider consulting your vet if symptoms persist”) must be non-negotiable features as conversational AIs proliferate.
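
As a rough illustration of what an escalation prompt could look like in practice, the Python sketch below appends a vet-consultation reminder to a reply when the user's message contains certain warning phrases. This is a deliberately simple keyword approach under stated assumptions, not how any of the tested apps work: the trigger patterns and the function names `needs_escalation` and `with_escalation` are hypothetical, and a real product would need a vetted, domain-specific rule set.

```python
import re

# Hypothetical trigger phrases, chosen only for illustration.
ESCALATION_PATTERNS = [
    r"\bnot eating\b",
    r"\bfor (several|many|\d+) days\b",
    r"\bgetting worse\b",
    r"\bvomiting\b",
]

ESCALATION_NOTE = "Consider consulting your vet if symptoms persist."


def needs_escalation(user_message: str) -> bool:
    """Return True if the message matches any of the warning patterns."""
    text = user_message.lower()
    return any(re.search(pattern, text) for pattern in ESCALATION_PATTERNS)


def with_escalation(reply: str, user_message: str) -> str:
    """Append the escalation note to the reply when warranted, without duplicating it."""
    if needs_escalation(user_message) and ESCALATION_NOTE not in reply:
        return f"{reply.rstrip()} {ESCALATION_NOTE}"
    return reply


if __name__ == "__main__":
    print(with_escalation("Try a pheromone diffuser.", "He has been hiding for 3 days"))
```

Keeping the reminder as a separate post-processing step means it can be applied consistently regardless of how the underlying reply was generated.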

Final Reflections: The Growing Role of Conversational AI in Everyday Life

If Whitney’s practical test is any indication, the best conversational AI is neither the most verbose, nor the most fact-stuffed, but the one that listens and guides just as a trusted friend might. In a crowded market, this intangible asset—genuine-sounding empathy—may soon separate the enduring virtual companions from the digital also-rans.
Ultimately, as AI continues to weave itself into the fabric of our daily lives, the conversation is not just about what these bots can do for us, but how they make us feel along the way. The future of AI will be written not only with code—but with empathy, humility, and a remarkably human touch.

Source: ZDNET, "I chatted with five AI bots - these made the best conversations"