The Elizabeth Glaser Pediatric AIDS Foundation (EGPAF) has moved from fragmented program data to an AI-powered, conversational analytics assistant — glAIser — built on a unified lakehouse called Glaser 360 running on Microsoft Azure. The system centralizes disparate health-program records, uses Azure Data Factory and Azure Data Lake for ingestion and storage, transforms and models program data with Azure Databricks, and applies a supervised retrieval-augmented generation (RAG) pipeline that combines Azure AI Search with Azure OpenAI Service to let non-technical staff ask natural-language questions and receive grounded, exportable charts, written summaries, and trend takeaways. The result is an operational shift: program teams, clinicians, and policymakers can query the latest consolidated metrics — from numbers of children tested by country and sex to longitudinal transmission measures — without waiting for specialist analysts, while EGPAF plans to embed similar capabilities into partner government systems to accelerate public health decision-making.
Background / Overview
EGPAF operates in multiple high-HIV-burden countries where program data historically sat in many different systems. That fragmentation delayed analysis and limited immediate use of evidence for program decisions. To fix that, EGPAF built a consolidated data lakehouse named Glaser 360 where pipelines ingest and anonymize data from each local system into Azure Data Lake, ETL jobs run in Azure Databricks, and final reporting and visualization are handled via Power BI. When organizational capacity for advanced data mining remained limited, EGPAF, Microsoft, and partner Squadra Digital developed glAIser, an Azure AI–based assistant that uses RAG patterns to match natural-language prompts to the right datasets and metrics, then generates charts, written summaries, and recommended takeaways for decision makers. This architecture combines cloud scale, a governed data layer, and agent orchestration to democratize analytics across the nonprofit.
How the pipeline is built: data to insight
Ingest and storage: Azure Data Factory + Azure Data Lake
EGPAF’s Glaser 360 uses Azure Data Factory pipelines to extract and copy data from government systems and bespoke applications into a central object store — Azure Data Lake Storage Gen2. Azure Data Factory supports scheduled and incremental copy activities and more than 80 connectors, which makes it a standard choice for moving heterogeneous clinical and program data into cloud storage. This foundation enables the lakehouse to receive new records continuously while keeping operational systems untouched.
Benefits:
- Scalable connectors to many on-prem and cloud sources.
- Incremental, idempotent transfers for reliable ingestion.
- Centralized staging that simplifies downstream transformations.
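The incremental, idempotent transfer pattern above is configured declaratively in Azure Data Factory rather than hand-coded, but the underlying idea — a persisted watermark plus keyed upserts — can be sketched in plain Python. All names here are hypothetical and illustrative, not EGPAF's actual pipeline:

```python
# Sketch of watermark-based incremental ingestion: only rows modified after
# the last stored watermark are copied, and upserting by record id makes a
# re-run of the same job a no-op (idempotent).

def incremental_copy(source_rows, staged, watermark):
    """Copy rows newer than `watermark` into `staged`, keyed by record id."""
    new_watermark = watermark
    for row in source_rows:
        if row["modified"] > watermark:          # ISO timestamps compare lexically
            staged[row["id"]] = row              # upsert by key => idempotent
            new_watermark = max(new_watermark, row["modified"])
    return new_watermark                         # persist for the next run

source = [
    {"id": 1, "modified": "2024-01-01T00:00:00", "value": "a"},
    {"id": 2, "modified": "2024-02-01T00:00:00", "value": "b"},
]
staged = {}
wm = incremental_copy(source, staged, "1970-01-01T00:00:00")
wm = incremental_copy(source, staged, wm)  # second run copies nothing new
```

Persisting the watermark between runs is what lets the pipeline pick up only new records while leaving operational source systems untouched.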
Transform and model: Azure Databricks lakehouse
Once raw data lands in the lake, Azure Databricks provides ETL/ELT, data-quality rules, table management (Delta Lake), and transformation pipelines that prepare analytics-ready datasets and measures (for example, child HIV testing counts, disaggregated by sex and country). Databricks’ lakehouse model unifies batch and streaming workloads, supports collaborative notebooks for data engineering, and integrates with Power BI and other reporting tools — a typical production architecture for enterprise analytics and AI.
Key platform capabilities used in this design:
- Delta Lake (ACID transactions, schema enforcement).
- Auto Loader / Lakeflow Declarative Pipelines for repeatable ingest.
- Unity Catalog for governance and access control when required.
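The actual transformations run as Spark jobs over Delta tables in Databricks; this stdlib-only Python sketch just shows the shape of one analytics-ready measure the article mentions — testing counts disaggregated by country and sex. The records are fabricated for illustration:

```python
from collections import Counter

# Hypothetical anonymized program records; the real data lives in Delta tables.
records = [
    {"country": "Kenya",    "sex": "F", "tested": True},
    {"country": "Kenya",    "sex": "M", "tested": True},
    {"country": "Tanzania", "sex": "F", "tested": True},
    {"country": "Tanzania", "sex": "F", "tested": False},
]

# Disaggregated measure: count of children tested, by (country, sex).
counts = Counter((r["country"], r["sex"]) for r in records if r["tested"])
```

In the lakehouse this aggregation would be a grouped query over a governed Delta table, with schema enforcement guaranteeing that `country` and `sex` columns are present and typed.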
Retrieval and generation: Azure AI Search + Azure OpenAI Service (supervised RAG)
At the application level, glAIser uses a supervised retrieval-augmented generation (RAG) approach that first maps user prompts to a structured intent and then retrieves the precise slices of consolidated data to ground responses. Azure AI Search functions as the vectorized retrieval layer and knowledge index; it returns candidate records, time series, and metadata. The selected context is passed to an LLM hosted through Azure OpenAI Service, which synthesizes a narrative answer, generates charts and tables, and produces key takeaways. Azure AI Search and Microsoft Learn documentation describe this RAG pattern and best practices for hybrid keyword/vector search, relevance tuning, chunking, and supplying only authoritative retrievals to the LLM so generated outputs remain grounded.
Why supervised RAG matters here:
- Limits hallucination by constraining the model to vetted program data.
- Enables interactive, reproducible outputs with citations to the underlying metrics.
- Supports multi-agent orchestration: intent classification chooses the correct data-querying agent depending on the prompt.
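The supervised flow above — classify intent, route to the matching data agent, retrieve only from a curated index, then ground the LLM prompt — can be sketched as follows. Everything here is a hypothetical stand-in: the real system uses Azure AI Search for retrieval and Azure OpenAI Service for generation:

```python
# Stand-in for the curated Azure AI Search index: retrieval is limited to
# vetted program data, which is what constrains hallucination.
CURATED_INDEX = {
    "testing_by_country": "Children tested, 2023: Kenya 12,400; Tanzania 9,800",
    "transmission_trend": "Vertical transmission rate fell from 9% to 6%",
}

def classify_intent(prompt: str) -> str:
    """Toy intent classifier; production would use an LLM or trained model."""
    return "testing_by_country" if "tested" in prompt else "transmission_trend"

def answer(prompt: str) -> dict:
    intent = classify_intent(prompt)           # supervised agent selection
    context = CURATED_INDEX[intent]            # retrieval from curated data only
    grounded_prompt = (
        f"Answer using ONLY this context:\n{context}\n\nQuestion: {prompt}"
    )
    # An LLM call on grounded_prompt would go here; we return the parts instead.
    return {"intent": intent, "context": context, "prompt": grounded_prompt}

result = answer("How many children were tested by country?")
```

Because the context string is returned alongside the answer, the assistant can cite the exact dataset slice each claim came from, which is what makes outputs reproducible.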
Presentation and last-mile: Power BI and exportable assets
Outputs from the AI agent aren’t limited to prose — glAIser returns interactive and exportable charts and spreadsheets for immediate use in communications, fundraising, and policy briefs. For many NGOs, surfacing export-ready visuals in the same flow as an analytic summary shortens the timeline from insight to action, especially when non-technical teams must quickly produce program updates or evidence for donors.
What the solution delivers in practice
- On-demand interpretive analytics: Staff can ask natural-language questions and receive not just numbers but interpretation — trend descriptions, anomalies, and actionable takeaways.
- Democratized data use: Clinicians, program officers, and policymakers can independently query program results without going through a central analytics backlog.
- Faster decision cycles: Near-real-time analysis of consolidated program measurements reduces the lag between data collection and programmatic adaptation.
- Extensibility: Because Glaser 360 is a lakehouse, EGPAF can onboard new countries, datasets, and indicators without rearchitecting ingestion or compute layers.
Why the architecture is sensible (strengths)
1) Clear separation of concerns: storage, compute, retrieval, and generation
Splitting responsibilities among Azure Data Lake (durable object store), Databricks (ETL/feature engineering), Azure AI Search (retrieval index), and Azure OpenAI (LLM) is an established enterprise pattern for RAG systems. It offers scalability, observable SLAs at each layer, and the ability to tune or replace components independently. Microsoft documentation and Databricks reference architectures explicitly recommend these building blocks for lakehouse + AI solutions.
2) Data governance and compliance fit an NGO operating with public-health partners
Using a central lakehouse with catalog and access controls (e.g., Unity Catalog / Purview integrations on Azure) makes it feasible to apply role-based access, lineage, and retention policies — crucial when data is sensitive or collaboration involves government partners. Databricks and Azure both provide enterprise-grade governance toolsets that align with these needs.
3) Practical RAG design reduces hallucination risk
By grounding generative outputs on indexed, curated program data returned by Azure AI Search, and by applying supervised agent selection, glAIser reduces the model’s need to guess or invent facts. Microsoft’s RAG guidance describes hybrid search and prompt engineering techniques that emphasize returning only authoritative snippets for the model to synthesize, which is the right approach for healthcare program metrics.
4) Rapid enablement through a partner-led hackathon model
EGPAF’s use of a Microsoft-run hackathon and a partner like Squadra Digital to combine health expertise, security practice, and AI engineering is an efficient route for nonprofits that lack large in-house data science teams. Short, focused sprints plus prior groundwork can produce usable prototypes and concrete adoption plans.
Risks, limits, and governance considerations
While the architecture offers tangible gains, several important risks deserve attention and mitigation:
A. Hallucinations and overconfidence in AI outputs
Even with RAG, generative models can overreach. Guardrails and grounding reduce hallucination but do not eliminate it; independent vetting and human-in-the-loop checks remain essential, especially when analytics inform clinical or policy decisions. Platforms are adding "correction" and grounding detection features, but these are supplementary controls — not substitutes for domain validation. Organizations must design workflows that require verification for high-stakes outputs.
B. Data privacy, anonymization, and legal compliance
Centralizing program data, even anonymized, triggers legal and ethical obligations. When EGPAF expands access to governmental systems, each jurisdiction’s data protection laws, public health data rules, and expectations for residency and consent need to be honored. Operationalizing anonymization and minimizing re-identification risk must be built into ingest, indexing, and retrieval (e.g., PII detection, aggressive aggregation thresholds). These are non-technical policy decisions as much as engineering tasks.
C. Single-vendor dependency and lock-in dynamics
glAIser’s implementation depends substantially on Azure-managed services (Data Factory, Data Lake, Databricks on Azure, Azure AI Search, Azure OpenAI Service). That simplifies integration and governance but creates operational coupling: changes to pricing, service availability, or model-hosting policies at the cloud provider will directly affect EGPAF’s analytics. Project teams should budget for portability planning and contracts that include SLAs and export guarantees where possible. Microsoft and partner platforms offer enterprise agreements and migration patterns, but organizations must weigh vendor risk against time-to-value.
D. Operational cost and sustainability for a nonprofit
Cloud compute, Databricks DBUs, vector index storage, and inference calls to LLMs all add recurring costs. NGOs with modest budgets must plan for predictable consumption (e.g., caching, query limits, query pre-filtering) and consider lower-cost inference modes for routine tasks. A cost governance plan — tagging, budgets, and optimizations — is critical to maintain sustainability.
E. Model safety and moderation for public-facing deployments
If EGPAF intends to embed glAIser into government health systems, the solution must include content safety, moderation, and explicit fail-open/closed behaviors. Microsoft and other cloud vendors provide content safety tooling and model safety rankings, but these controls require careful configuration for multilingual and cultural contexts common across EGPAF’s countries of operation. Public-sector deployments also raise the bar for auditability and explainability of outputs.
Practical recommendations for others building similar health-data assistants
- Design for progressive disclosure: start by exposing program-level aggregates (counts, rates, trends) to the AI assistant, and add more granular access only after privacy and governance controls are proven.
- Instrument with metrics and human review flows: log every AI-derived insight and pair it with a required human sign-off for high-impact decisions.
- Budget for ongoing costs and optimizations: adopt burstable compute for heavy ETL jobs and low-cost inference tiers for routine queries.
- Run safety and grounding tests in production-like settings: measure hallucination rates, false positives, and content-safety false alarms across languages and datasets.
- Contract for portability and data egress scenarios: ensure you can export both raw and transformed data in open formats if you later choose a multi-cloud or on-prem strategy.
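One concrete control behind the progressive-disclosure and aggregation-threshold advice above is small-cell suppression: aggregates below a minimum cell size are withheld so the assistant can never surface near-identifying counts. This is a minimal sketch with an illustrative threshold, not EGPAF's actual policy:

```python
MIN_CELL_SIZE = 5  # illustrative; real thresholds are a policy decision

def suppress_small_cells(aggregates: dict) -> dict:
    """Replace counts below the threshold with None so they are never exposed."""
    return {
        key: (count if count >= MIN_CELL_SIZE else None)
        for key, count in aggregates.items()
    }

safe = suppress_small_cells({("Kenya", "F"): 124, ("Eswatini", "M"): 3})
```

Applying suppression in the retrieval layer, before any context reaches the LLM, keeps the guarantee independent of prompt wording.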
What to watch next: adoption and scaling signals
- Embedding glAIser-like agents into government electronic health systems will require multi-party governance agreements and feature parity across local EHRs and registries. If EGPAF follows through, it could accelerate data-driven public health decisions by shortening the feedback loop between field data and policy adjustments.
- The platform model used by EGPAF — lakehouse + RAG + LLM — is becoming mainstream for organizations that need conversational access to operational data. Vendors continue to invest in content safety, groundedness detection, and agent orchestration to reduce hallucinations and improve trust. Keep an eye on new enterprise tools that automate grounding validation and provide higher transparency into how an LLM used retrieved contexts to form conclusions.
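A rudimentary groundedness check of the kind these tools automate can be sketched as follows: flag any number in a generated answer that does not appear in the retrieved context. Production tooling (such as Azure's groundedness detection) is far more sophisticated; this only illustrates the basic idea of validating outputs against their sources, with fabricated example strings:

```python
import re

def ungrounded_numbers(answer: str, context: str) -> set:
    """Return numbers in the answer that never appear in the context."""
    def nums(text):
        # Digits with optional thousands separators and decimal part.
        return set(re.findall(r"\d[\d,]*(?:\.\d+)?", text))
    return nums(answer) - nums(context)

context = "Children tested in 2023: Kenya 12,400; Tanzania 9,800."
ok = ungrounded_numbers("Kenya tested 12,400 children in 2023.", context)
bad = ungrounded_numbers("Kenya tested 15,000 children.", context)  # invented figure
```

An empty result means every figure in the answer traces back to retrieved data; a non-empty one is a signal to block the response or route it for human review.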
Final assessment: transformative potential with conditional risk
glAIser and Glaser 360 illustrate a pragmatic, modern architecture for making health-program data useful at the point of decision. By combining a governed data lakehouse, managed ETL and compute on Azure Databricks, and a supervised RAG conversational layer built on Azure AI Search and Azure OpenAI Service, EGPAF has accelerated access to actionable insights across program, communications, and policy teams. The most striking benefit is democratization: non-technical users can obtain interpreted analytics quickly, which can meaningfully shorten the time between data collection and program improvement.
However, the benefits come with measurable obligations: robust privacy measures, explicit human oversight for clinical or policy inferences, cost governance, and careful safety engineering are mandatory. Technical guardrails, such as restricting retrieval sources to curated indexes, applying content-safety filters, and surfacing citations to source datasets, will help, but they do not absolve organizations from validating high-stakes outcomes through established review channels. Microsoft’s RAG guidance, Databricks lakehouse patterns, and emerging Azure safety tooling provide a sensible technical foundation — but operational and legal diligence will determine whether the model becomes a reliable, long-term asset for public health.
EGPAF’s early work demonstrates a compelling path: cloud-scale analytics and conversational AI can elevate program responsiveness without requiring every staff member to be a data scientist. With deliberate governance, clear human review processes, and cost-aware engineering, similar organizations can replicate that value and improve the speed and quality of decisions that affect vulnerable populations.
Source: Microsoft Elizabeth Glaser Pediatric AIDS Foundation speeds prevention and treatment with Azure AI solution | Microsoft Customer Stories