Why AI Projects Fail and What are the Solutions

EXECUTIVE SUMMARY

AI has become one of the most transformative forces in modern business, unlocking unprecedented gains in
efficiency, insight, and customer experience. When applied correctly, it accelerates decision-making, reduces
operational costs, and creates entirely new competitive advantages. Organizations that embrace AI are no
longer asking if they should adopt it, but how fast they can scale it.

However, AI is not inherently successful by default—its impact depends entirely on how it is implemented.
Without proper readiness, integration, and alignment to business outcomes, AI investments can fail to
deliver value or even create new risks.

Part I of this paper presents market research analyzing why AI projects fail. Part II then offers
solutions to “de-risk” AI projects.

Part I: Our market research examines those failures to highlight a critical truth. AI is essential, but only when
done right.

The best-documented enterprise AI failures were usually not failures of “AI” in the abstract. They were failures of
organizational readiness, system design, and objective-setting. The recurring pattern was that companies:

– Deployed models or chatbots into workflows that had not been sufficiently prepared for messy data, policy inconsistency, exception handling, human override, or regulatory risk
– Treated AI as a point solution rather than as part of an orchestrated operating system
– Optimized isolated tasks, such as scoring a performance, instead of the business outcome that matters most

Our research covers 14 documented cases, classified across the following three failure modes:

Failure Mode A — No AI-Readiness Evaluation
This category covers IBM Watson for Oncology ($62M loss), Amazon's discriminatory recruiting AI (abandoned after 3 years), Air Canada's chatbot (legal liability, decommissioned), DPD's chatbot (viral brand crisis), and iTutor Group
($365,000 EEOC settlement).

IBM's Watson case became a cautionary tale about rushing into AI without proper assessment — by 2021, internal
documents revealed it was providing "unsafe and incorrect" cancer treatment advice, having been trained on data from a single institution and unable to adapt to different healthcare contexts.

Failure Mode B — Siloed AI, Not Orchestrated
This category includes McDonald's/IBM drive-thru AI (terminated 2024), Taco Bell/Yum! Brands voice AI, NYC's MyCity chatbot (giving illegal advice), and 38 hospital systems analyzed by JMIR.

MIT's findings confirm that most AI tools fail to learn over time and remain poorly integrated into day-to-day
workflows, and that businesses that attempted to build AI tools entirely in-house were twice as likely to fail as those
that relied on external platforms.

Failure Mode C — Insights-Only, No Business-Outcome Generation
This category covers Zillow’s iBuying algorithm ($500M+ loss, 25% workforce cut), UnitedHealth’s nH Predict (90% error rate on Medicare
denials, DOJ inquiry, class-action lawsuit), and enterprise AI broadly per MIT NANDA.

The most successful AI tools in the consumer market are often the least suited for business impact — a pattern that
extends to enterprise deployments, where AI generates reports and recommendations but stops there, lacking the
mechanisms to deliver on business outcomes.

Part II: Following our market research, this paper provides solutions to “de-risk” AI projects, focusing on
three common problems:

– AI Readiness
– Siloed AI Tools
– Insights, but no Outcome

This section explores the limitations of traditional AI deployments, the emergence of agentic AI, and how organizations can transition from fragmented AI tooling to AI-native operational ecosystems that deliver measurable business outcomes.

It shows that, despite significant investments, many organizations continue to struggle to achieve the anticipated
return on investment (ROI) from AI initiatives. While AI tools often generate valuable insights, recommendations,
and automation opportunities, they frequently fail to produce consistent business outcomes.

The primary challenge is not intelligence itself—it is the operational complexity of integrating, managing, and
acting upon AI-generated outputs.

This challenge has given rise to a new strategic paradigm: Un-Tooling AI™.

Introduced by OnviSource, Un-Tooling AI represents a shift from treating AI as a collection of software tools to
treating it as a coordinated workforce of conversational, agentic, and outcome-driven virtual teammates that
collaborate naturally with humans to achieve business objectives while learning and evolving, much like live
employees.

Another compelling aspect of the Un-Tooling AI strategy is that it provides a practical pathway to transform
traditional Business Process Outsourcing (BPO) into Business Function Outsourcing (BFO), in which
providers are accountable not merely for executing tasks but for delivering measurable business outcomes.

PART ONE - Why AI Projects Fail

COMPANIES, ROOT CAUSES, LESSONS

The Macro Picture Is Stark:

Companies invested $47 billion in AI initiatives in the first half of 2025, yet 89% saw minimal or no returns, and 42% scrapped most of their AI initiatives that same year.

What follows is a documented analysis of real-world AI deployment failures, classified across three systemic failure modes:
absent readiness evaluation, siloed tool investment, and insights-only orientation without business-outcome
generation.

89% of AI investments produced minimal or no returns in 2025 (CMSWire)

42% of companies scrapped most AI initiatives in 2025, up from 17% in 2024 (S&P Global)

95% of enterprise GenAI pilots deliver no measurable business impact (MIT NANDA, 2025)

All cases: A — No AI-readiness evaluation, B — Siloed AI tools, and C — Insights without outcomes

FAILURE MODE A

No AI-Readiness Evaluation

FAILURE CASE A — NO AI-READINESS EVALUATION
IBM Watson for Oncology / MD Anderson Cancer Center 2012–2021

IBM and MD Anderson launched Watson for Oncology with a vision to democratize cancer expertise globally. No AI-readiness assessment was performed: the system was trained exclusively on hypothetical patient data from a single institution, making it unable to adapt to different healthcare contexts or real-world patient variation. Internal documents revealed the system was providing “unsafe and incorrect” cancer treatment advice — including recommending blood-thinning drugs for patients already experiencing severe bleeding. The project was eventually sold off quietly, a casualty not of AI capability limits but of absent pre-deployment evaluation and preparation.

Outcome: $62 million loss at MD Anderson alone. Patient safety risks. Project abandoned.

FAILURE CASE A — NO AI-READINESS EVALUATION

Amazon — AI Recruiting Tool 2014–2018

Amazon assembled a team in Edinburgh to automate hiring through machine learning. No data-readiness or bias audit was conducted before training. The model was trained on a decade of historical résumé data drawn from Amazon’s existing tech workforce, which was predominantly male. The AI learned to systematically downgrade female candidates, penalizing résumés that included words like “women’s” (as in “women’s chess club”). Engineers tried to correct the bias but could not guarantee the model would not develop new discriminatory filters. Amazon scrapped the project entirely in 2018.

Outcome: Project abandoned. Legal and reputational exposure. Three years of engineering investment wasted.
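
The kind of bias audit that was missing in this case can be sketched in a few lines. The example below is a minimal illustration, not Amazon's process: it applies the four-fifths (80%) rule of thumb used in U.S. adverse-impact analysis to selection counts. The group names, the counts, and the function names `adverse_impact_ratios` and `flagged_groups` are all assumptions invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical screening results: group -> (candidates advanced, candidates screened).
# These numbers are invented for illustration only.
SCREENING_RESULTS: Dict[str, Tuple[int, int]] = {
    "men": (120, 400),
    "women": (30, 200),
}

FOUR_FIFTHS_THRESHOLD = 0.8  # rule-of-thumb cutoff for possible adverse impact


def adverse_impact_ratios(results: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return each group's selection rate divided by the highest group's rate."""
    # Assumes every group screened at least one candidate.
    rates = {group: advanced / screened for group, (advanced, screened) in results.items()}
    best_rate = max(rates.values())
    if best_rate == 0:
        raise ValueError("No candidates advanced in any group; ratios are undefined.")
    return {group: rate / best_rate for group, rate in rates.items()}


def flagged_groups(
    results: Dict[str, Tuple[int, int]],
    threshold: float = FOUR_FIFTHS_THRESHOLD,
) -> List[str]:
    """List groups whose impact ratio falls below the threshold."""
    ratios = adverse_impact_ratios(results)
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


if __name__ == "__main__":
    for group, ratio in adverse_impact_ratios(SCREENING_RESULTS).items():
        print(f"{group}: impact ratio {ratio:.2f}")
    print("Flagged for review:", flagged_groups(SCREENING_RESULTS))
```

A check like this is only a screening signal. A ratio below 0.8 would prompt closer review of the training data and model features rather than settle the question on its own.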

FAILURE CASE A — NO AI-READINESS EVALUATION

Air Canada — Bereavement Chatbot 2022–2024

Air Canada deployed a generative AI chatbot to handle customer service without validating that its responses would be consistent with the airline’s published policies. The chatbot incorrectly advised a grieving passenger that he could retroactively apply for a bereavement fare discount after travel — directly contradicting the airline’s own website. When the customer sued, Air Canada attempted to argue the chatbot was a “separate legal entity” responsible for its own actions. A Canadian Civil Resolution Tribunal rejected this, finding Air Canada fully liable. The chatbot was quietly removed from the website in April 2024.

Outcome: Legal precedent set. Reputational damage. Chatbot decommissioned. Hallucination rates in similar deployments: 3–27% (NYT).
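
A pre-launch consistency check of the kind this case lacked could look roughly like the following. It is a sketch under stated assumptions: `ask_bot` is a hypothetical stand-in for the deployed chatbot, `stub_bot` is an invented example of a non-compliant bot, and the question and phrases in `CHECKS` are illustrative, not Air Canada's actual policy text or tooling.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PolicyCheck:
    question: str
    must_contain: List[str]      # phrases a policy-consistent answer needs
    must_not_contain: List[str]  # phrases that contradict published policy


# Hypothetical checks, written by the policy owner rather than generated by the model.
CHECKS = [
    PolicyCheck(
        question="Can I claim a bereavement fare after I have travelled?",
        must_contain=["before travel"],
        must_not_contain=["after travel", "retroactive"],
    ),
]


def find_violations(ask_bot: Callable[[str], str], checks: List[PolicyCheck]) -> List[str]:
    """Return the questions whose answers conflict with the published policy."""
    failed = []
    for check in checks:
        answer = ask_bot(check.question).lower()
        missing = any(phrase.lower() not in answer for phrase in check.must_contain)
        forbidden = any(phrase.lower() in answer for phrase in check.must_not_contain)
        if missing or forbidden:
            failed.append(check.question)
    return failed


def stub_bot(question: str) -> str:
    """Hypothetical bot that gives a policy-contradicting answer."""
    return "You can apply for the bereavement rate retroactively after travel."


if __name__ == "__main__":
    violations = find_violations(stub_bot, CHECKS)
    print("Block release:" if violations else "OK to release:", violations)
```

Phrase matching is deliberately crude here. The point of the sketch is the release gate: answers are compared against policy statements owned by the business before the bot reaches customers.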

FAILURE CASE A — NO AI-READINESS EVALUATION

DPD — Customer Service Chatbot 2024

French logistics company DPD deployed an AI chatbot as part of a routine system update, without adequate testing or guardrails. A customer prompted the bot to abandon its scripted behavior, and it began issuing responses containing inappropriate language and direct criticism of the company itself. The incident went viral on social media within 24 hours, garnering over 800,000 views. DPD had failed to evaluate AI behavioral readiness before deploying a public-facing system — no stress testing, edge-case evaluation, or content guardrail assessment had been completed.

Outcome: 800,000+ views of damaging content within 24 hours. Brand reputation crisis. Chatbot emergency shutdown.

FAILURE CASE A — NO AI-READINESS EVALUATION

iTutor Group 2023

EdTech company iTutor Group deployed an AI-based tutor hiring system without evaluating the data and demographic biases embedded in the model. The system was found to automatically reject applicants over a certain age, constituting age discrimination. The EEOC (Equal Employment Opportunity Commission) investigated. iTutor settled for $365,000 in 2023 — a direct consequence of launching an AI hiring tool without an ethics or fairness readiness review.

Outcome: $365,000 federal settlement. EEOC enforcement action. Regulatory scrutiny.

FAILURE MODE B

Siloed AI Tools, Not Orchestrated

FAILURE CASE B — SILOED AI TOOLS, NOT ORCHESTRATED
McDonald’s & IBM — Drive-Thru Voice AI 2021–2024

McDonald’s partnered with IBM to deploy AI-powered voice ordering at over 100 US drive-thru locations. The voice AI was a standalone, siloed solution disconnected from broader order management, inventory, customer context, and conversational state systems. The AI misinterpreted orders in noisy environments, could not maintain context within a single interaction, and repeatedly upsold items already ordered. In one viral incident, a customer’s order was entered with 260 Chicken McNuggets. In another, the AI added unwanted bacon to an ice cream order. In June 2024, McDonald’s formally ended the partnership, citing the system’s inability to function reliably without integration into a unified operational platform.

Outcome: Partnership terminated after 3 years. Viral brand damage. Demonstrated the cost of point-solution AI over orchestrated deployment.

FAILURE CASE B — SILOED AI TOOLS, NOT ORCHESTRATED

Taco Bell / Yum! Brands — Voice AI Drive-Thru 2023–2024

Yum! Brands piloted an AI voice-ordering system at Taco Bell drive-thrus, expanding to over 100 locations across 13 states by mid-2024. Like the McDonald’s case, the system was siloed from inventory context, menu logic, and conversational history. It repeatedly misinterpreted orders in noisy, real-world environments. A customer famously “ordered” 18,000 cups of water when the AI failed to recognize contextual limits. The system also repeatedly upsold items customers had already ordered, revealing the absence of session-level state management — a problem only solvable with system-wide orchestration rather than a bolt-on voice tool.

Outcome: Customer frustration. Viral incidents. Continued investment required to address structural integration failures.
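
The two missing controls named in this case and the McDonald’s case, session-level state and contextual limits, can be illustrated with a small sketch. This is not how either vendor's system works. The class name `OrderSession`, the per-item cap of 10, and the item names are assumptions chosen for illustration.

```python
from collections import Counter
from typing import List, Optional

MAX_QTY_PER_ITEM = 10  # hypothetical sanity cap; a real limit would come from store policy


class OrderSession:
    """Keeps one drive-thru order's state so later turns can consult earlier ones."""

    def __init__(self) -> None:
        self.items: Counter = Counter()

    def add_item(self, name: str, quantity: int) -> bool:
        """Add an item, refusing quantities outside the plausible range."""
        if quantity < 1 or self.items[name] + quantity > MAX_QTY_PER_ITEM:
            # Caller should ask the customer to confirm or hand off to a human.
            return False
        self.items[name] += quantity
        return True

    def next_upsell(self, candidates: List[str]) -> Optional[str]:
        """Suggest the first candidate that is not already in the order."""
        for name in candidates:
            if self.items[name] == 0:
                return name
        return None


if __name__ == "__main__":
    session = OrderSession()
    print(session.add_item("water", 18000))         # False: exceeds the cap
    print(session.add_item("taco", 2))              # True
    print(session.next_upsell(["taco", "drink"]))   # drink
```

Because the session object holds the running order, both the quantity check and the upsell suggestion can consult what has already been said. A bolt-on voice tool without shared state cannot do this.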

FAILURE CASE B — SILOED AI TOOLS, NOT ORCHESTRATED

Microsoft / New York City — MyCity Chatbot 2024

New York City launched a Microsoft-powered AI chatbot called MyCity in October 2023, designed to advise entrepreneurs on business regulations, housing policy, and worker rights. The chatbot was deployed as a siloed information tool, disconnected from authoritative legal and regulatory databases. By March 2024, The Markup reported that MyCity was providing factually incorrect guidance that would lead small business owners to violate the law. The siloed nature of the AI — isolated from live regulatory sources and not integrated into city legal workflows — was the core failure. A standalone AI chatbot providing high-stakes legal guidance without integration to verified data sources is categorically unfit for purpose.

Outcome: Incorrect legal advice to the public. Regulatory risk for NYC businesses following the AI’s guidance.

FAILURE CASE B — SILOED AI TOOLS, NOT ORCHESTRATED

Hospital Systems (38-system JMIR Analysis) 2024

A 2024 JMIR review covering 38 hospital systems found that AI implementations in healthcare consistently created more manual work and clinical alert fatigue rather than relief. The pattern: hospitals bought population health AI dashboards, predictive analytics tools, and documentation AI as disconnected point solutions — each siloed from the other. Nurses were still transcribing notes manually while administrators reviewed AI-generated population dashboards with no way to act on them. Organizations started with predictive analytics before fixing documentation infrastructure. The sequence was reversed — and the lack of orchestrated, workflow-integrated AI made each tool generate noise rather than operational value.

Outcome: Increased clinical workload despite investment. Alert fatigue. Patient care workflow degradation. ROI near zero.

FAILURE MODE C
Insights-Only, No Business-Outcome Generation