Artificial intelligence is moving from proof-of-concept to point-of-care across nephrology: risk prediction, image and biopsy interpretation, dialysis optimization, transplant decision support, and now generative tools at the bedside. Yet, as of this writing, no major society (KDIGO, ASN, ERA) has issued a dedicated AI practice guideline. This guide addresses that gap with a physiology-grounded, evidence-anchored perspective for the practicing nephrologist: an editorial that organizes the current evidence and offers pragmatic guardrails until formal guidance exists.

Purpose & Scope

What This Guide Is — and What It Is Not

This is a clinician-facing perspective: editorial and integrative, not a disease-specific patient handout, not a systematic review, and not a technical AI primer. Consistent with an integrative-functional approach to nephrology, AI is framed here as an amplifier of physiologic reasoning and the therapeutic toolkit — not a replacement for them. Every use case is tied back to mechanism and to cross-organ (kidney–cardiovascular–metabolic) integration.

On completing this guide, the reader will be able to: define and distinguish machine learning, deep learning, and large language models, and map each to concrete nephrology tasks; interpret common performance metrics (AUROC, calibration, sensitivity at a chosen alert threshold, lead time) and their clinical trade-offs; critically appraise an AI study using a structured checklist (data provenance, external validation, calibration, fairness, prospective evidence); identify the highest-yield, evidence-supported use cases by domain (AKI, CKD progression, pathology, dialysis, transplant); use generative AI safely for documentation, literature synthesis, and patient education while avoiding hallucination and confidentiality pitfalls; and apply governance, bias-mitigation, and regulatory principles — including the eGFR race-coefficient debate and software-as-a-medical-device pathways.

💡

The editorial throughline

AI should sharpen pathophysiologic reasoning and therapeutic precision, not substitute for them. A prediction is only useful when it routes to a physiologically rational, guideline-aligned action.

Patient-mode educational poster showing where AI shows up across a kidney-care journey. Horizontal patient-journey band runs left to right from diagnosis to long-term care, with five small cards spaced along it: a risk estimate that helps the doctor decide visit frequency and treatments; an early-warning alert in the hospital that flags possible kidney decline so the team can act before it happens; a biopsy assistant that helps the pathologist measure the kidney biopsy more reliably; dialysis decision support that helps the team set safer fluid and dose targets; and a clinic notes and education helper that drafts paperwork and patient handouts which the doctor then signs. Bottom green safety line reads: In every box, the AI is helping. Your nephrologist still decides.
Patient-facing companion figure for sharing in the clinic or on social media — where AI typically appears across kidney care, with the line that says the clinician still owns the decision.

The guide is organized into eight modules. Each module carries the same internal structure: objective → mechanistic framing → key content → clinical decision point → evidence anchors → caveats. The build map below is the orientation; superscript numbers are reference IDs in the bibliography at the end of the guide.

Mod	Module	Core question it answers	Evidence anchors
1	Foundations & taxonomy	What is AI / ML / DL / LLM, and why nephrology now?	⁴,¹,²
2	Decision support & CKD risk	Who will progress, and how should that change management?	¹,¹⁴
3	Acute kidney injury prediction	Can we see AKI coming early enough to act?	⁵,⁷,⁸,⁶,⁹,¹⁰
4	Digital pathology & imaging	Can algorithms read the biopsy / image and add signal?	¹¹,¹²,¹³
5	Dialysis optimization	Can AI improve volume, anemia, and PD decisions?	¹⁵,¹⁶,¹⁷
6	Transplantation	Can AI improve matching, rejection, and dosing?	¹⁸,¹⁹
7	Generative AI & LLMs	How do I use ChatGPT-class tools safely in practice?	²⁰,²¹,²²
8	Governance, bias & regulation	How do I deploy this responsibly and lawfully?	²³,²⁴,²⁵,²⁶

Module 1 · Foundations

Foundations & Taxonomy — a Mental Model Without the Engineering Jargon

Objective. Give the clinician a working mental model of AI methods without engineering jargon — enough to read a paper, interrogate a vendor, and ask the right second question.

Concentric nested rectangles showing artificial intelligence containing machine learning containing deep learning containing large language models. Each layer has a matching renal-task card on the right (KDIGO-aligned decision support, KFRE-style CKD risk prediction, histopathologic segmentation of glomeruli/tubules/interstitium, and drafting a discharge summary or patient handout with verification). — A nested mental model. AI ⊃ ML ⊃ DL ⊃ LLM — each layer maps to a concrete nephrology task. The renal-example cards on the right are what each layer actually does at the bedside.

Three nested ideas anchor the field. Machine learning (ML) is the broad family — patterns learned from data rather than hand-coded rules. Deep learning (DL) is the subset that uses multilayer neural networks, well suited to images and sequences (biopsy slides, dialysis waveforms, EHR time series). Large language models (LLMs) are deep networks trained on text via next-token prediction; their power and their pitfalls (hallucination, confidence without grounding) both flow from that single objective.

Three learning regimes are the second axis. Supervised learning trains on labeled examples — the canonical case is AKI label prediction from EHR features. Unsupervised learning finds structure without labels — for example, phenotype clustering of CKD that surfaces hidden subgroups within an apparently homogeneous KDIGO category. Reinforcement learning learns a policy from outcomes — dialysis-dosing or anemia-management agents are the archetype, and also the place where prospective validation lags the furthest behind the headlines.

Inputs that matter in nephrology cluster into four families: structured EHR labs and vitals, waveform and dialysis-machine telemetry, whole-slide histology, and free clinical text. Each maps to a different model family — gradient-boosted trees and tabular networks for the first, recurrent and transformer networks for the second, convolutional and vision-transformer networks for the third, language models for the fourth. The choice is not aesthetic; it shapes what the model can and cannot see.

Metrics literacy — why a high AUROC can still be useless at the bedside

Discrimination (AUROC) is the model's ability to rank a positive case above a negative one. Calibration is whether a predicted 30% risk means that about 30 of 100 such patients actually have the event; it is the property that decides whether a threshold means what you think it does. Clinical utility is net benefit at the threshold you would actually act on, and depends on prevalence, alert burden, and downstream action. Lead time and alert specificity decide whether the prediction arrives early enough, and cleanly enough, to change care rather than merely annotate it.
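A minimal sketch of why discrimination and calibration are separate properties. The ten-patient cohort is synthetic and invented for illustration (it is not data from any cited study), and the function names are hypothetical. Multiplying every predicted risk by a constant leaves the ranking, and therefore the AUROC, unchanged, while the mean predicted risk drifts well above the observed event rate.

```python
from itertools import product


def auroc(y_true, y_score):
    """Chance that a random event case is scored above a random non-event case.

    Ties count as half a win (the rank-based definition of AUROC).
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(y_true, y_score):
    """Return (mean predicted risk, observed event rate) for the cohort."""
    n = len(y_true)
    return sum(y_score) / n, sum(y_true) / n


# Synthetic cohort (assumption): 10 patients, 2 events, so a 20% observed rate.
y = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
well_calibrated = [0.05, 0.05, 0.10, 0.10, 0.15, 0.15, 0.20, 0.20, 0.45, 0.55]
# Same ranking, every risk inflated by a constant factor.
inflated = [1.8 * p for p in well_calibrated]

for name, scores in [("well calibrated", well_calibrated), ("inflated", inflated)]:
    mean_pred, observed = calibration_in_the_large(y, scores)
    print(f"{name}: AUROC={auroc(y, scores):.3f}, "
          f"mean predicted={mean_pred:.2f}, observed={observed:.2f}")
```

Both score sets share the same AUROC, yet only the first has a mean predicted risk that matches the observed rate. A threshold of "30%" means something different under each.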

⚠️

Clinical decision point

Before trusting any model, ask three questions in order: (1) what was it trained to predict, (2) in whom was it trained, and (3) does its output arrive early enough and specifically enough to change an action? A discrimination metric in isolation answers none of these.
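Clinical utility as net benefit can be written down in a few lines. The sketch below uses the standard decision-curve definition, net benefit = TP/N − (FP/N) × pt/(1 − pt), where pt is the risk threshold at which you would act. The counts are hypothetical and chosen only to show the arithmetic; the function name is illustrative.

```python
def net_benefit(true_pos, false_pos, n_patients, threshold):
    """Decision-curve net benefit at a given risk threshold (0 < threshold < 1)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    harm_weight = threshold / (1.0 - threshold)  # odds at the action threshold
    return true_pos / n_patients - (false_pos / n_patients) * harm_weight


# Hypothetical cohort of 1000 patients with 100 events, acting at a 20% threshold.
model = net_benefit(true_pos=80, false_pos=200, n_patients=1000, threshold=0.20)
# "Treat everyone" catches all 100 events but also flags all 900 non-events.
treat_all = net_benefit(true_pos=100, false_pos=900, n_patients=1000, threshold=0.20)

print(f"model: {model:.3f}, treat-all: {treat_all:.3f}")
```

A model earns its place only if its net benefit at your threshold exceeds both "treat everyone" and "treat no one" (net benefit zero).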

Evidence anchors: Loftus 2022 ¹ and Hueso 2024 ² for the field map; Cheungpasitporn 2024 ³ for critical-care nephrology framing; Filler 2022 ⁴ for the call-to-action framing in pediatric nephrology.

Module 2 · Decision Support

CKD Risk Stratification — from Population Risk to Action-Linked Prediction

Objective. Move from population risk to individualized, action-linked prediction of CKD progression.

Mechanistic framing. CKD progression integrates three axes: glomerular hemodynamics (intraglomerular pressure, single-nephron hyperfiltration), proteinuria-driven tubulointerstitial injury (the strongest modifiable driver of decline at any baseline eGFR), and cardiometabolic load (diabetes, hypertension, obesity). Multivariable ML extends the logic of the Kidney Failure Risk Equation (KFRE) by capturing nonlinear interactions among these axes — for example, how an elevated UACR amplifies risk far more steeply in the presence of poorly-controlled diabetes than in its absence.

From KFRE to ML — what the added complexity buys

KFRE remains the right tool for most outpatient stratification: it is parsimonious, externally validated, and easy to operationalize. Where ML adds discrimination is in cardiometabolic populations whose risk is dominated by nonlinear interactions the four- and eight-variable KFRE cannot fully capture. The worked exemplar is Klinrisk, validated within the CANVAS program and CREDENCE trial — externally validated ML for CKD progression in a cardiometabolic population, with discrimination meaningfully above KFRE in patients with type 2 diabetes ¹⁴. The headline is not that ML always wins; it is that ML wins where the underlying biology is most nonlinear, and the modeling choice should be driven by the population.

The decision a risk model should change is not diagnosis (the patient already has CKD) but therapeutic escalation timing and access planning: SGLT2-inhibitor and non-steroidal MRA intensification, nephrology referral timing, and vascular access planning all benefit from a numerically anchored future risk. KDIGO recommends KFRE-anchored referral thresholds; an ML risk score that beats KFRE in your population is a legitimate substitute for the same workflow, not a license to defer therapy until the score crosses an arbitrary line.

✅

Clinical decision point

Use validated risk output to escalate guideline-directed therapy and to time referral and access — not as a standalone prognosis delivered to the patient. A 30% 2-year risk routed to "do nothing differently" is a wasted prediction.

Evidence anchors: Tangri 2024 ¹⁴; framed within Loftus 2022 ¹.

Module 3 · AKI Prediction

Acute Kidney Injury Prediction — the Modifiable Window

Objective. Show where continuous, EHR-driven AKI prediction is mature enough to influence care — and where it is not.

Mechanistic framing. AKI is a final common pathway of hemodynamic, septic, nephrotoxic, and obstructive insults. Early prediction targets the modifiable window — perfusion, nephrotoxin exposure, and fluid strategy — before tubular injury becomes established and creatinine has risen. This window is what makes lead-time prediction valuable; it is also what makes alert burden so dangerous, because the model that fires 48 hours early is the same model that, miscalibrated, fires on half the ICU.

The landmark — and its honest caveats

The 2019 DeepMind continuous AKI model predicted 55.8% of inpatient AKI and 90.2% of AKI requiring dialysis up to 48 hours ahead — a step-change in lead time over creatinine-trigger systems ⁵. The honest caveats were published alongside the headline numbers and matter at least as much: two false alerts per true alert at the operating threshold, and a training dataset (US Department of Veterans Affairs) that was ~94% male. The first caveat is the alert-fatigue problem in numerical form; the second is the transportability problem made concrete.

Setting-specific models have followed: cardiac surgery AKI ⁶, sepsis-associated AKI with interpretable approaches that surface the contributing features rather than emitting a single opaque score ⁷,⁸, and pediatric critical-care AKI ⁹, where the pre-AI baseline is least mature. Pickkers 2021 ¹⁰ is the indispensable physiology and management backdrop: the "why" without which an AKI alert is just a notification.

From alert to action — the bundle an AKI prediction should trigger

Reassess perfusion and volume. MAP, capillary refill, lactate, focused POCUS (IVC, VEXUS, lung B-lines, focused cardiac). The alert reframes "stable patient" into "patient at risk in the next 48 h" — the bedside exam catches up to the algorithm.

Review every nephrotoxin and every iodinated/gadolinium contrast plan. Aminoglycosides, NSAIDs, ACEi/ARB held judiciously around the insult, vancomycin troughs, planned contrast studies — defer, dose-adjust, or substitute where possible.

Recompute every renally-cleared dose. Antimicrobials, anticoagulants (LMWH, DOACs), gabapentinoids. Use an estimated trajectory — not yesterday's creatinine — when the alert says today's creatinine is the wrong denominator.

Set a fluid plan with a stop rule. Resuscitation when indicated, but with a pre-declared deresuscitation trigger (e.g., MAP sustained above 65 mmHg, lactate clearing, urine output recovering) so the same alert does not become a license to chase the patient into iatrogenic fluid overload.

Document a 4–6 h reassessment loop. The alert decays; the patient does not. Without a follow-up cadence in the order set, the algorithm has annotated the chart without changing care.
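The dose-recomputation step in the bundle above can be illustrated with the Cockcroft-Gault creatinine-clearance estimate. This is a sketch, not a dosing tool: the formula assumes steady-state creatinine, so while creatinine is still rising it overestimates true clearance, which is why the bundle asks for a projected trajectory rather than yesterday's value. The patient and the projected creatinine are hypothetical.

```python
def cockcroft_gault(age_years, weight_kg, scr_mg_dl, female):
    """Estimated creatinine clearance in mL/min (Cockcroft-Gault)."""
    crcl = (140 - age_years) * weight_kg / (72 * scr_mg_dl)
    return crcl * 0.85 if female else crcl


# Hypothetical patient: 70-year-old man, 80 kg.
yesterday = cockcroft_gault(70, 80, scr_mg_dl=1.0, female=False)
# Projected creatinine if the alert is right (assumed value for illustration).
projected = cockcroft_gault(70, 80, scr_mg_dl=1.8, female=False)

print(f"CrCl on yesterday's creatinine: {yesterday:.0f} mL/min")
print(f"CrCl on projected creatinine:  {projected:.0f} mL/min")
```

The gap between the two estimates is the margin by which a dose based on yesterday's creatinine could be wrong once the injury declares itself.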

⚠️

Audit alert burden locally before adoption

The published positive predictive value at any threshold is a function of the original population's AKI prevalence. In a lower-acuity unit with lower prevalence, the same model at the same threshold will have a lower positive predictive value, so the false-alert ratio will rise. Pilot on a quality metric (UF rate, nephrotoxin holds, dose-adjustment uptake) before letting it touch order sets, and define a kill-switch criterion in writing.
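The prevalence dependence is plain Bayes arithmetic. The sketch below holds sensitivity and specificity fixed (a simplifying assumption; in practice both can also shift with case mix) and uses illustrative operating characteristics, not figures from any cited model. Function names are hypothetical.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_alerts = sensitivity * prevalence
    false_alerts = (1.0 - specificity) * (1.0 - prevalence)
    return true_alerts / (true_alerts + false_alerts)


def false_alerts_per_true_alert(ppv_value):
    """Alert burden expressed the way a ward experiences it."""
    return (1.0 - ppv_value) / ppv_value


# Assumed operating point: sensitivity 0.50, specificity 0.75.
for prevalence in (0.20, 0.05):
    value = ppv(0.50, 0.75, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {value:.1%}, "
          f"{false_alerts_per_true_alert(value):.1f} false alerts per true alert")
```

With these assumed characteristics, a unit with 20% AKI prevalence sees two false alerts per true alert, while a unit with 5% prevalence sees 9.5 from the very same model and threshold.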

Five-step clinical-algorithm flowchart for an AI AKI alert turning into care: reassess perfusion and volume; review every nephrotoxin and contrast plan; recompute renally-cleared doses; set a fluid plan with a stop rule; document a 4-6 hour reassessment loop. A side panel lists honest caveats including approximately two false alerts per true alert at threshold, training-cohort sex skew, and the local PPV audit requirement. — From alert to action — the five-step bundle every AI AKI alert should route to. Without the bundle, the algorithm has annotated the chart without changing care.

Evidence anchors: Tomašev 2019 ⁵; Tseng 2020 ⁶; Yue 2022 ⁷; Fan 2023 ⁸; Dong 2021 ⁹; Pickkers 2021 ¹⁰.

Module 4 · Pathology & Imaging

Digital Pathology & Imaging — Reproducibility, Not Autonomy

Objective. Assess where computational pathology and imaging add reproducible signal to nephrology diagnosis — and where the right framing is augmentation rather than replacement.

Deep-learning histopathologic assessment of kidney tissue is the most mature image domain. Hermsen 2019 ¹¹ demonstrated automated segmentation of glomeruli, tubules, and interstitium on PAS-stained biopsies with performance approaching trained pathologists — turning what had been a qualitative impression ("mild to moderate IFTA") into a quantitative, reproducible measurement. The clinical value is less in diagnosing the unknown and more in reducing inter-reader variability for grading metrics that drive prognosis.

A perennial reproducibility barrier in digital pathology is stain variability: a model trained on one laboratory's slides degrades on another's. Bouteldja 2022 ¹² addressed this directly with stain-independent deep learning that generalizes across laboratories — an important step toward tools that survive the move from the research site to the community lab. The lesson generalizes: any pathology AI should be re-validated on the local lab's stains, scanners, and case mix before being trusted at the report level.

The non-invasive frontier is the oculo-renal axis: Meng 2025 ¹³ showed deep learning on retinal images can infer diabetic kidney disease at the population level. The mechanistic plausibility is real — the retina and the glomerulus share microvascular biology, and diabetic retinopathy and DKD co-occur far more often than chance — but the bedside application is screening, not biopsy substitution. The right framing is a non-invasive microvascular window, not a non-invasive biopsy.

🔬

Clinical decision point

Treat algorithmic pathology as a quantification and consistency aid for the pathologist, not an autonomous diagnostician; require local validation on the lab's own stains and scanners before relying on the output at the report level.

Biomedical mechanism schematic of the oculo-renal microvascular axis. Left panel shows a simplified eye and kidney organ pair. Central dashed inset shows retinal microvasculature (with microaneurysms, pericyte loss, basement-membrane thickening) side-by-side with glomerular microvasculature (with glomerular basement membrane thickening, mesangial expansion, podocyte loss); a dashed bidirectional arrow links them with the label shared hyperglycaemic microvascular injury. Bottom flow shows injury (endothelial dysfunction) routing through a deep-learning model trained on retinal images to a benefit box: non-invasive diabetic kidney disease screening signal, earlier risk stratification. — Why deep learning on retinal images can infer diabetic kidney disease. The retina and the glomerulus share the same microvascular biology, so a model trained on fundus photographs learns a non-invasive microvascular signature that tracks DKD risk. Screening only — not a biopsy substitution.

Evidence anchors: Hermsen 2019 ¹¹; Bouteldja 2022 ¹²; Meng 2025 ¹³.

Module 5 · Dialysis Optimization

Dialysis Optimization — Volume, Anemia, and Modality-Specific Risks

Objective. Map AI to the recurring decisions of HD and PD: volume, anemia, intradialytic events, access surveillance, and PD-specific risks.

Mechanistic framing. Intradialytic instability is the mismatch between ultrafiltration rate and plasma-refill / cardiovascular reserve. Predictive models target this mismatch to pre-empt hypotension and chronic fluid overload — the two failure modes that dominate hospitalization risk on chronic hemodialysis. Volume models are usefully framed as UF-rate decision support, not as autonomous prescribers.
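As a sketch of what "UF-rate decision support" can mean in its simplest form, the snippet below normalizes a planned ultrafiltration volume to session length and body weight and flags sessions above a unit limit. The limit value is a placeholder to be set by local policy (this guide does not specify one), and the function names and session values are hypothetical.

```python
def uf_rate_ml_per_kg_h(uf_volume_ml, session_hours, target_weight_kg):
    """Ultrafiltration rate normalized to body weight, in mL/kg/h."""
    if session_hours <= 0 or target_weight_kg <= 0:
        raise ValueError("session_hours and target_weight_kg must be positive")
    return uf_volume_ml / (session_hours * target_weight_kg)


def exceeds_limit(rate_ml_per_kg_h, limit_ml_per_kg_h):
    """True when the planned rate is above the unit's chosen limit."""
    return rate_ml_per_kg_h > limit_ml_per_kg_h


UNIT_LIMIT = 13.0  # placeholder (assumption); replace with the local policy value

# Two hypothetical sessions for a 70 kg patient.
routine = uf_rate_ml_per_kg_h(3000, 4.0, 70)
aggressive = uf_rate_ml_per_kg_h(4000, 3.5, 70)

for label, rate in [("routine", routine), ("aggressive", aggressive)]:
    print(f"{label}: {rate:.1f} mL/kg/h, flag={exceeds_limit(rate, UNIT_LIMIT)}")
```

The output is a flag for the team to review, in keeping with the framing above: the tool surfaces the exceedance, and the prescription change stays with the clinician.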

Volume & IDH prediction

Intradialytic-hypotension (IDH) models are built on machine vitals, interdialytic weight gain (IDWG), prescription, and ultrafiltration trajectory. They are best deployed against a quality metric (UF rate exceedances, IDH episodes) rather than direct prescription edits ¹⁵,¹⁶.

Anemia / ESA dosing

Reinforcement-learning agents for ESA titration in maintenance HD. Promising on dose stability and target-band time; evidence remains predominantly single-center and retrospective ¹⁵,¹⁶.

AV access surveillance

Image- and waveform-based stenosis detection on fistula and graft monitoring — most useful as a triage layer that routes ambiguous studies to the access team ¹⁶.

Peritoneal dialysis

Technique-failure risk, peritonitis prediction, and cardiovascular-event prediction. PD-specific evidence is more recent and remains preliminary ¹⁷.

⚠️

Reality check

Guidelines remain cautious; most published dialysis AI is single-center and retrospective. Pilot dialysis AI as decision support on quality metrics (UF rate, hypotension episodes, dose stability) with prospective audit before it touches prescriptions ¹⁵,¹⁶.

Evidence anchors: Burlacu 2020 ¹⁵; Sandys 2022 ¹⁶; Bai 2022 ¹⁷.

Module 6 · Transplantation

Transplantation — Matching, Rejection, and Immunosuppression Dosing

Objective. Survey AI across the transplant continuum: pre-transplant organ matching, post-transplant rejection prediction, and tacrolimus dose individualization.

Pre-transplant, the workhorses are matching and waitlist decision support — models that rank donor–recipient pairs on graft-survival probability or composite utility scores ¹⁸. Post-transplant, graft-rejection prediction draws on serial labs, DSA dynamics, and (where available) donor-derived cell-free DNA, with ML offering modest discrimination gains over single-marker thresholds ¹⁸. Tacrolimus dose individualization is one of the more clinically convincing transplant use cases: nonlinear pharmacokinetics, narrow therapeutic index, and genuine variability in trough achievement make pharmacometric ML models more useful than a flat per-kilogram rule ¹⁹.

The delivery layer wrapping these models is increasingly important. Schwantes 2021 ¹⁹ frames technology-enabled care and remote monitoring — connected blood-pressure cuffs, home labs, asynchronous symptom check-ins — as the platform on which transplant AI actually lands. The implication is operational: a dosing model without a remote-monitoring infrastructure rarely changes care.

💡

Clinical decision point

AI may refine donor–recipient matching and dosing, but allocation and immunosuppression remain physician-and-protocol governed. Use AI to surface candidates and propose doses, not to decide. The accountable physician owns both the rationale and the outcome.

Evidence anchors: Alamgir 2022 ¹⁸; Schwantes 2021 ¹⁹.

Module 7 · Generative AI & LLMs

Generative AI & LLMs in Practice — Patterns That Are Safe at the Bedside

Objective. Give practical, safe patterns for ChatGPT-class tools at the point of care and the desk — where they help, where they fail, and what the non-negotiables are.

The high-yield uses are surprisingly narrow and very real: drafting documentation (a discharge summary skeleton, a clinic letter first pass, a procedure note template), summarizing literature (a structured abstract digest, a comparison table across two trials, a glossary for a patient), generating patient-education material (consistent with this site's guide library — the lay-language paragraph the clinic visit did not have time for), and answering bounded clinical questions where the answer is easy to verify against a primary source. Each of these reuses what LLMs are genuinely good at: fluent text from a structured starting point, with a human verifier downstream.

Retrieval-Augmented Generation (RAG) — the safer architecture for clinical use

A general-purpose LLM hallucinates partly because it has no idea what it does not know. Retrieval-Augmented Generation (RAG) grounds the model in a curated corpus (for nephrology, that might be KDIGO guidelines, the ASN core curriculum, your institution's order sets, and the specific calculator scripts on this site): a retriever pulls the relevant passages, and the model writes around them with citations ²⁰. The hallucination rate tends to drop because the model is no longer generating plausible-sounding citations from training-time priors; it is paraphrasing a retrieved source you can audit. Grounding reduces fabrication but does not eliminate it, so the cited passage still has to be read. For any clinical-content deployment, prefer a RAG architecture over a raw LLM.
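A toy sketch of the retrieve-then-ground step, under loud assumptions: the corpus IDs and passages are invented placeholders rather than quotations from any guideline, word overlap stands in for the embedding similarity a real retriever would use, and no language model is called. The point is the shape of the pipeline: retrieve, attach source IDs, and decline when nothing relevant is found.

```python
import re
from collections import Counter

# Placeholder corpus (assumption): IDs and passages are invented for illustration
# and are not quotations from any guideline or order set.
CORPUS = {
    "order-set-A": "Hold nonsteroidal anti-inflammatory drugs when an acute kidney injury alert fires.",
    "handout-B": "Peritoneal dialysis exchanges are performed at home through a catheter.",
    "protocol-C": "Review contrast imaging plans when acute kidney injury risk is elevated.",
}


def tokens(text):
    """Lower-cased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, corpus, top_k=2):
    """Return IDs of the top_k passages sharing the most words with the question."""
    q = tokens(question)
    scored = []
    for doc_id, passage in corpus.items():
        overlap = sum((q & tokens(passage)).values())
        if overlap:
            scored.append((overlap, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def grounded_prompt(question, corpus, top_k=2):
    """Build a prompt restricted to retrieved sources, or None if nothing matches."""
    ids = retrieve(question, corpus, top_k)
    if not ids:
        return None  # nothing to ground on: decline rather than guess
    sources = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in ids)
    return ("Answer using only the sources below and cite their IDs.\n"
            f"{sources}\nQuestion: {question}")


question = "Which drugs should be held after an acute kidney injury alert?"
print(retrieve(question, CORPUS))
print(grounded_prompt("tacrolimus trough", CORPUS))
```

Returning `None` when retrieval comes back empty is a deliberate design choice in this sketch: a grounded tool that cannot find a source should say so instead of falling back to the model's training-time priors.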

Five-card horizontal flow: corpus (KDIGO guidelines, ASN core curriculum, institution order sets, calculator scripts) → retriever (embeds the question; pulls top-k passages) → LLM (paraphrases the retrieved passages, does not generate citations from training-time priors) → cited answer (returns the answer plus a clickable source — auditable) → physician verifier (clinician reads the source and signs the note; the model never owns the decision). Bottom summary band reads: hallucination drops because the model paraphrases an audited source. — Retrieval-Augmented Generation in five blocks. The model paraphrases an audited source instead of generating plausible-sounding citations from training-time priors — and the clinician still signs.

Performance and limits

Structured evaluation of LLM decision support shows the same pattern across specialties: competence with meaningful error rates. Niel 2025 ²² is a pediatric-nephrology exemplar — capable performance on bounded clinical questions, but errors that a non-expert reader would not catch. Clinician adoption is also moving ahead of validation: Eppler 2023 ²¹ documented widespread ChatGPT use among trainees and clinicians at a rate that already outstrips the evidence base.

⛔

Non-negotiables

(1) No PHI into consumer tools. A free public chatbot is typically not HIPAA / DPA-aligned, and prompts may be retained and used for model training by default. (2) Verify every fact and citation before it reaches a chart or a patient; LLMs fabricate confidently. (3) The model never owns the decision. The clinician signs the note, the order, and the chart.

✅

Clinical decision point

Use LLMs to draft and synthesize, then apply physician verification before anything reaches the chart or the patient. Prefer RAG / grounded tools for clinical content; reserve raw LLMs for non-clinical drafting where the verifier downstream is also you.

Four-card horizontal flow with friendly icons: your question (a clinical question or routine task such as a discharge summary or risk estimate) → AI drafts (an AI tool drafts a first pass, grounded in trusted clinical sources where possible) → your doctor reviews (your physician reads the draft, checks the sources, and corrects anything wrong) → you get a signed answer (only after your doctor signs does the answer reach your chart or your hand). Bottom reassurance band: An AI tool never owns the decision. Your clinician does.
Patient-facing companion figure for sharing in the clinic or on social media — the verification loop in plain language. The AI tool never owns the decision; the clinician does.

Evidence anchors: Miao 2024 ²⁰; Niel 2025 ²²; Eppler 2023 ²¹.

Module 8 · Governance, Bias & Regulation

Governance, Bias & Regulation — Deploying AI Responsibly

Objective. Equip the clinician to deploy AI responsibly — fairness, oversight, law, and the human relationship.

Algorithmic bias made concrete — the eGFR race-coefficient debate

Algorithmic bias is easy to discuss abstractly and hard to feel as a clinician until you watch it reclassify your own patients. The eGFR race coefficient is the worked example. For two decades, MDRD- and CKD-EPI-based equations applied a multiplicative adjustment to creatinine-based eGFR in patients reported as Black, producing a higher estimated GFR for the same creatinine. Removing the coefficient — and migrating to cystatin C-based equations where feasible — shifts who is labeled as having kidney dysfunction, which in turn shifts referral, transplant listing, and drug-dosing decisions. Pinsino 2023 ²⁶ traces this reclassification through an ICU-survivor cohort: a non-trivial share of patients move between CKD categories, and the downstream care follows.

🔬

Honesty box — bias is not abstract

Removing the eGFR race coefficient is not a cosmetic adjustment. It changes who is diagnosed, who is referred, who is dosed, and who is listed. The same logic generalizes to any AI deployed in nephrology: the variables embedded in the model — and the populations it was trained in — are clinical decisions, not engineering details. Inspect them.
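The published CKD-EPI creatinine equations make the reclassification concrete. The sketch below implements the 2009 equation, including its race coefficient of 1.159, and the 2021 race-free refit, then maps each estimate to a KDIGO GFR category. The example patient is hypothetical and was chosen so that the estimates straddle the G3a/G3b boundary at 45 mL/min/1.73 m²; function names are illustrative.

```python
def ckd_epi_2009(scr_mg_dl, age_years, female, black):
    """CKD-EPI 2009 creatinine equation, eGFR in mL/min/1.73 m^2."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141 * min(ratio, 1) ** alpha * max(ratio, 1) ** -1.209 * 0.993 ** age_years
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159  # the race coefficient under debate
    return egfr


def ckd_epi_2021(scr_mg_dl, age_years, female):
    """CKD-EPI 2021 race-free creatinine equation, eGFR in mL/min/1.73 m^2."""
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr_mg_dl / kappa
    egfr = 142 * min(ratio, 1) ** alpha * max(ratio, 1) ** -1.200 * 0.9938 ** age_years
    if female:
        egfr *= 1.012
    return egfr


def gfr_category(egfr):
    """KDIGO GFR category from eGFR."""
    for lower_bound, label in [(90, "G1"), (60, "G2"), (45, "G3a"), (30, "G3b"), (15, "G4")]:
        if egfr >= lower_bound:
            return label
    return "G5"


# Hypothetical patient: 60-year-old man, serum creatinine 1.8 mg/dL.
with_coefficient = ckd_epi_2009(1.8, 60, female=False, black=True)
without_coefficient = ckd_epi_2009(1.8, 60, female=False, black=False)
race_free = ckd_epi_2021(1.8, 60, female=False)

for label, value in [("2009 with coefficient", with_coefficient),
                     ("2009 without coefficient", without_coefficient),
                     ("2021 race-free", race_free)]:
    print(f"{label}: {value:.1f} -> {gfr_category(value)}")
```

For this hypothetical patient the 2009 equation with the coefficient lands in G3a, while the same equation without it and the 2021 equation both land in G3b. Nothing about the patient changed; only the equation did.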

Side-by-side comparison: same neutral patient silhouette with the same creatinine of 1.6 mg/dL. Left panel (prior equation with race coefficient): eGFR approximately 56 mL/min/1.73m², CKD category G3a, downstream decisions: referral threshold not yet met, KFRE risk moderate, ACE inhibitor dose maintained, no transplant referral. Right panel (race-free or cystatin-C-based equation): eGFR approximately 48, CKD category G3b, downstream decisions: referral threshold met, higher KFRE risk band, ACE inhibitor and SGLT2 inhibitor intensification reviewed, transplant referral conversation initiated. Pale-purple summary band reads: removing the race coefficient is not cosmetic; it changes who is diagnosed, referred, dosed, and listed. — Bias made concrete. Same creatinine, different equation, different CKD category, different downstream decisions — the same logic that applies to any AI model deployed in nephrology.

Regulation — what "FDA-cleared" does and does not guarantee

The regulatory frame for medical AI is the Software-as-a-Medical-Device (SaMD) pathway. Yu 2023 ²⁵ maps the FDA innovation process and the cleared landscape — useful context for understanding what a clearance actually certifies. The short answer: clearance attests that the manufacturer documented intended use, demonstrated substantial equivalence (510(k)) or supplied premarket data (PMA), and committed to post-market surveillance. It does not certify clinical superiority over existing standards, generalizability to your population, or maintenance of performance after model updates. The clearance is a floor, not a ceiling.

Circular workflow diagram: a small central hub labelled named clinical owner, surrounded by five nodes connected by clockwise arrows. The five nodes are model (intended use, training population, outcome definition, SaMD status), calibration (plot in your own patients before go-live), monitoring (discrimination, calibration, alert burden, subgroup performance on a written cadence), override (a documented path for the clinician to dissent — audited), and accountability (named clinical owner, pre-declared kill-switch criterion, pause authority). An outer ring repeats the message: adopt only tools that have all five. — The five phases a deployable nephrology AI has to live inside. If your institution cannot name the accountable owner and the kill-switch criterion, the tool is not ready — regardless of its publication record.

Ethics, accountability, and the therapeutic relationship

The ethical, social, and legal scaffolding around medical AI — accountability, transparency, consent, liability — is still being constructed in case law and regulation ²³. Two practical implications: (1) every deployed tool needs a named accountable owner within the institution, and (2) patients increasingly deserve disclosure that an AI tool participated in a decision. Morrow 2023 ²⁴ adds a softer but equally important consideration: as workflows automate, preserving the compassion and the therapeutic alliance becomes an active design choice, not an automatic outcome.

✅

Clinical decision point

Adopt only tools with transparent training populations, demonstrated calibration in your patients, a named accountable owner, and a documented audit and override mechanism. If any of these is missing, the answer is not "later"; it is "not until."

Evidence anchors: Sung 2023 ²³; Morrow 2023 ²⁴; Yu 2023 ²⁵; Pinsino 2023 ²⁶.

Practical Tool

The 7-Point AI Appraisal Checklist

A reusable structure for any nephrology AI claim — from a peer-reviewed paper to a vendor demo. Each item is small; together they are the difference between adopting a tool and being adopted by it. Run the seven questions in order. A "no" on any one is not necessarily a veto, but it is a question that must be answered before deployment, not after.

💡

How to use the checklist

Print it on the back of a card or paste it into your appraisal template. Apply it to one published model per month — the discipline matters more than the throughput. Two of the seven items (calibration; subgroup fairness) are the ones most often glossed over in conference talks; spend extra time there.

#	Question	Why it matters
1	What exactly does it predict?	Outcome definition and label quality determine everything downstream. "AKI" defined as a coding flag, a KDIGO creatinine criterion, and a clinical adjudication are three different models in a trench coat.
2	In whom was it trained?	Population mismatch (age, sex, race/ethnicity, comorbidity, region, care setting) breaks transportability. A model trained on a 94%-male VA cohort behaves differently in a community OB-medical service.
3	Externally validated?	Internal-only performance routinely overstates real-world accuracy. Require validation in a population that did not contribute to training — and report the drop honestly.
4	Is it calibrated?	Good discrimination with poor calibration misleads at the decision threshold. A model that ranks well but predicts a 50% risk for events that occur 20% of the time will systematically over-treat.
5	Does it arrive in time to act?	Lead time and alert specificity decide whether the prediction can change care. An alert that fires when the action window has closed is documentation, not decision support.
6	Is it fair?	Check subgroup performance and embedded variables (e.g., race coefficients). A model whose accuracy is preserved by reclassifying patients who should not be reclassified is not fair — it is convenient.
7	Who is accountable?	Named owner, override path, monitoring cadence, and regulatory status. If no one in the institution can name the responsible clinical lead and the kill-switch criterion, the deployment is not ready.
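
Item 4 is the easiest to check locally. The sketch below is one minimal way to compare predicted risk with observed event rate, overall and within risk-ordered bins. The function name `calibration_summary`, the bin count, and the ten-patient cohort are illustrative assumptions, not part of any published model; the cohort simply reproduces the "predicts 50%, occurs 20%" example from the table.

```python
from statistics import mean


def calibration_summary(predicted, observed, n_bins=5):
    """Compare predicted risk with observed event rate, overall and per risk bin.

    predicted: risks in [0, 1]; observed: 0/1 outcomes in the same order.
    Returns (mean_predicted, observed_rate, bins), where each bin is
    (mean predicted risk, observed event rate, number of patients).
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("predicted and observed must be non-empty and the same length")
    pairs = sorted(zip(predicted, observed))   # order patients by predicted risk
    size = max(1, -(-len(pairs) // n_bins))    # ceiling division: at most n_bins bins
    bins = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        bins.append((mean(p for p, _ in chunk), mean(o for _, o in chunk), len(chunk)))
    return mean(predicted), mean(observed), bins


# Hypothetical cohort: every patient is assigned a 50% risk,
# but only 2 of 10 (20%) actually have the event.
predicted = [0.5] * 10
observed = [0] * 8 + [1] * 2
mean_predicted, observed_rate, bins = calibration_summary(predicted, observed)
print(f"mean predicted {mean_predicted:.2f} vs observed {observed_rate:.2f}")
```

A gap between the two numbers at the risk level where you would act is the practical warning sign; a ranking metric such as AUROC would not reveal it.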

Figure: Printable clinician card, "The 7-Point AI Appraisal Checklist": the seven questions from the table above, followed by a four-tier evidence-maturity strip (tier 1 green, tier 2 teal, tier 3 amber, tier 4 purple). A printable card to apply to any AI claim.

Tier 1: Strongest evidence (externally validated, calibrated, prospective).
Tier 2: Externally validated but retrospective.
Tier 3: Single-center, internal validation only.
Tier 4: Preprint or vendor claim without peer review.

Map each model to a tier when you fill the checklist; the tier should track how aggressively the model is allowed to touch a workflow.
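
The tier mapping can be written down as a small rule so that two appraisers reach the same answer. This is a sketch under stated assumptions: the function name `evidence_tier`, the four boolean inputs, and the order in which they are checked are my reading of the tier descriptions, not a validated grading instrument.

```python
def evidence_tier(peer_reviewed, externally_validated, calibrated, prospective):
    """Map four yes/no appraisal answers to the four-tier maturity strip.

    Assumed precedence (an interpretation of the tier text):
    no peer review -> 4; no external validation -> 3;
    external validation plus calibration plus prospective evidence -> 1;
    anything else that is externally validated -> 2.
    """
    if not peer_reviewed:
        return 4
    if not externally_validated:
        return 3
    if calibrated and prospective:
        return 1
    return 2


# Hypothetical examples
print(evidence_tier(peer_reviewed=False, externally_validated=False,
                    calibrated=False, prospective=False))   # vendor claim
print(evidence_tier(peer_reviewed=True, externally_validated=True,
                    calibrated=True, prospective=False))    # retrospective only
```

Note the deliberate conservatism: an externally validated model that lacks either calibration reporting or prospective evidence stays in tier 2.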

Cross-Cutting

Kidney–Cardiovascular–Metabolic Integration — the Sidebar Every Module Carries

A recurring thread runs across all eight modules: every AI use case ties back to mechanism and to the interconnected physiology of the kidney, heart, and metabolism. CKD progression is also cardiovascular progression — and an SGLT2 inhibitor escalated on the back of a CKD-risk model is also a heart-failure and cardiovascular-event intervention. AKI prediction is also hemodynamic prediction, and the patient flagged by the model is also the patient at risk for a non-renal cardiovascular event. Anemia management in dialysis is also a cardiovascular conversation. Even the LLM use case — synthesizing patient-education material — is most useful when it explains the same physiology to the patient that the clinician is acting on.

The corollary is a deployment principle: a prediction is only useful when it routes to a physiologically rational, guideline-aligned action (KDIGO, ADA, ACC, ERA). The editorial throughline, again: AI should sharpen pathophysiologic reasoning and therapeutic precision — not substitute for them.

Figure: The kidney–cardiovascular–metabolic triangle every AI use case in nephrology touches (pancreas, heart, and kidneys linked by bidirectional arrows, conveying continuous crosstalk rather than one-way causation).

Guideline alignment and positioning

As of this draft, KDIGO, ASN, and ERA have not published a dedicated AI practice guideline. This guide is therefore positioned as a perspective that organizes the current evidence and offers pragmatic guardrails until formal guidance exists — explicitly aligned with existing KDIGO (CKD/AKI), ADA (diabetic kidney disease), and ACC (cardiorenal) recommendations wherever an AI use case touches a managed decision. When society guidance arrives, this perspective should yield to it; until then, the seven-point checklist and the per-module decision points are the working scaffold.

Clinician FAQ

Questions Clinicians Ask Me About AI in Nephrology

Should I be using ChatGPT for clinical questions?

For documentation drafts, literature summaries, and patient-education first passes — yes, with verification. For point-of-care clinical decisions, prefer a RAG-grounded tool fed by guidelines and your institution's order sets, and never enter PHI into a consumer endpoint. The rule of thumb: if you would not paste it into a public Google Doc, do not paste it into a consumer LLM.

My EHR vendor is pitching an AKI predictor. What do I ask them?

Run the seven-point checklist on it. Pay particular attention to the alert burden in the vendor's reference site (how many alerts per true AKI?), the calibration plot in your case mix (not theirs), and the action bundle the alert routes to. If the bundle is "notify the clinician," it is not a deployment — it is a notification. Ask for a 90-day audited pilot on a quality metric before any change to order sets.
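
"Alerts per true AKI" is the reciprocal of positive predictive value, so it can be estimated from the vendor's sensitivity and specificity once you substitute your own AKI prevalence. The sketch below shows that arithmetic; the function name and the numbers (80% sensitivity, 90% specificity, 5% prevalence) are illustrative assumptions and do not describe any real product.

```python
def alerts_per_true_case(sensitivity, specificity, prevalence):
    """Expected number of alerts fired for each true-positive alert (1 / PPV)."""
    true_pos = sensitivity * prevalence                 # alerts on patients who develop AKI
    false_pos = (1 - specificity) * (1 - prevalence)    # alerts on patients who do not
    if true_pos == 0:
        raise ValueError("no true positives expected; alert burden is undefined")
    ppv = true_pos / (true_pos + false_pos)
    return 1 / ppv


# Hypothetical figures: the same model at two different local prevalences.
for prevalence in (0.05, 0.15):
    burden = alerts_per_true_case(sensitivity=0.80, specificity=0.90, prevalence=prevalence)
    print(f"prevalence {prevalence:.0%}: about {burden:.1f} alerts per true AKI")
```

The same model produces a heavier alert burden where AKI is rarer, which is why the reference site's figure cannot be assumed to hold in your case mix.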

Is ML really better than KFRE for my outpatient CKD population?

In a general nephrology clinic, KFRE remains the right tool. In a cardiometabolic-heavy population — type 2 diabetes, established cardiovascular disease — there is now externally validated evidence (Klinrisk in CANVAS and CREDENCE) that ML adds discrimination ¹⁴. The decision the model should change is therapeutic escalation timing and access planning, not diagnosis.

Has KDIGO published an AI guideline yet?

As of this writing, no — neither KDIGO nor ASN nor ERA has issued a dedicated AI practice guideline. Existing KDIGO (CKD/AKI), ADA (DKD), and ACC (cardiorenal) guidance still governs the underlying decisions; AI tools should be appraised on whether they help reach those guideline-aligned actions earlier, more reliably, or in better-stratified patients. This perspective should yield to society guidance once it arrives.

What about the eGFR race coefficient — is this settled?

The direction is settled: the race coefficient is being removed and cystatin C-based equations are preferred where available. The clinical implication is not settled — the reclassification it produces shifts referral, listing, and dosing decisions for a meaningful share of patients ²⁶. Audit your own lab and EHR to confirm which equation is being used and how reclassified patients are being managed. This is bias-mitigation in operation, not in principle.

How do I keep an AI tool from quietly drifting in production?

Three habits: (1) a written monitoring cadence (calibration plot, alert burden, subgroup performance) on a defined schedule; (2) a named clinical owner who reviews the monitor and has authority to pause the tool; (3) a kill-switch criterion declared in advance, not negotiated after a near-miss. If the institution cannot supply these, the tool is not ready for deployment regardless of its publication record.
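
Declaring the kill-switch in advance can be as simple as writing the limits down in a form that a scheduled review checks mechanically. The following is a minimal sketch, not a monitoring product: the class `KillSwitch`, the field names, the dictionary keys, and every threshold value are hypothetical and would need to be set by the named clinical owner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KillSwitch:
    """Limits agreed before go-live (all values here are hypothetical)."""
    max_calibration_gap: float        # |mean predicted risk - observed event rate|
    max_alerts_per_true_case: float   # alert burden ceiling
    min_subgroup_sensitivity: float   # floor for the worst-performing subgroup


def review(monitor, limits):
    """Return the list of breached criteria for one monitoring period."""
    breaches = []
    gap = abs(monitor["mean_predicted"] - monitor["observed_rate"])
    if gap > limits.max_calibration_gap:
        breaches.append("calibration")
    if monitor["alerts_per_true_case"] > limits.max_alerts_per_true_case:
        breaches.append("alert_burden")
    if min(monitor["subgroup_sensitivity"].values()) < limits.min_subgroup_sensitivity:
        breaches.append("subgroup_performance")
    return breaches


limits = KillSwitch(max_calibration_gap=0.05,
                    max_alerts_per_true_case=5.0,
                    min_subgroup_sensitivity=0.70)
this_quarter = {   # hypothetical monitoring snapshot
    "mean_predicted": 0.30,
    "observed_rate": 0.20,
    "alerts_per_true_case": 3.4,
    "subgroup_sensitivity": {"female": 0.74, "male": 0.78},
}
breaches = review(this_quarter, limits)
print("pause and escalate:" if breaches else "continue:", breaches)
```

Any non-empty result goes to the named owner, who holds the authority to pause the tool; the code only makes the pre-agreed criterion explicit.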

W Rivero, MD, FPCP, DPSN
