Deep Learning for Stroke Mortality Prediction in eICU:
A Dual-Tower Transformer Framework

Zhengrong Jia*; Kwong-Cheong Wong*

Presenter: Zhengrong Jia

From: Asia AI Education and Future Technology Association, Hong Kong SAR, China

Research Area: Deep Learning · Clinical EHR · Stroke Mortality Prediction

Email: zhengrong.jia.academic@gmail.com

Paper CI5896 · CCAI 2026 · May 22–24, Nanjing

What We Will Cover

  1. Motivation — why ICU mortality prediction is hard
  2. Dataset & two core challenges
  3. DT-Transformer: dual-tower architecture
  4. Experiments — results & ablation
  5. Theoretical implications (§V.B): why it works
  6. Interpretability — what the model attends to
  7. Conclusion & future directions

Paper CI5896 · DT-Transformer · eICU · Stroke Mortality Prediction

CI5896 · Zhengrong Jia · CCAI 2026

Why Standard Deep Learning Fails on Mixed Clinical Data

15–20% in-hospital mortality among stroke ICU patients · Feigin et al., 2021

Why Standard DL Still Fails Here

  • Tabular EHR mixes categorical + continuous features
    → architectures without the right inductive bias underfit
  • Clinical data: outliers, missing values, NaNs
    → monolithic Transformers crash or converge poorly
    (standard Transformer AUPRC: 0.5279 ± 0.0195)
  • eICU: 97 features, 14% mortality, multi-center noise

The challenge is not data volume — it is architectural mismatch between standard DL and the structure of clinical tabular data.

The eICU Database

Multi-center · freely available · Pollard et al., 2018

  • 97 features: total input dimension
  • 3 categorical + 94 numerical: age/gender/ethnicity + vitals & labs
  • 14% mortality rate: heavily class-imbalanced
  • AUPRC, not AUROC: the right metric under class imbalance

14% mortality → AUPRC is the right metric. A random classifier's expected AUPRC equals the 14% base rate, so our 0.6171 is roughly 4.4× the random baseline.
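To make the random-baseline point concrete, here is a minimal sketch of step-wise AUPRC (average precision: the mean of precision at each positive's rank). It assumes no tied scores, and the cohort below is synthetic with a hypothetical 14% positive rate, not eICU data. The slides do not say which AUPRC implementation the paper used.

```python
import random

def average_precision(y_true, scores):
    """Step-wise AUPRC: average of precision measured at the rank of each positive.
    Ties are broken by the stable sort order (a simplification)."""
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    n_pos = sum(y_true)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / n_pos

# Synthetic cohort: 14% positives, scored by a classifier that guesses at random.
rng = random.Random(0)
n = 20000
y = [1 if i < int(0.14 * n) else 0 for i in range(n)]
random_scores = [rng.random() for _ in range(n)]
ap_random = average_precision(y, random_scores)

print(round(ap_random, 3))                         # lands close to the 0.14 base rate
print(f"{0.6171 / 0.14:.1f}x random baseline")     # 4.4x random baseline
```

AUROC of the same random classifier would sit near 0.5 regardless of prevalence, which is why AUPRC is the more informative summary at a 14% positive rate.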

Two Problems We Had to Solve

Heterogeneous feature types demand separate encoders. Raw EHR data demands an online safety layer.
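The slides do not describe the Adaptive Safeguard's internals. As a purely hypothetical illustration of what an online safety layer for raw EHR rows might do, this sketch replaces missing or non-finite values with a per-feature fallback (for example, a training-set median) and clips outliers to fixed bounds. The function name, fallbacks, and bounds are all assumptions.

```python
import math

def safeguard(row, fallback, lower, upper):
    """Hypothetical runtime guard, not the paper's method:
    NaN/inf/None -> per-feature fallback, then clip to [lower, upper]."""
    cleaned = []
    for x, fb, lo, hi in zip(row, fallback, lower, upper):
        if x is None or not math.isfinite(x):
            x = fb                             # impute missing / corrupt value
        cleaned.append(min(max(x, lo), hi))    # clamp extreme outliers
    return cleaned

# Hypothetical heart rate, glucose, respiration: one missing value, one outlier.
row = [float("nan"), 950.0, 18.0]
print(safeguard(row, fallback=[80.0, 120.0, 16.0],
                lower=[20.0, 20.0, 4.0], upper=[250.0, 800.0, 60.0]))
# → [80.0, 800.0, 18.0]
```

Running such a guard on every row before the forward pass keeps NaNs from ever reaching the attention softmax, which is one plausible way to get the "zero-crash inference" the conclusion describes.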

Our Model: The DT-Transformer

C1: Dual-Tower Encoder · C2: Adaptive Safeguard · C3: Optuna HPO

Dual-Tower encoding + Adaptive Safeguard + Optuna HPO — three contributions, one differentiable pipeline.

Left Tower: Categorical Embeddings

High-dimensional sparse indices → dense 22-d vector h_cat. Each feature gets its own learned embedding space.
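A minimal sketch of the left tower, assuming one independent lookup table per categorical feature whose outputs are concatenated. The slides give only the 22-d total; the vocabulary sizes and the 8/4/10 split below are hypothetical, and the tables are randomly initialized rather than trained.

```python
import random

# Hypothetical vocabularies and per-feature embedding widths (8 + 4 + 10 = 22).
VOCAB_SIZES = {"age_bin": 10, "gender": 3, "ethnicity": 7}
EMB_DIMS = {"age_bin": 8, "gender": 4, "ethnicity": 10}

rng = random.Random(42)
# One separate table per feature: tables[name][category_id] is a dense vector.
tables = {
    name: [[rng.gauss(0.0, 0.1) for _ in range(EMB_DIMS[name])] for _ in range(size)]
    for name, size in VOCAB_SIZES.items()
}

def embed_categorical(indices):
    """Look up each feature's category id in its own table, concatenate into h_cat."""
    h_cat = []
    for name in VOCAB_SIZES:  # fixed feature order
        h_cat.extend(tables[name][indices[name]])
    return h_cat

h_cat = embed_categorical({"age_bin": 6, "gender": 1, "ethnicity": 2})
print(len(h_cat))  # 22
```

In a PyTorch model each table would typically be a `torch.nn.Embedding` updated by backpropagation; separate tables keep, for example, gender codes from sharing parameters with ethnicity codes.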

Right Tower: Numerical Transformer

Self-Attention explicitly computes pairwise interactions across all 94 physiological features — something MLPs only approximate.
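The sketch below shows where the pairwise interactions come from: each of the 94 numerical features becomes a token, and single-head scaled dot-product attention scores every token pair, giving a 94 × 94 matrix. The per-feature linear tokenizer (in the style of FT-Transformer), the width D = 8, the single head, and the random weights are assumptions; the slides do not specify them.

```python
import math
import random

N_FEATURES, D = 94, 8  # 94 numerical features (from the slides); D is hypothetical
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

# Hypothetical tokenizer: token_i = x_i * W_tok[i] + B_tok[i]
W_tok, B_tok = rand_matrix(N_FEATURES, D), rand_matrix(N_FEATURES, D)
W_q, W_k, W_v = rand_matrix(D, D), rand_matrix(D, D), rand_matrix(D, D)

def matvec(v, m):
    """Row vector v (length D) times matrix m (D x D)."""
    return [sum(v[k] * m[k][j] for k in range(len(v))) for j in range(len(m[0]))]

def softmax(xs):
    mx = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - mx) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(x):
    """Returns (outputs, A) where A[i][j] is how much feature i attends to feature j."""
    tokens = [[x[i] * W_tok[i][d] + B_tok[i][d] for d in range(D)] for i in range(len(x))]
    q = [matvec(t, W_q) for t in tokens]
    k = [matvec(t, W_k) for t in tokens]
    v = [matvec(t, W_v) for t in tokens]
    scale = math.sqrt(D)
    attn = [softmax([sum(qi[d] * kj[d] for d in range(D)) / scale for kj in k]) for qi in q]
    out = [[sum(a * vj[d] for a, vj in zip(row, v)) for d in range(D)] for row in attn]
    return out, attn

x = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]  # one synthetic standardized row
out, attn = self_attention(x)
print(len(attn), len(attn[0]))  # 94 94
```

Every entry of `attn` is an explicit score for one feature pair, whereas an MLP has to learn such interactions implicitly through its hidden units.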

Feature Fusion and Prediction

The outputs of both towers are concatenated into h_joint ∈ ℝ⁸⁶ — bridging static demographic features and dynamic vital signs.
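Since h_cat is 22-d and h_joint is 86-d, the numerical tower must contribute a 64-d summary (86 − 22); how the 94 attention outputs are pooled to 64 dimensions is not stated. This sketch concatenates two random stand-in vectors and applies a hypothetical logistic head. The bias is set to the logit of the 14% base rate, a common initialization for imbalanced labels and an assumption here.

```python
import math
import random

rng = random.Random(7)
h_cat = [rng.gauss(0, 1) for _ in range(22)]  # stand-in left-tower output (22-d)
h_num = [rng.gauss(0, 1) for _ in range(64)]  # stand-in right-tower summary (64-d)

h_joint = h_cat + h_num  # concatenation -> 86-d

w = [rng.gauss(0, 0.05) for _ in range(len(h_joint))]  # hypothetical head weights
b_init = math.log(0.14 / 0.86)  # logit of the 14% mortality rate (assumed init)

def predict_mortality(h, w, b):
    """Logistic head: p = sigmoid(w · h + b)."""
    z = sum(wi * hi for wi, hi in zip(w, h)) + b
    return 1.0 / (1.0 + math.exp(-z))

p = predict_mortality(h_joint, w, b_init)
print(len(h_joint), 0.0 < p < 1.0)  # 86 True
```

With this initialization an all-zero input predicts exactly the base rate, so early training does not waste steps learning the class prior.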

DT-Transformer Is Now Competitive With XGBoost

  • AUPRC: 0.6171 ± 0.0058 (5 seeds)
  • AUROC: 0.8848 ± 0.0034 (5 seeds)
  • +14.4% relative AUPRC vs DT-MLP
  • XGBoost AUPRC: 0.6467

XGBoost retains a small AUPRC lead (0.6467 vs 0.6171); our deep learning model narrows the gap while remaining fully differentiable.

Self-Attention Is the Key: +14.41% Relative AUPRC Gain

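Replacing the Transformer encoder with fully connected layers (DT-MLP) lowers AUPRC from 0.6171 to 0.5394; equivalently, the Transformer encoder yields a 14.41% relative AUPRC gain over DT-MLP. We attribute the gain to self-attention modeling global feature interactions.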
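The percentage depends on which model is the reference, so it helps to compute both directions from the reported 5-seed means. From the rounded means the gain comes out at about 14.40%; the slide's 14.41% presumably reflects unrounded per-seed values.

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`."""
    return (new - baseline) / baseline

dt_transformer, dt_mlp = 0.6171, 0.5394  # mean AUPRC over 5 seeds (from the slides)

gain = relative_gain(dt_transformer, dt_mlp)          # reference: DT-MLP
drop = (dt_transformer - dt_mlp) / dt_transformer     # reference: DT-Transformer

print(f"{gain:.2%}")  # 14.40%
print(f"{drop:.2%}")  # 12.59%
```

Stated as a drop relative to the full model, removing self-attention costs about 12.6% of AUPRC; stated as a gain relative to DT-MLP, attention adds about 14.4%.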

Why the Dual Tower Wins: Three Design Principles (§V.B)

Beyond empirical gains: three design principles for applying deep learning to heterogeneous clinical tabular data.

The Model Focuses on What Clinicians Care About

Top 5 features by attention weight

Respiration 0.081
Total Cholesterol 0.078
Glucose 0.076
Bedside Glucose 0.036
Heart Rate 0.029

Without any feature-level supervision, the model assigns the most attention to clinically significant risk factors, making its predictions easier to interpret than those of black-box baselines.
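The slides do not state how per-feature attention weights were aggregated. One common choice, assumed in this sketch, is to average the attention each feature receives as a key (a column of the attention matrix) over all query features, then sort. The 3 × 3 matrix and feature names are toy values for illustration, not model outputs; in practice one would also average over patients, heads, and layers.

```python
def feature_importance_from_attention(attn, names):
    """Hypothetical aggregation: mean attention received by feature j
    (column j) across all query features i, sorted descending."""
    n = len(attn)
    received = [sum(attn[i][j] for i in range(n)) / n for j in range(n)]
    return sorted(zip(names, received), key=lambda p: p[1], reverse=True)

# Toy attention matrix: each row is a softmax output, so rows sum to 1.
attn = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.4, 0.2, 0.4],
]
ranking = feature_importance_from_attention(attn, ["feature_A", "feature_B", "feature_C"])
for name, score in ranking:
    print(f"{name:12s} {score:.3f}")
```

Because each row sums to 1, the column means also sum to 1, which keeps scores such as 0.081 for Respiration comparable across features.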

What We Showed and Where We Go Next

What we showed
  • Architecture matters: decoupled towers + self-attention raise AUPRC 0.5394 → 0.6171, consistent across 5 seeds
  • Engineering for robustness: the Adaptive Runtime Safeguard enables zero-crash inference on multi-center EHR data
  • Fair comparison: with Optuna HPO, attention-based deep learning competes with XGBoost when properly tuned
What comes next
  • Validate on MIMIC-IV: test generalization across hospital systems
  • Fuse clinical notes with LLMs: go beyond structured EHR features
  • Shadow-mode clinical trial: run alongside real ICU decision-making

A fully differentiable alternative to gradient boosting — extensible to multimodal integration with unstructured clinical notes.

Thank You

Questions welcome

Zhengrong Jia · zhengrong.jia.academic@gmail.com
