Medicare Inpatient Outcome Prediction Models (Prospective)
Model ID: Medicare LDS 2023 Prospective Inpatient Model Bundle
Model Types: XGBoost Regressor (Gamma) & Calibrated Logistic Regression (binary and multinomial)
Dataset: 2023 CMS Limited Data Set (LDS)
Target Level: Inpatient Encounter
What the Model Predicts
This bundle contains three distinct models that predict key outcomes for a given inpatient hospital stay:
- Length of Stay (Regression): Predicts the total number of days for the inpatient stay.
- Readmission Probability (Binary Classification): Predicts the probability (from 0.0 to 1.0) that the patient will be readmitted to a hospital within 30 days of discharge.
- Discharge Location Probability (Multiclass Classification): Predicts the probability for each possible discharge location (e.g., Home, Skilled Nursing Facility, Hospice).
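For orientation, a combined prediction for a single encounter might look like the sketch below. The field names and values are illustrative only, not the actual prediction-table schema.

```python
# Illustrative shape of the bundle's three outputs for one inpatient encounter.
# Field names and values are hypothetical, for orientation only.
example_prediction = {
    "predicted_los_days": 5.8,             # regression output (days)
    "readmission_probability": 0.14,       # binary classifier, 0.0-1.0
    "discharge_location_probabilities": {  # multiclass, sums to ~1.0
        "Home": 0.62,
        "Skilled Nursing Facility": 0.27,
        "Hospice": 0.04,
        # ... one entry per encoded discharge class
    },
}
```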
Intended Use
This model bundle is designed to support a variety of clinical and operational workflows:
- Benchmarking: Compare observed outcomes against predicted risks for a given patient population.
- Healthcare Research: Analyze drivers of inpatient outcomes.
- Actuarial Analysis: Inform risk stratification and cost estimation models.
Note on Prediction Type: The models are trained for prospective prediction: they use demographic context and lagged clinical history available prior to or at admission (prediction-year demographics plus prior-year/lagged conditions) to predict outcomes for that inpatient stay.
Model Performance
These metrics reflect performance on a 20% held-out test split from the 2023 CMS LDS data, computed by the prospective trainer.
Model 1: Length of Stay (XGBoost Regressor)
- R²: 0.2005
- MAE (days): 2.9222
- MSE: 27.4028
- MAE percent: 57.28%
- Pred/True sum ratio: 0.9888
Model 2: Readmission Probability (Calibrated Logistic Regression)
- AUC ROC: 0.6368
- AUC PR: 0.2426
- Log loss: 0.4267
- Brier score: 0.1307
- Avg. true rate: 0.1610 | Avg. predicted prob.: 0.1616
Model 3: Discharge Location (Calibrated Multinomial Logistic Regression)
- Accuracy: 0.5000
- Log loss: 1.3681
- Brier score (macro avg): 0.0802
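These metrics correspond to standard scikit-learn functions. The sketch below shows how they can be reproduced from held-out predictions; the arrays are placeholders, and the "MAE percent" definition (MAE divided by the mean true LOS) is an assumption to verify against the trainer.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score, brier_score_loss, log_loss,
    mean_absolute_error, mean_squared_error, r2_score, roc_auc_score,
)

# Placeholder arrays standing in for held-out targets and predictions.
y_true_los = np.array([3.0, 7.0, 2.0, 12.0])
y_pred_los = np.array([4.1, 5.9, 2.8, 9.5])
y_true_readmit = np.array([0, 1, 0, 0])
p_readmit = np.array([0.10, 0.35, 0.08, 0.22])

# LOS regression metrics
r2 = r2_score(y_true_los, y_pred_los)
mae = mean_absolute_error(y_true_los, y_pred_los)
mse = mean_squared_error(y_true_los, y_pred_los)
mae_percent = 100 * mae / y_true_los.mean()      # assumed definition of "MAE percent"
sum_ratio = y_pred_los.sum() / y_true_los.sum()  # "Pred/True sum ratio"

# Readmission classification metrics (probabilities for the positive class)
auc_roc = roc_auc_score(y_true_readmit, p_readmit)
auc_pr = average_precision_score(y_true_readmit, p_readmit)
ll = log_loss(y_true_readmit, p_readmit)
brier = brier_score_loss(y_true_readmit, p_readmit)
```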
Files Included
- `2023_prospective_inpatient_models_bundle.pkl`: Serialized bundle with trained models, per-target feature lists, discharge label encoder, and LOS calibration factor (see the loading sketch after this list). Contents:
  - `los_model` (XGBoost, Gamma objective + early stopping)
  - `readmission_model` (Logistic Regression + isotonic calibration)
  - `discharge_model` (Multinomial Logistic Regression + sigmoid calibration)
  - `feature_columns_los`, `feature_columns_readmission`, `feature_columns_discharge`
  - `le_discharge`, `los_calibration_factor`, `model_run_id`
- `inpatient_models_prospective_train.py`: Training script that builds features from `BENCHMARKS_INPATIENT_INPUT_PROSPECTIVE`, trains the three models with optional greedy feature selection, logs metrics, feature frequency, and importances to Snowflake, and uploads the bundle to a Snowflake stage.
- `inpatient_models_prospective_predict.py`: Prediction script that downloads the bundle from the stage, prepares features, generates predictions (including per-class discharge probabilities), writes predictions to Snowflake, and logs evaluation metrics.
- `prosp_inpatient_model_eval_metrics_train.csv`: Long-format training metrics exported from Snowflake for LOS, Readmission, and Discharge.
- `prosp_inpatient_model_feature_frequency.csv`: Feature prevalence diagnostics computed on the training matrix.
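If you need to inspect the bundle directly, a minimal loading sketch follows. It assumes the pickle deserializes to a plain dict keyed by the component names listed above; verify against the actual artifact.

```python
import pickle

# Load the serialized bundle; assumes a dict keyed by the component names above.
with open("2023_prospective_inpatient_models_bundle.pkl", "rb") as f:
    bundle = pickle.load(f)

los_model = bundle["los_model"]                   # XGBoost regressor
readmission_model = bundle["readmission_model"]   # calibrated logistic regression
discharge_model = bundle["discharge_model"]       # calibrated multinomial LR
feature_cols_los = bundle["feature_columns_los"]  # per-target feature list
le_discharge = bundle["le_discharge"]             # discharge label encoder
los_cal = bundle["los_calibration_factor"]        # post-hoc sum calibration
print(bundle["model_run_id"], len(feature_cols_los))
```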
Understanding Model Artifacts
This repository includes CSVs and Snowflake tables that provide visibility into the model inputs and performance. The trainer also writes to these Snowflake tables: PROSP_FEATURE_FREQUENCY, PROSP_MODEL_FEATURE_IMPORTANCE, and PROSP_MODEL_EVAL_METRICS_TRAIN (all under the configured database/schema).
Feature Fill Rates (prosp_inpatient_model_feature_frequency.csv)
This file is a diagnostic tool for understanding the input data used to train the models. It helps you check for data drift or data quality issues.
| Column | Description |
|---|---|
| `FEATURE_NAME` | The name of the input feature (e.g., `age_at_admit`, `cond_hypertension`). |
| `POSITIVE_COUNT` | The number of records in the training set where this feature was present (non-zero). |
| `TOTAL_ROWS` | The total number of records in the training set. |
| `POSITIVE_RATE_PERCENT` | The prevalence or "fill rate" of the feature (`POSITIVE_COUNT / TOTAL_ROWS`, expressed as a percentage). |
How to Use: Compare the `POSITIVE_RATE_PERCENT` from this file with the rates from your own prediction input data. Significant discrepancies can indicate data drift or pipeline issues that may degrade model performance.
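A minimal pandas sketch of that comparison, assuming your scoring inputs are available as a DataFrame (`my_features` and the file path are placeholders):

```python
import pandas as pd

# Placeholder: your prediction-time feature matrix (same columns as training).
my_features = pd.read_parquet("my_prediction_inputs.parquet")

# Training-time fill rates shipped with the repo.
train_rates = pd.read_csv("prosp_inpatient_model_feature_frequency.csv")

# Fill rate (% of rows with a non-zero value) per column in your data.
my_rates = (
    (my_features != 0).mean().mul(100)
    .rename("MY_RATE_PERCENT").rename_axis("FEATURE_NAME").reset_index()
)

drift = train_rates.merge(my_rates, on="FEATURE_NAME", how="inner")
drift["ABS_DIFF"] = (drift["POSITIVE_RATE_PERCENT"] - drift["MY_RATE_PERCENT"]).abs()
# Flag features whose prevalence moved more than 5 percentage points (arbitrary cutoff).
print(drift[drift["ABS_DIFF"] > 5].sort_values("ABS_DIFF", ascending=False))
```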
Feature Importances (Snowflake table PROSP_MODEL_FEATURE_IMPORTANCE)
The trainer logs feature importances for LOS (XGBoost importance types) and for Readmission/Discharge (absolute logistic coefficients) to Snowflake for explainability. Query this table and filter by TARGET_TYPE and TARGET_NAME to review drivers.
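A hedged example of such a query from Python; the connection parameters, the `IMPORTANCE_VALUE` column name, and the `TARGET_TYPE`/`TARGET_NAME` filter values are assumptions to verify against the actual table.

```python
import pandas as pd
import snowflake.connector

# Connection parameters are placeholders for your Snowflake account.
conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="...",
    database="MEDICARE_LDS_FIVE_PERCENT", schema="BENCHMARKS",
)

query = """
    SELECT FEATURE_NAME, IMPORTANCE_VALUE
    FROM PROSP_MODEL_FEATURE_IMPORTANCE
    WHERE TARGET_TYPE = 'LOS'   -- filter values are assumptions;
      AND TARGET_NAME = 'los'   -- check the table's actual contents
    ORDER BY IMPORTANCE_VALUE DESC
    LIMIT 25
"""
top_features = pd.read_sql(query, conn)
print(top_features)
```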
Quick Start: End-to-End Workflow
This section provides high-level instructions for running a model with the Tuva Project. The workflow involves preparing benchmark data using dbt, running a Python prediction script, and optionally ingesting the results back into dbt for analysis.
1. Configure Your dbt Project
You need to enable the correct variables in your dbt_project.yml file to control the workflow.
A. Enable Benchmark Marts
These two variables control which parts of the Tuva Project are active. They are false by default.
```yaml
# in dbt_project.yml
vars:
  benchmarks_train: true
  benchmarks_already_created: true
```
- `benchmarks_train`: Set to `true` to build the datasets that the ML models will use for making predictions.
- `benchmarks_already_created`: Set to `true` to ingest model predictions back into the project as a new dbt source.
B. (Optional) Set Prediction Source Locations
If you plan to bring predictions back into dbt for analysis, you must define where dbt can find the prediction data.
```yaml
# in dbt_project.yml
vars:
  predictions_inpatient_prospective: "{{ source('benchmark_output', 'inpatient_predictions_prospective') }}"
```
C. Configure sources.yml
Ensure your sources.yml file includes a definition for the source you referenced above (e.g., benchmark_output) that points to the database and schema where your model's prediction outputs are stored.
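A minimal `sources.yml` entry consistent with the variable above might look like this (database and schema names are placeholders for wherever your prediction outputs land):

```yaml
# in sources.yml
version: 2

sources:
  - name: benchmark_output
    database: YOUR_DATABASE   # placeholder: where predictions are written
    schema: YOUR_SCHEMA      # placeholder
    tables:
      - name: inpatient_predictions_prospective
```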
2. The 3-Step Run Process
This workflow can be managed by any orchestration tool (e.g., Airflow, Prefect, Fabric Notebooks) or run manually from the command line.
Step 1: Generate the Training & Benchmarking Data
Run the Tuva Project with benchmarks_train enabled. This creates the input data required by the ML model.
```bash
dbt build --vars '{benchmarks_train: true}'
```
To run only the benchmark mart:
```bash
dbt build --select tag:benchmarks_train --vars '{benchmarks_train: true}'
```
Step 2: Run the Prediction Python Code
Run `inpatient_models_prospective_predict.py` to generate predictions. It reads `BENCHMARKS_INPATIENT_INPUT_PROSPECTIVE` for the configured `MODEL_YEAR`, downloads the bundle from the Snowflake stage, and writes predictions to `PROSP_INPATIENT_PREDICTIONS`.
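For intuition, the per-class discharge probabilities can be recovered from the bundle roughly as follows; `bundle` comes from the loading sketch in Files Included, and `X_discharge` is a placeholder feature matrix already aligned to `feature_columns_discharge`.

```python
import pandas as pd

# "bundle" is from the loading sketch above; "X_discharge" is a placeholder
# feature matrix aligned to bundle["feature_columns_discharge"].
discharge_model = bundle["discharge_model"]
proba = discharge_model.predict_proba(X_discharge)

# Map encoded class indices back to human-readable discharge locations.
labels = bundle["le_discharge"].inverse_transform(discharge_model.classes_)
discharge_probs = pd.DataFrame(proba, columns=labels)
```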
Step 3: (Optional) Bring Predictions back into Tuva Project
To bring the predictions back into the Tuva Project for analysis, run dbt again with benchmarks_already_created enabled. This populates the analytics marts.
```bash
dbt build --vars '{benchmarks_already_created: true, benchmarks_train: false}'
```
To run only the analysis models:
```bash
dbt build --select tag:benchmarks_analysis --vars '{benchmarks_already_created: true, benchmarks_train: false}'
```
Feature Engineering Summary
- Categorical one-hot: `prediction_year_sex`, `prediction_year_race`, `prediction_year_state`, `ms_drg_code`, `ccsr_cat`.
- Numeric: `prediction_year_age_at_admit`, `cold_start`, `lag_missing` (and, in the prediction script, `lag_age_at_admit`).
- Lagged clinical flags: `LAG_COND_*`, `LAG_CMS_*`, `LAG_HCC*` columns from the prospective input table.
- Training excludes lagged demographics from one-hot encoding to reduce redundancy; prediction aligns features to the trained lists (see the sketch after this list).
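A minimal sketch of that alignment step, assuming the bundle layout from the loading example and a placeholder raw DataFrame:

```python
import pandas as pd

# "raw" is a placeholder DataFrame read from the prospective input table;
# "bundle" follows the loading sketch in "Files Included".
cat_cols = ["prediction_year_sex", "prediction_year_race",
            "prediction_year_state", "ms_drg_code", "ccsr_cat"]
X = pd.get_dummies(raw, columns=cat_cols)

# Align to the trained feature list: columns the model never saw are dropped,
# and trained columns missing from this batch are added as zeros.
X_los = X.reindex(columns=bundle["feature_columns_los"], fill_value=0)
```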
Training Details
- Input table: `BENCHMARKS_INPATIENT_INPUT_PROSPECTIVE` (Snowflake).
- LOS: XGBoost `reg:gamma` objective with early stopping; a post-hoc calibration factor is applied so predicted sums match validation sums (a sketch follows this list).
- Readmission: Logistic Regression (balanced class weights) with isotonic calibration.
- Discharge: Multinomial Logistic Regression with sigmoid calibration.
- Optional greedy feature selection per target with a tolerance window; selection details are recorded in the bundle.
- Artifacts uploaded to the stage `@MEDICARE_LDS_FIVE_PERCENT.BENCHMARKS.MODEL_STAGE` as `{MODEL_YEAR}_prospective_inpatient_models_bundle.pkl`.
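For clarity, the LOS sum-calibration described above amounts to a single scale factor; a minimal sketch under assumed names:

```python
# Names assumed: los_model, X_val/y_val (validation split), X_new (scoring data).
val_preds = los_model.predict(X_val)
los_calibration_factor = float(y_val.sum() / val_preds.sum())

# At prediction time, the stored factor rescales raw predictions so that
# predicted total days match the validation total in aggregate.
calibrated_los = los_model.predict(X_new) * los_calibration_factor
```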