This analysis explores how the recently proposed restructuring of the National Institutes of Health (NIH) fits into decades of historical funding patterns. In March 2025, HHS unveiled the “Make America Healthy Again” reorganization, which collapses the 27 existing NIH institutes and centers into eight new institutes and realigns several functions.
Details of the budget numbers were released on May 30, 2025, as part of the HHS FY 2026 Budget in Brief.
So with the help of ChatGPT, I asked: How do those new groupings compare to NIH’s budget trajectory back to FY 2000?
To be clear, I am not an expert in NIH funding or the budget process. I am a scientist interested in how the NIH budget has changed over time, and how the new reorganization plan fits into that historical context. There were several ambiguities in the budget document, and I used ChatGPT to help me parse the data. Please do let me know if you find any major errors or have suggestions for improvement.
# Read budgets and mapping
budgets <- read.csv("budgets.csv", stringsAsFactors = FALSE)
mapping <- read.csv("mapping.csv", stringsAsFactors = FALSE)
# Separate historical vs. 2026 totals
hist_budgets <- budgets %>% filter(year < 2026)
new_totals <- budgets %>% filter(year == 2026) %>%
rename(new_ic = ic, total_2026 = budget) %>%
select(new_ic, total_2026)
# Tag each old IC with its new institute
hist_joined <- hist_budgets %>%
left_join(mapping, by = c("ic" = "old_ic")) %>%
filter(!is.na(new_ic))
# Slice 2024 to compute group sums
slice_2024 <- hist_joined %>%
filter(year == 2024) %>%
select(ic, budget_2024 = budget, new_ic)
group_sums_2024 <- slice_2024 %>%
group_by(new_ic) %>%
summarize(group_sum_2024 = sum(budget_2024), .groups = "drop")
if (!"Eliminated" %in% new_totals$new_ic) {
new_totals <- bind_rows(
new_totals,
tibble(new_ic = "Eliminated", total_2026 = group_sums_2024$group_sum_2024[group_sums_2024$new_ic == "Eliminated"])
)
}
Data was collected from the NIH Almanac and the HHS FY 2026 Budget in Brief by ChatGPT in a CSV file. I have manually spot-checked about a dozen or so of the ~300 data points, and they seem to be correct.
I also asked it to create a mapping table for old-to-new IC
designations. ChatGPT was surprisingly bad at this, though
understandably so. For example, it had assigned NIDCR (National
Institute for Dental and Craniofacial Research) to the new “Body
Systems” institute, which is not what is outlined in the budget
document. I had to manually correct this and a few other misassignments.
The mapping table is included in the repo as
mapping.csv
.
The historical budgets from 2000 to 2024 were joined with a mapping of old ICs to new institutes proposed for 2026. The 2024 budget was sliced to compute group sums for each new institute.
Since the 2026 budget numbers are only reported for the new IC designations, we need to estimate how much each old IC would contribute to the new institute’s budget. The simplest assumption is that the 2026 budget for each new institute is proportionally allocated based on the 2024 budget of its old IC members. That is, the old ICs get the same relative share of the new institute’s budget in 2026 as they did in 2024.
Practically, we do this by taking the 2024 budget for each old IC, dividing it by the total budget for that new institute in 2024, and multiplying it by the total budget for that new institute in 2026.
To adjust historical budgets for inflation, we will use BRDPI (Biomedical Research and Development Price Index) provided by the NIH Office of Budget. This index allows us to convert historical budget numbers into real terms relative to a base year (2024 in this case). Indices were extracted from this PDF.
BRDPI adjusts not only for overall inflation, but specifically for things relevant to the NIH budget. It tracks “prices paid for the labor, supplies, equipment, and other inputs required to perform the biomedical research the NIH supports in its intramural laboratories and through its awards to extramural organizations.” Source: Bureau of Economic Analysis (BEA)
# 1) BRDPI lookup table
## Using numbers from the first column of the BRDPI table provided by NIH Office of Budget:
## https://officeofbudget.od.nih.gov/pdfs/FY24/GDP/BRDPI%20Price%20Index%20Annual%20and%20Cumulative%20Values_For_1950_Through_2029.pdf
brdpi <- tibble::tribble(
~year, ~brdpi,
2000L, 977.7,
2010L, 1403.7,
2020L, 1733.4,
2024L, 1994.1,
2026L, 2101.5
)
# 2) Base index (2024)
base_idx <- brdpi %>% filter(year==2026) %>% pull(brdpi)
# 3) Merge & compute real budgets
plot_data_real <- plot_data %>%
# turn the factor‐year ("x") into an integer so we can join
mutate(year = as.integer(as.character(x))) %>%
left_join(brdpi, by="year") %>%
mutate(
weight_real = weight * (base_idx / brdpi)
)
These plot show the budget amounts in nominal or adjusted dollars in separate tabs. The x-axis represents fiscal years, while the y-axis shows the budget amounts in millions of USD. The colors represent the new institutes proposed for 2026, and the flows show how budgets from old ICs are redistributed into these new institutes.
Note that the x-axis is not linear, as it represents discrete fiscal years.
palette <- c(
"NCI" = "#377EB8",
"Body Systems" = "#E41A1C",
"Neuroscience & Brain" = "#4DAF4A",
"NIAID" = "#984EA3",
"GMS" = "#FF7F00",
"Child & Women’s Health" = "#A65628",
"NIA" = "#F781BF",
"Behavioral Health" = "#999999",
"Office of the Director" = "#8DD3C7",
"Eliminated" = "grey90",
"Moved out of NIH" = "#666666"
)
plot_data_readable <- plot_data_real %>%
rename(
IC = alluvium,
Year = x,
Budget = weight_real,
IC_Year = stratum,
New_Institute = group
)
myplot <- ggplot(plot_data_readable,
aes(x = Year, stratum = IC_Year, alluvium = IC, y = Budget,
fill = New_Institute, label=IC_Year)) +
geom_flow(alpha = 0.5) +
geom_stratum(width = 0.5, color = "grey30", alpha=0.8) +
geom_text(
data = plot_data_readable %>% filter(Year == "2026"),
# aes(label = IC),
stat = "stratum",
size = 4,
vjust = 0.5
) +
scale_fill_manual(values = palette, na.value = "white") +
labs(
title = "NIH Budget Alluvial: 2000→2010→2020→2024→2026",
subtitle = "Flows from 27 old ICs into 8 new institutes (FY 2026 proposed)",
x = "Fiscal Year",
y = "Budget (Millions USD, inflation adjusted)",
fill = "2026 Institute"
) +
# increase x- and y-axis label font sizes
theme_minimal() +
theme(axis.title.x = element_text(size = 18),
axis.title.y = element_text(size = 18),
axis.text.x = element_text(size = 16),
axis.text.y = element_text(size = 16)) +
theme(legend.position = "none")
myplot
ggsave("nih_budget_alluvial_adj.png", myplot, width = 10, height = 6, dpi = 300)
Note that the x-axis is not linear, as it represents discrete fiscal years.
palette <- c(
"NCI" = "#377EB8",
"Body Systems" = "#E41A1C",
"Neuroscience & Brain" = "#4DAF4A",
"NIAID" = "#984EA3",
"GMS" = "#FF7F00",
"Child & Women’s Health" = "#A65628",
"NIA" = "#F781BF",
"Behavioral Health" = "#999999",
"Office of the Director" = "#8DD3C7",
"Eliminated" = "grey90",
"Moved out of NIH" = "#666666"
)
plot_data_readable <- plot_data_real %>%
rename(
IC = alluvium,
Year = x,
Budget = weight,
IC_Year = stratum,
New_Institute = group
)
myplot <- ggplot(plot_data_readable,
aes(x = Year, stratum = IC_Year, alluvium = IC, y = Budget,
fill = New_Institute, label=IC_Year)) +
geom_flow(alpha = 0.5) +
geom_stratum(width = 0.5, color = "grey30", alpha=0.8) +
geom_text(
data = plot_data_readable %>% filter(Year == "2026"),
# aes(label = IC),
stat = "stratum",
size = 4,
vjust = 0.5
) +
scale_fill_manual(values = palette, na.value = "white") +
labs(
title = "NIH Budget Alluvial: 2000→2010→2020→2024→2026",
subtitle = "Flows from 27 old ICs into 8 new institutes (FY 2026 proposed)",
x = "Fiscal Year",
y = "Budget (Millions USD, nominal)",
fill = "2026 Institute"
) +
theme_minimal() +
theme(axis.title.x = element_text(size = 18),
axis.title.y = element_text(size = 18),
axis.text.x = element_text(size = 16),
axis.text.y = element_text(size = 16)) +
theme(legend.position = "none")
myplot
ggsave("nih_budget_alluvial_raw.png", myplot, width = 10, height = 6, dpi = 300)
These tables shows the values in the plot above, with nominal and
adjusted values in separate tabs. You can scroll, sort, and search using
the table controls. The % Change
column shows the
percentage change in budget from FY 2024 to FY 2026 for each
institute.
library(tidyr)
library(DT)
library(RColorBrewer)
# 1) Pivot to wide, round, and compute pct_change
wide_data <- plot_data_real %>%
select(alluvium, group, x, weight_real) %>%
pivot_wider(names_from = x, values_from = weight_real) %>%
arrange(group, alluvium) %>%
mutate(across(where(is.numeric), ~ round(.))) %>%
rename(
IC = alluvium,
`2026 Institute` = group,
FY2000 = `2000`,
FY2010 = `2010`,
FY2020 = `2020`,
FY2024 = `2024`,
FY2026 = `2026`
) %>%
mutate(
`% Change` = round((FY2026 - FY2024) / FY2024 * 100, 1)
)
# 2) Prepare palettes with alpha = 0.6
inst_levels <- sort(unique(wide_data$`2026 Institute`))
row_colors_alpha <- sapply(palette[inst_levels], function(hex) {
rgb <- grDevices::col2rgb(hex)
sprintf("rgba(%d,%d,%d,0.6)", rgb[1], rgb[2], rgb[3])
})
div_hex <- brewer.pal(5, "RdYlGn")[c(1,5)]
div_colors_alpha <- sapply(div_hex, function(hex) {
rgb <- grDevices::col2rgb(hex)
sprintf("rgba(%d,%d,%d,0.6)", rgb[1], rgb[2], rgb[3])
})
# 3) Columns to shade by institute (everything except pct_change)
cols_institute <- setdiff(names(wide_data), "`% Change`")
# 4) Render the table
datatable(
wide_data,
rownames = FALSE,
extensions = "Buttons",
options = list(
dom = 'Bfrtip',
paging = FALSE,
scrollX = TRUE,
scrollY = "400px",
scrollCollapse = TRUE,
buttons = c('copy', 'csv', 'excel')
)
) %>%
# A) Use the 2026 Institute *column* to color all the non–pct_change columns
formatStyle(
columns = cols_institute,
valueColumns = "2026 Institute", # <-- lookups come from here
backgroundColor = styleEqual(inst_levels, row_colors_alpha)
) %>%
# B) Then style pct_change with diverging palette
formatStyle(
"% Change",
backgroundColor = styleInterval(0, div_colors_alpha),
color = styleInterval(0, c("black","black"))
)
library(tidyr)
library(DT)
library(RColorBrewer)
# 1) Pivot to wide, round, and compute pct_change
wide_data <- plot_data_real %>%
select(alluvium, group, x, weight) %>%
pivot_wider(names_from = x, values_from = weight) %>%
arrange(group, alluvium) %>%
mutate(across(where(is.numeric), ~ round(.))) %>%
rename(
IC = alluvium,
`2026 Institute` = group,
FY2000 = `2000`,
FY2010 = `2010`,
FY2020 = `2020`,
FY2024 = `2024`,
FY2026 = `2026`
) %>%
mutate(
`% Change` = round((FY2026 - FY2024) / FY2024 * 100, 1)
)
# 2) Prepare palettes with alpha = 0.6
inst_levels <- sort(unique(wide_data$`2026 Institute`))
row_colors_alpha <- sapply(palette[inst_levels], function(hex) {
rgb <- grDevices::col2rgb(hex)
sprintf("rgba(%d,%d,%d,0.6)", rgb[1], rgb[2], rgb[3])
})
div_hex <- brewer.pal(5, "RdYlGn")[c(1,5)]
div_colors_alpha <- sapply(div_hex, function(hex) {
rgb <- grDevices::col2rgb(hex)
sprintf("rgba(%d,%d,%d,0.6)", rgb[1], rgb[2], rgb[3])
})
# 3) Columns to shade by institute (everything except pct_change)
cols_institute <- setdiff(names(wide_data), "`% Change`")
# 4) Render the table
datatable(
wide_data,
rownames = FALSE,
extensions = "Buttons",
options = list(
dom = 'Bfrtip',
paging = FALSE,
scrollX = TRUE,
scrollY = "400px",
scrollCollapse = TRUE,
buttons = c('copy', 'csv', 'excel')
)
) %>%
# A) Use the 2026 Institute *column* to color all the non–pct_change columns
formatStyle(
columns = cols_institute,
valueColumns = "2026 Institute", # <-- lookups come from here
backgroundColor = styleEqual(inst_levels, row_colors_alpha)
) %>%
# B) Then style pct_change with diverging palette
formatStyle(
"% Change",
backgroundColor = styleInterval(0, div_colors_alpha),
color = styleInterval(0, c("black","black"))
)
ggsave("nih_budget_alluvial.png", myplot, width = 10, height = 6, dpi = 300)
This is an attempt to make the plot interactive. Meh, it’s kinda crappy, but you can use the hover text to get identities and budgets for the smaller slices. Click and drag to zoom, triple click to zoom back out, and/or use the buttons on the top right corner.
library(plotly)
# Convert ggplot to plotly
ggplotly(myplot, tooltip=c("x","y","fill","stratum"))
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: aarch64-apple-darwin20
## Running under: macOS 15.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] plotly_4.10.4 RColorBrewer_1.1-3 metathis_1.1.4 DT_0.33
## [5] ggalluvial_0.12.5 ggplot2_3.5.1 tidyr_1.3.1 dplyr_1.1.4
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.9 utf8_1.2.4 generics_0.1.3 digest_0.6.37
## [5] magrittr_2.0.3 evaluate_0.24.0 grid_4.4.1 fastmap_1.2.0
## [9] jsonlite_1.8.8 httr_1.4.7 purrr_1.0.2 fansi_1.0.6
## [13] viridisLite_0.4.2 crosstalk_1.2.1 scales_1.3.0 lazyeval_0.2.2
## [17] textshaping_1.0.1 jquerylib_0.1.4 cli_3.6.3 rlang_1.1.4
## [21] munsell_0.5.1 withr_3.0.1 cachem_1.1.0 yaml_2.3.10
## [25] tools_4.4.1 colorspace_2.1-1 vctrs_0.6.5 R6_2.5.1
## [29] lifecycle_1.0.4 htmlwidgets_1.6.4 ragg_1.4.0 pkgconfig_2.0.3
## [33] pillar_1.9.0 bslib_0.8.0 gtable_0.3.5 glue_1.7.0
## [37] data.table_1.16.0 systemfonts_1.2.3 xfun_0.47 tibble_3.2.1
## [41] tidyselect_1.2.1 highr_0.11 rstudioapi_0.17.1 knitr_1.48
## [45] farver_2.1.2 htmltools_0.5.8.1 rmarkdown_2.28 labeling_0.4.3
## [49] compiler_4.4.1