This page documents how the data behind this visualization was collected, cleaned, transformed, and aggregated. Every number shown in the dashboard can be traced back to the steps described below.
The primary dataset is the Base de Datos Histórica published by Comisión INGRESA — the government body that administers Chile's state-backed student loan program (CAE).
The file is a semicolon-delimited .txt export with ~12 million rows and 42 columns, one row per student–year loan record, covering every CAE cohort from 2006 to 2024.
Key fields include: student identifier, registered gender, family region and income quintile, secondary-school dependency, institution name and type, program name, loan amounts requested and reference tuition, loan year, and beneficiary/outcome status.
The raw file was read using Python / pandas with latin1
encoding and PyArrow as the parsing engine.
The monetary columns (arancel_solicitado and arancel_referencia) contained non-numeric characters (dots, commas, spaces) that were removed before casting to integer.
Because the dataset spans 19 cohort years (2006 to 2024), nominal loan amounts are not directly comparable across time. Each record's requested tuition was multiplied by a year-specific cumulative inflation factor to express amounts in 2024 Chilean Pesos (CLP):
adjusted = arancel_solicitado × (1 + inflation_factor)
The factors were set so that the 2024 factor is 0, giving a multiplier of 1.0 (no adjustment). The 2006 factor is ~1.10, meaning a 2006 loan amount must grow by ~110% (a multiplier of ~2.10) to match its 2024 purchasing-power equivalent. Factors for the years from 2006 through 2023 decrease monotonically toward zero.
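The cleaning and inflation steps above can be sketched in a few lines of Python. This is an illustration only, not the pipeline's actual code: the helper names clean_amount and adjust_to_2024 are hypothetical, and the factor table holds just the two values stated on this page (2006 at ~1.10, 2024 at 0) plus one made-up intermediate value to show the shape of the table.

```python
import re

# Cumulative inflation factors by loan year. Only 2006 (~1.10) and 2024 (0)
# come from this page; the 2015 value is a made-up placeholder.
INFLATION_FACTORS = {2006: 1.10, 2015: 0.45, 2024: 0.0}


def clean_amount(raw: str) -> int:
    """Strip dots, commas, and whitespace, then cast to integer."""
    return int(re.sub(r"[.,\s]", "", raw))


def adjust_to_2024(amount_clp: int, year: int) -> int:
    """Apply adjusted = amount * (1 + inflation_factor), rounded to whole pesos."""
    return round(amount_clp * (1 + INFLATION_FACTORS[year]))


# A 2006 amount of 1,000,000 CLP becomes 2,100,000 CLP in 2024 terms.
print(adjust_to_2024(clean_amount("1.000.000"), 2006))
```

Because the factor is added to 1 before multiplying, a factor of 0 leaves the 2024 amounts unchanged.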
Dollar figures shown in the dashboard use a fixed reference exchange rate of 950 CLP per USD.
This single rate is applied uniformly to all years after inflation adjustment, so comparisons across cohorts reflect real purchasing power rather than historical exchange-rate fluctuations. It is an approximation intended for readability, not financial precision.
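As a rough sketch of the conversion, assuming the input has already been expressed in 2024 CLP (the function name clp_to_usd is hypothetical):

```python
CLP_PER_USD = 950  # fixed reference rate stated on this page


def clp_to_usd(adjusted_clp: float) -> float:
    """Convert an inflation-adjusted CLP amount to USD at the fixed rate."""
    return adjusted_clp / CLP_PER_USD


# 2,100,000 CLP (2024 terms) is about 2210.53 USD at this rate.
print(round(clp_to_usd(2_100_000), 2))
```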
New-beneficiary counts: records with tipo_beneficiario = "NUEVO BENEFICIARIO", grouped by year, region, income quintile, and gender.
Requested-to-reference tuition ratio: arancel_solicitado / arancel_referencia, restricted to values in the range (0, 1] to exclude data-entry errors.
Graduation rate: SUM(egresos) / SUM(total_program_enrollments); counts records whose final status is graduation (egreso).
Dropout rate: SUM(deserciones) / SUM(total_program_enrollments); counts records whose final status is formal dropout (deserción).
The analysis relies on the reference tuition (arancel_referencia) rather than the actual tuition charged by each institution. Loan approval rates were derived by comparing the amount requested (arancel_solicitado) against the reference tuition, as lower-income students tend to request a higher share of tuition costs yet face lower approval rates.
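The aggregations described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the dashboard's actual code: the function names are hypothetical, as are the record keys year, region, quintile, and gender; only tipo_beneficiario, arancel_solicitado, arancel_referencia, egresos, deserciones, and total_program_enrollments appear on this page.

```python
from collections import Counter


def count_new_beneficiaries(records):
    """Count NUEVO BENEFICIARIO records per (year, region, quintile, gender)."""
    return Counter(
        (r["year"], r["region"], r["quintile"], r["gender"])
        for r in records
        if r["tipo_beneficiario"] == "NUEVO BENEFICIARIO"
    )


def request_ratio(arancel_solicitado, arancel_referencia):
    """Return solicitado / referencia, or None when outside (0, 1]."""
    if arancel_referencia <= 0:
        return None  # cannot form a meaningful ratio
    ratio = arancel_solicitado / arancel_referencia
    return ratio if 0 < ratio <= 1 else None


def outcome_rates(programs):
    """Return (graduation_rate, dropout_rate) as ratios of summed counts."""
    total = sum(p["total_program_enrollments"] for p in programs)
    if total == 0:
        return None, None
    graduated = sum(p["egresos"] for p in programs)
    dropped = sum(p["deserciones"] for p in programs)
    return graduated / total, dropped / total
```

Summing the counts before dividing weights each program by its enrollment, which differs from averaging per-program rates.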