Validation & Methods

A Novel Weighting-Based Approach to Cohort Replenishment in Dynamic Microsimulations

We propose a new method for generating replenishment cohorts in dynamic microsimulation models. Standard dynamic microsimulations project the future states of an initial population through the recursive application of one-step-ahead predictions. Over time, sample size declines due to attrition (e.g., mortality), and without the integration of new individuals, the projected population progressively departs from the target population structure. To preserve representativeness, replenishment cohorts must therefore be introduced at each simulation step.
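As a minimal illustration of the weighting idea (the single age dimension, group totals, and sample sizes below are invented for the sketch, not taken from the paper), one can rescale survey weights within groups so that the projected population again matches target totals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy projected population after attrition: ages and survey weights.
ages = rng.integers(20, 90, size=1_000)
weights = np.ones(1_000)

# Hypothetical target totals per age group for the replenished population.
edges = [40, 60]                      # groups: 20-39, 40-59, 60+
targets = {0: 450.0, 1: 350.0, 2: 300.0}

# One-dimensional raking step: rescale weights within each group so that
# grossed-up group totals hit the targets. This stands in for introducing
# and weighting an explicit replenishment cohort.
groups = np.digitize(ages, edges)     # 0, 1, or 2
for g, target in targets.items():
    mask = groups == g
    weights[mask] *= target / weights[mask].sum()

assert abs(weights.sum() - sum(targets.values())) < 1e-6
```

In a full model the adjustment would run every simulation step and over several demographic dimensions jointly, but the proportional-scaling core is the same.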

A venue-based population-wide individual-based microsimulation model for COVID-19 transmission

Understanding infectious disease transmission requires insight into who interacts with whom, where and how these interactions take place, and under which conditions. While individual-based models (IBMs) allow interactions to be represented at the level of individuals, most models aggregate information on interaction partners (e.g. by age), without specifying where contacts occur, how they take place, or which individuals are co-present in the same setting. As a result, interactions outside households, schools or workplaces are commonly represented using aggregated community structures, and detailed mobility or venue-level contact data are rarely available. This poses a key challenge for microsimulation-based transmission modelling.

Comparative Pension Microsimulation using microWELT: Lessons from Benchmarking to a Detailed National Model

Comparative pension microsimulation supports coherent cross-country policy analysis by using harmonised inputs such as EU-SILC. Yet institutional heterogeneity and limited life-course information in standardised surveys constrain how credibly national rules, accrual mechanisms, and retirement pathways can be represented within a portable framework. Portability is feasible, but for most countries the feasibility frontier is set by what can be inferred from harmonised data and how much institutional detail can be included without sacrificing comparability. We explore the validity domain of comparative modelling by benchmarking a comparative model against a detailed national model in a controlled “sister-model” setting. Our comparative model is microWELT, designed for multi-country applications and therefore built around portable representations of labour-market and retirement transitions and simplified pension benefit calculations. Retirement timing follows parsimonious rules centred on statutory ages, and benefits are approximated via mappings from pre-retirement earnings. microWELT is parameterised for eight European countries and serves as a uniform base for refining national applications. We compare microWELT to microDEMS, a closely related detailed Austrian model built on longitudinal administrative data. microDEMS reconstructs employment and insurance careers and implements Austrian pension law at a granular level, including pathway-specific eligibility and benefit calculations that depend on accumulated insurance periods and full contribution histories. This pairing allows us to attribute projection differences directly to data richness and institutional detail rather than to unrelated modelling choices. Empirically, we use Austria’s reform harmonising women’s statutory retirement age with men’s as a demanding test case.
Although the reform is simple to describe, it is difficult to capture with stylised comparative rules because it is phased in over time and interacts with multiple exit routes from the labour market whose availability depends on career histories. We run matched baseline and reform scenarios focussing on retirement transitions over the next decade under harmonised demographic and macro assumptions and compare age- and cohort-specific retirement and employment profiles as well as the timing of pension claiming and aggregate expenditure trajectories.

Conditional diffusion for uncertainty-aware dynamic microsimulation: multivariate trajectory inpainting, forecasting, and scenario generation

Dynamic microsimulation requires generating coherent multivariate micro-trajectories over time across multiple outcomes (e.g., employment, income, hours), while handling panel gaps and propagating uncertainty into downstream indicators. Common approaches—transition models, chained regressions, and hot-deck imputation—often yield a single deterministic completion and can struggle to preserve high-dimensional joint structure, especially under block nonresponse.

Constructing a historic microsimulation model to measure tax-benefit policy intentions in the Netherlands: 1950-1975

Microsimulation techniques are widely used to study the impact of (reforms in) fiscal and social policies in mature welfare states, providing interesting insights into how today’s policies reduce poverty, redistribute resources among population groups, and shape incentives, for instance to work or save. While a lot has been written on the historical emergence and expansion of welfare states at an institutional level, hardly anything is known about the impact early-stage welfare states had on the lives of ordinary people. Most European countries saw the emergence and rapid expansion of their welfare states in the three decades after the Second World War, usually referred to as the Golden Age. It is often inferred that this must have been a period in which everyone was better off, in large part attributed to strong social protection. Yet empirical evidence is lacking, as the literature typically focuses on the top 10% and the role of top marginal tax rates.

Continuous-Time Labour Activity Transitions in Comparative Microsimulation: Alignment, Validation, and Benchmarking

We present a continuous-time longitudinal microsimulation module for labour activity careers in microWELT, a comparative microsimulation model for socio-demographic projections and policy scenario analysis. Individual careers evolve through transitions among employment, unemployment, non-participation, retirement, and family-related leave using an event-driven waiting-time approach. Transition processes are modelled with piecewise-constant hazard regressions featuring competing risks and explicit duration dependence, enabling realistic spell dynamics, especially for unemployment, that are often difficult to reproduce in discrete-time transition models. A central design feature is alignment to scenario targets for aggregate unemployment and labour force participation. Because microWELT, like most policy microsimulations, does not model market mechanisms, these targets provide macro-level closure for projections and a transparent instrument for counterfactual scenario design. Estimating transition hazards in microWELT is challenging because the underlying comparative survey data typically have limited longitudinal depth and small effective sample sizes, implying higher parameter uncertainty than in administrative-history models such as Austria’s microDEMS. To address this, we adapt the microDEMS hazard-based framework and implement two alignment mechanisms. First, simulated aggregates are calibrated to scenario targets for overall unemployment and participation. Second, an optional but pivotal mechanism for microWELT constructs group-specific targets using cross-sectional imputation models defined by categorical characteristics (age group, gender, education, health, and family status). These group targets are reconciled with aggregate constraints via monthly calibration of the cross-sectional model during the simulation.
Operationally, alignment exploits simulated waiting times for selected transitions (entries into unemployment and exits from the labour force) to rank individuals by implied event timing, and then induces the required number of events by selecting those with the shortest waiting times to meet target margins. All other transitions remain fully continuous-time; only alignment-induced events are implemented on monthly steps to enforce cross-sectional constraints. We evaluate the framework through benchmarking and retrospective validation, comparing careers generated from hazards estimated on administrative versus survey data and assessing fit to recent observed outcomes. Validation focuses on the distribution of unemployment spell durations and heterogeneity in unemployment risk across population groups, quantifying how alignment delivers target consistency while preserving plausible continuous-time dynamics and empirically observed group differences.
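The waiting-time selection step can be sketched as follows (synthetic exponential waiting times and an arbitrary target count, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated waiting times (in months) until entry into unemployment for each
# currently employed individual; hypothetical exponential draws.
waiting = rng.exponential(scale=24.0, size=500)

# Scenario target: 60 unemployment entries must occur this month.
target_events = 60

# Rank individuals by implied event timing and induce events for those with
# the shortest waiting times, so that the aggregate margin is met exactly.
order = np.argsort(waiting)
event = np.zeros(500, dtype=bool)
event[order[:target_events]] = True

assert event.sum() == target_events
# Everyone selected has a shorter waiting time than anyone not selected.
assert waiting[event].max() <= waiting[~event].min()
```

The sketch covers only one transition and one aggregate margin; the paper's mechanism additionally reconciles group-specific targets with the aggregate constraint.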

Developing Reporting Standards for Population Health Microsimulation: A Scoping Review of Current Practices

Background Population health simulation models—including microsimulation, agent-based, system dynamics, and Markov models—are essential tools for understanding noncommunicable disease (NCD) burden and evaluating policy interventions. However, inconsistent reporting practices limit the transparency, reproducibility, and credibility of this work. Unlike clinical trials and observational studies, which benefit from established reporting guidelines (CONSORT, STROBE), no comprehensive standards exist for population health simulation models.

Evaluating the results of a social benefit simulation using individual administrative data on benefit receipt

The simulation of social benefits is an important application of tax-benefit microsimulation models in social policy research. Simulation outcomes provide information on the (potential) effects of social policies and policy reforms. Furthermore, tax-benefit simulations allow for an evaluation of the interactions of different benefits in complex benefit systems. However, the quality of the simulation outcomes has consequences for the assessment of the effectiveness of the policy. An increasing number of recent studies on benefit non-take-up as one measure of the ineffectiveness of social policies explicitly address the difficulties in determining benefit entitlements using tax-benefit microsimulation models (Tasseva 2016, Bruckmeier et al. 2021, Doorley and Kakoulidou 2024, Bargain et al. 2012, Harnisch 2019). Consequently, the validation of the simulation outcomes is an important step in the application of tax-benefit simulations. Our study contributes to the literature on validating the results of tax-benefit simulation models. We examine how well the results of an open-source tax-benefit microsimulation model for Germany (GETTSIM) on means-tested minimum income (UBII) entitlements match the benefits contained in administrative data on UBII. Our analysis has two objectives: First, the results should provide an assessment of the validity of the UBII simulation results using GETTSIM. Second, generalized conclusions for policy and non-take-up analyses based on tax-benefit microsimulation models will be drawn. The results show that UBII entitlements are in most cases correctly simulated. In the adapted version of GETTSIM we used, no UBII entitlement was simulated for only 3 to 4 percent of all observations (beta error). The simulation also allowed a precise distinction between UBII and existing similar benefits (housing benefit and means-tested child benefit) for most observations.
A closer look at individual deviations between recorded and simulated entitlements reveals significant deviations for migrants, especially from crisis countries, which was particularly relevant in 2017 and 2018. Furthermore, for households with many family members, with children, or with employed persons, the simulation at the individual level becomes less precise. The results also provide some insights for the analysis of eligibility based on tax-benefit simulations in general. In social policy systems with overlapping benefits, even with very good data quality, misspecification of benefit entitlements cannot be avoided to a relevant extent, especially when the benefits pursue similar objectives and discretionary decisions occur at the administrative level. Since the mean values of various large sociodemographic groups are determined relatively accurately in the simulation, calibrating the simulated recipient numbers can compensate for these inaccuracies. The analysis at the individual level has shown that simulation quality decreases particularly for subgroups with more complex life circumstances, such as households with children. This applies in particular to comprehensive last-resort minimum income systems that provide benefits in the household context and take all types of household income into account. Temporary special circumstances, like a national or global crisis, can also lead to simulation results that do not reflect actual payments. In crisis years, consideration should be given to excluding particularly affected subgroups from the analysis or to choosing other simulation years, if possible.

Machine Learning Approaches to Predicting Consumption Expenditure: A Comparative Analysis for SILC–HBS Statistical Matching

This study examines whether supervised machine learning can improve the prediction of household expenditure shares within the standard statistical matching pipeline that fuses EU SILC–type microdata with Household Budget Survey (HBS) expenditures. The conventional approach uses a transparent two-part econometric design: a probit model for participation (extensive margin) and an OLS regression for conditional spending (intensive margin). While robust, this framework is known to struggle in categories with pronounced zero inflation, nonlinear participation boundaries, heterogeneous spending patterns, or timing noise. We assess whether replacing the parametric steps with Gradient Boosted Trees (GBT) for participation and Gradient Boosted Regression (GBR) for conditional expenditure yields systematically better predictions without altering the downstream imputation workflow. We combine Swiss SILC 2020 as the recipient dataset and Swiss HBS 2015–2017 as the donor survey. Because these samples have no shared identifiers, we harmonize variables following established Eurostat/JRC practices. Seventeen covariates present in both sources are aligned through recoding and aggregation, and we uprate nominal incomes and expenditures using the harmonized index of consumer prices (HICP) to ensure comparability with the SILC reference year. We apply EUROMOD-style categorical aggregation to mitigate incidental zeros, remove extreme expenditure-to-income ratios, and enforce a common structure for the predictors used in both stages of the model. This creates a coherent evaluation environment in which alternative prediction models can be compared fairly. The imputation pipeline remains unchanged to ensure comparability with policy applications. First, we estimate participation for each aggregated COICOP category using the selected model (probit baseline or GBT alternative). Second, we model conditional expenditure given participation using OLS (baseline) or GBR (alternative). Third, we compute fitted shares and apply a pseudo R² screen to restrict attention to categories where covariates meaningfully explain variation. All diagnostics and matching steps are identical across methods, so that any downstream differences are attributable solely to the prediction component. The design yields (i) cross-validated probability and error metrics for extensive and intensive margins; (ii) threshold sweep summaries to document operating-point sensitivity under imbalance; and (iii) downstream compatibility with the standard donor selection step used in EUROMOD/SWISSMOD-type applications. Because the imputation workflow and diagnostics are held constant, the study isolates the contribution of flexible predictors relative to the classical probit–OLS baseline in a way that is transparent for policy use.
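The two-stage logic can be sketched with gradient-boosted stand-ins on synthetic data (the actual covariates, COICOP categories, and tuning are those of the study and are not reproduced here):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(2)
n = 2_000

# Synthetic stand-ins for harmonised covariates.
X = rng.normal(size=(n, 3))

# Zero-inflated expenditure: participation driven by X[:, 0],
# conditional spending by X[:, 1].
participates = rng.random(n) < 1.0 / (1.0 + np.exp(-X[:, 0]))
spend = np.where(
    participates,
    np.exp(1.0 + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)),
    0.0,
)

# Stage 1 (extensive margin): boosted classifier for participation.
clf = GradientBoostingClassifier(random_state=0).fit(X, participates)
p_hat = clf.predict_proba(X)[:, 1]

# Stage 2 (intensive margin): boosted regressor fitted on participants only.
reg = GradientBoostingRegressor(random_state=0).fit(X[participates], spend[participates])
cond_spend = reg.predict(X)

# Unconditional prediction = P(participate) * E[spend | participation].
expected = p_hat * cond_spend
assert expected.shape == (n,)
```

The probit–OLS baseline replaces the two fitted models while the surrounding pipeline, in the spirit of the study, stays identical.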

Modelling Social Assistance Take-Up with Machine Learning in Slovakia

The Material Need Benefit (MNB) constitutes the core component of the Slovak social assistance system. It is a means-tested transfer whose eligibility depends on household composition, income thresholds, and asset tests. In the TATRASK model, which is a static microsimulation model of the Slovak tax and transfer system based on linked administrative data collected by government authorities, the MNB is simulated through the application of legislative rules subject to data availability constraints. As is common in microsimulation models of this type, both the number of simulated beneficiaries and the aggregate fiscal cost are substantially overestimated. This limitation stems primarily from the inability to model benefit take-up accurately due to missing or unobserved information in the data. To address this issue, first an expert-based adjustment approach is implemented, where the number of potential beneficiaries is reduced through a set of rules reflecting observed behavioural patterns in the data. While this method improves model performance and reduces overestimation, it remains limited in its ability to fully capture take-up behaviour. As an alternative, this contribution applies machine learning techniques to model MNB take-up. Specifically, XGBoost models are trained to predict the probability of benefit receipt and to identify likely beneficiaries within the underlying dataset. The results demonstrate that the machine learning approach clearly and substantially outperforms both the baseline TATRASK simulation and the expert rule-based adjustment in terms of accuracy. Moreover, cross-temporal validation confirms the robustness and stability of the ML model, even in the presence of policy changes affecting MNB eligibility.
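The take-up modelling idea can be sketched with a generic boosted classifier standing in for XGBoost (synthetic households and invented probabilities, not the TATRASK data):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
n = 3_000

# Households the rule-based simulation deems entitled to the benefit.
X = rng.normal(size=(n, 4))
# Observed receipt: only part of the entitled population actually claims.
take_up = rng.random(n) < 1.0 / (1.0 + np.exp(-(X[:, 0] - 1.0)))

# Boosted classifier (stand-in for XGBoost) scoring take-up probability.
clf = GradientBoostingClassifier(random_state=0).fit(X, take_up)
p = clf.predict_proba(X)[:, 1]

# Align the simulated caseload with the observed number of recipients by
# selecting the households with the highest predicted take-up probability.
n_recipients = int(take_up.sum())
predicted = np.zeros(n, dtype=bool)
predicted[np.argsort(p)[-n_recipients:]] = True

assert predicted.sum() == n_recipients
```

Cross-temporal validation, as in the abstract, would train on one year and score a later year to check robustness to policy changes.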

Nowcasting the BELMOD input dataset: Comparing techniques for an administrative dataset

The BELMOD microsimulation model of the Belgian Federal Public Service Social Security is based on an administrative input dataset. While the model’s policy simulations are updated twice a year, the most recent input dataset currently available refers to 2019. As a result, the discrepancy between the input data and the present situation has grown to several years. This gap can be reduced through nowcasting: updating the outdated input data with more recent information brings it closer to the present situation and makes it more suitable for simulations relating to the most recent years. We applied nowcasting to the BELMOD input dataset by incorporating both demographic changes and changes in individuals’ labour market status. To assess which nowcasting approach is most suitable for BELMOD, we developed and compared three methods. In the first two, labour market status transitions are modelled using a parametric and a non-parametric approach, respectively. For individuals experiencing a labour market status transition, the relevant variables in the dataset are subsequently adjusted. In addition, demographic changes are incorporated in these two methods through reweighting. In the third method, both labour market status and demographic changes are implemented through reweighting. We validated the three methods using external statistics on the number of income and benefit recipients, as well as aggregate income and benefit amounts.
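A caricature of the transition-based nowcasting step (constant transition probabilities replace the estimated parametric or non-parametric models; all numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000

# Outdated labour market status in the input data: 0 = employed, 1 = unemployed.
status = rng.choice([0, 1], size=n, p=[0.9, 0.1])

# Invented one-year transition probabilities; a parametric approach would
# estimate these from covariates instead of using constants.
p_job_loss, p_reentry = 0.04, 0.35

# Monte Carlo draws decide which individuals switch status this year.
draws = rng.random(n)
new_status = status.copy()
new_status[(status == 0) & (draws < p_job_loss)] = 1   # employed -> unemployed
new_status[(status == 1) & (draws < p_reentry)] = 0    # unemployed -> employed

share_unemployed = (new_status == 1).mean()
assert 0.0 < share_unemployed < 1.0
```

After such transitions, the related income and benefit variables would be adjusted for the switchers, and demographic change handled by reweighting, as the abstract describes.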

Prediction Markets in Decentralized Finance: An Agent-Based Model with Heterogeneous Traders

While prediction markets in decentralized finance aggregate dispersed information into probabilistic prices, their accuracy strongly depends on design, incentives, and environment. Under suitable incentives, markets can theoretically approximate the likelihood of events (Wolfers & Zitzewitz, 2004). However, recent agent-based research suggests that in prediction markets with order-based matching, biased traders with large capital endowments (so-called “whales”) can move prices away from fundamentals. Such distortions may persist when belief updating is slow and herding reinforces the deviations (Smart et al., 2026). These distortion dynamics thus depend not only on trader heterogeneity but also on the mechanisms translating demand into prices and spreads.

Properties of alignment methods in discrete time dynamic microsimulation models

Alignment is a critical calibration technique in microsimulation, ensuring individual-level transitions aggregate to known macro-targets. While indispensable for updating populations to match demographic projections or macroeconomic forecasts, the statistical properties of various alignment algorithms remain under-researched. This paper provides a systematic evaluation of alignment methods for discrete-time models to guide researchers in method selection.

Recent developments of the SimPaths dynamic microsimulation framework

SimPaths is an open-source framework for modelling individual and household life course events, jointly developed at the Centre for Microsimulation and Policy Analysis and the University of Glasgow (Bronka et al., 2025). The framework is designed to project life histories through time, building up a detailed picture of career paths, family (inter)relations, health, and financial circumstances. The modular nature of the SimPaths framework is designed to facilitate analysis of alternative assumptions concerning the tax and benefit system, sensitivity to parameter estimates and alternative approaches for projecting labour/leisure and consumption/savings decisions. SimPaths builds upon standardised assumptions and data sources, which facilitates adaptation to alternative countries.

Reconstructing Interstate Conflict Networks - An Agent-Based Model Anchored in HIIK Data

Interstate conflict networks are frequently represented as mappings of dyadic feature combinations, which can be encoded as graphs. Such graphs exhibit structured macro-patterns, including but not limited to positions, clusters, and polarization, that call for mechanistic interpretation and explanation. The research underlying this abstract develops an agent-based model (ABM) to reconstruct annual interstate conflict networks derived from the Conflict Barometer of the Heidelberg Institute for International Conflict Research (HIIK) (HIIK, various years). Rather than treating these networks as mere graphs and illustrations, we utilize them as empirical macro-targets for simulation-based explanation in the tradition of generative social science (Epstein, 2006).

Simulating the Simulation: Evaluating Simulation Strategies in Causal Inference

Because causal estimands are unobservable in reality, benchmarking causal effect estimators is inherently challenging. To overcome this, simulation studies are increasingly used to evaluate causal inference methods under realistic conditions such as small sample sizes, limited overlap, and complex confounding. Yet it remains unclear to what extent conclusions drawn from simulated settings extrapolate to real-world performance. Existing benchmarks rely on a wide range of outcome-generation strategies, from parametric structural models to flexible machine learning, without clear guidance on their reliability. This paper formalizes simulation-based benchmarking and introduces a framework to characterize the types of bias that may arise from different generative strategies, focusing particularly on Average Treatment Effect (ATE). We classify and compare several approaches proposed in the causal inference literature, including parametric structural models[1], machine-learning conditional mean–variance models[2], and adversarial generative models such as Wasserstein GANs[3]. We apply these modeling strategies to commonly used benchmark datasets in which we synthetically define an outcome-generating function so that the true ATE is known and systematic evaluation is possible. Across Monte Carlo experiments, we compare estimator performance in terms of bias, variance, and mean squared error. We examine how closely these performance metrics align across different simulation strategies relative to the structural data-generating process, and whether commonly used generator diagnostics are predictive of estimator behavior. Our preliminary results indicate that the choice of simulation strategy can substantially alter conclusions about estimator reliability relative to the underlying data distribution. These findings underscore the importance of principled design and validation of simulation frameworks when benchmarking causal methods.
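A toy version of the benchmarking logic, assuming a hand-written outcome-generating function with a known ATE of 2.0 and confounding through a single covariate (none of these numbers come from the paper):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5_000

# Outcome-generating function with known ATE = 2.0 and confounding via x.
x = rng.normal(size=n)
t = rng.random(n) < 1.0 / (1.0 + np.exp(-x))   # treatment more likely for high x
y = 2.0 * t + 1.5 * x + rng.normal(size=n)

# Naive difference in means is biased by confounding ...
naive = y[t].mean() - y[~t].mean()

# ... while regression adjustment for x recovers the true ATE.
X = np.column_stack([np.ones(n), t.astype(float), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
adjusted = beta[1]

assert abs(adjusted - 2.0) < 0.15
assert abs(naive - 2.0) > abs(adjusted - 2.0)
```

Because the true ATE is fixed by construction, estimator bias, variance, and MSE can be measured exactly; the paper's question is whether such rankings survive when the hand-written generator is replaced by learned or adversarial ones.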

Strengthening Validation Frameworks in Dynamic Microsimulation: Evidence from SimPaths

Dynamic microsimulation models such as SimPaths are increasingly used to evaluate long-term policy impacts by generating synthetic trajectories for individuals and households. Their credibility, however, depends on rigorous validation: demonstrating that simulated outcomes can reliably reproduce observed data. Despite their growing role in policy analysis, validation practices remain fragmented and only partially automated (e.g., O’Donoghue et al., 2015; Gosseries & Van der Heyden, 2018). This paper presents ongoing work on strengthening validation frameworks in SimPaths, with a focus on discriminator-based methods and econometric consistency checks. First, we apply classifiers (e.g., Gradient Boosted Machines) to distinguish between simulated and survey data. Discriminator accuracy provides an interpretable quantitative score of similarity: the closer performance is to random guessing, the more realistic the simulated data. Second, we explore the re-estimation of key behavioural regressions using simulated data and assess parameter recovery. This helps identify whether discrepancies arise from implementation issues, estimation limitations, or structural differences between datasets. By combining machine-learning discriminators with regression-based diagnostics, the paper contributes to more automated, transparent, and reproducible validation practices for complex microsimulation models.
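A minimal discriminator check might look like this (Gaussian toy data with a deliberate mean shift standing in for simulated versus survey records; the paper uses real SimPaths output and richer classifiers):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(7)
n = 1_000

# "Survey" records and a simulated copy with a deliberate mean shift in the
# first variable, standing in for model output versus observed panel data.
real = rng.normal(size=(n, 3))
sim = rng.normal(size=(n, 3)) + np.array([0.5, 0.0, 0.0])

X = np.vstack([real, sim])
y = np.r_[np.zeros(n), np.ones(n)]

# Discriminator: cross-validated AUC near 0.5 would mean the classifier
# cannot tell the sources apart, i.e. the simulated data look realistic.
clf = GradientBoostingClassifier(random_state=0)
p = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
auc = roc_auc_score(y, p)

assert 0.5 < auc < 0.95   # the injected shift is partly detectable
```

Feature-importance output from the same classifier can then point to which variables drive the detectable differences, complementing the regression re-estimation diagnostics described above.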

Uncertainty assessment in dynamic microsimulation: the case of MikroSim (Germany)

Spatial dynamic microsimulations probabilistically project geographically referenced units with individual characteristics over time. Like any stochastic projection method, their outcomes are inherently uncertain and sensitive to multiple factors. In discrete-time dynamic microsimulations, in each simulated time step (often a year), each unit passes through different modules addressing different life events (births, deaths, ageing, partnership, employment, …), which evaluate via a Monte Carlo experiment whether a status transition occurs. This inherently introduces uncertainty due to the method’s stochastic nature. However, simulations may also be sensitive to other factors, such as the choice of model types and complexity, as well as the parameter estimates, among others.
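The Monte Carlo component of this uncertainty can be illustrated by repeating a single mortality-style module many times and inspecting the spread of outcomes (illustrative probability and sizes, not MikroSim parameters):

```python
import numpy as np

# Repeat one yearly mortality module R times to see pure Monte Carlo spread.
p_death = 0.012                 # illustrative annual death probability
n_units, R = 10_000, 200

rng = np.random.default_rng(5)
deaths = np.array([(rng.random(n_units) < p_death).sum() for _ in range(R)])

# Binomial benchmark: sd should be near sqrt(n * p * (1 - p)) ~ 10.9.
mean, sd = deaths.mean(), deaths.std()
assert abs(mean - n_units * p_death) < 3.0
assert 7.0 < sd < 15.0
```

Model-choice and parameter uncertainty would add further layers on top of this baseline spread, e.g. by re-running with resampled parameter estimates.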

Understanding Dynamics in Security Awareness through ICT Diffusion using Microsimulation

With the ongoing diffusion of information and communication technologies (ICT) across large parts of Europe, security awareness is becoming increasingly important for mitigating societal security risks and long-term costs of cybercrime. At the macro-structural level, the diffusion of ICT is to a large extent shaped by institutional factors, such as legislative decisions at the national level, technological innovations, or considerations within larger organizational units. By contrast, macro-structural changes in security awareness cannot be governed in a comparable top-down manner. Instead, micro-founded models at the individual level are required, whose contextual conditions systematically change as a result of social-structural and demographic transformation processes.
