
Firm microsimulation and VAT policy analysis
I am a Research Associate at PolicyEngine, a nonprofit that provides free, open-source software to compute the impact of public policy in the US and UK. Previously, I served as a researcher at the London School of Economics. My work focuses on microsimulation, economic modelling, and public policy analysis, particularly the UK tax and benefit system.
From Microsimulation to a Digital Twin of Society: Methodological and Data Foundations of project InnoTwin
Digital twins are increasingly regarded as a key technology for analysing complex systems. While the concept is well established in engineering and industry, its transfer to societal systems remains methodologically underdeveloped. Using our BMFTR-funded project InnoTwin (www.innotwin.de) as a case study, this presentation discusses what it scientifically means to construct a digital twin of society, and how this approach differs from classical microsimulation. InnoTwin is based on an agent-based model that explicitly represents individual life courses, behavioural responses, and social interactions, thereby moving beyond the analysis of average effects.
INFORM2, DWP’s main forecasting model for Universal Credit
INFORM2 is a dynamic microsimulation model developed within the Department for Work and Pensions (DWP) for forecasting claim volumes that underpin the benefit expenditure forecast for Universal Credit. Development began in 2018 and builds on earlier iterations of the INFORM framework. The model simulates independent benefit units and individuals on a monthly time step, using Universal Credit (UC) administrative data as its core input.
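The abstract does not publish INFORM2's internal structure, but the mechanics of a monthly time-step dynamic microsimulation can be sketched as follows. The states and transition probabilities below are invented purely for illustration; they are not DWP parameters.

```python
import random

# Illustrative monthly transition probabilities (invented for this sketch;
# a real model would estimate these from administrative data and covariates).
P_START_CLAIM = 0.02   # probability an off-benefit unit starts a claim this month
P_END_CLAIM = 0.05     # probability an on-benefit unit ends its claim this month

def simulate_months(n_units: int, n_months: int, seed: int = 0) -> list[int]:
    """Advance benefit units month by month; return the monthly caseload counts."""
    rng = random.Random(seed)
    on_claim = [False] * n_units   # all units start off-benefit in this sketch
    caseload = []
    for _ in range(n_months):
        for i in range(n_units):
            if on_claim[i]:
                if rng.random() < P_END_CLAIM:
                    on_claim[i] = False
            elif rng.random() < P_START_CLAIM:
                on_claim[i] = True
        caseload.append(sum(on_claim))
    return caseload

volumes = simulate_months(n_units=10_000, n_months=24)
```

With constant hazards the caseload converges towards the steady state implied by the start and exit rates; a forecasting model such as INFORM2 would instead condition these transitions on unit characteristics and policy rules.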
Introducing the comparative open-source model microWELT
This tutorial introduces microWELT, a modular modelling platform for comparative dynamic microsimulation. MicroWELT is a portable, continuous-time interacting-population model built to work with readily available data for many countries, and it supports optional alignment to aggregate targets. It is “X-compatible”: the same model code can be compiled with either Modgen or the open-source openM++ environment. As documented on the project website www.microWELT.eu, the model is also extendable to refined national applications such as microDEMS, which applies the same platform to an Austrian setting using detailed longitudinal administrative records, illustrating how the shared core can be refined when richer data are available.
Microsimulation at Scale for Chronic Disease Modelling: Executing 100 Million Individual Life-Course Simulations in 100 Seconds
Microsimulation is a uniquely powerful technique for chronic disease modelling because it simulates outcomes at the level of the individual over time, capturing heterogeneity, history-dependent progression, multimorbidity, and complex clinical pathways that cohort averages cannot. In an era when chronic diseases account for the majority of global mortality and impose escalating pressure on health systems, decisions about their prevention, treatment, pricing, and resource allocation carry profound long-term clinical and financial consequences. Consequently, accurate long-horizon modelling of these diseases has become central to policy, reimbursement, and investment decisions.

Historically, however, microsimulation has been constrained by computational performance. Statistical precision requires large simulated populations to reduce Monte Carlo error, and probabilistic sensitivity analysis multiplies this burden through repeated parameter sampling. Many models built in spreadsheets or high-level languages require hours or days to run, limiting scenario exploration, delaying iteration, and reducing their practical utility in time-sensitive decision environments.

To address these limitations, a legacy microsimulation stack was rebuilt into a high-performance platform capable of executing 100 million life-course simulations in approximately 100 seconds. Performance gains were achieved through several core engineering innovations. The microsimulation core was implemented in modern C++, enabling direct control over memory allocation, cache locality, and execution flow. Compared with interpreted (e.g. Python, R) or spreadsheet-based environments, compiled C++ dramatically reduces runtime overhead and enables predictable, deterministic execution, strengthening validation processes and supporting regulatory-grade transparency and auditability. Memory architecture was optimised to maximise Central Processing Unit (CPU) cache efficiency and minimise allocation costs.
Modelled individuals’ attributes, state transitions, and event processes were encoded in compact, structured formats, allowing large virtual populations to be simulated without performance degradation. The engine exploited modern multi-core CPU architectures through multi-threading, allowing independent patient simulations to run concurrently. Because individual life trajectories are largely independent within Monte Carlo microsimulation, the model parallelises naturally, enabling near-linear scaling with available cores. Beyond single-machine performance, the system supports horizontal scaling via containerised simulation instances, allowing elastic expansion across the infrastructure based on workload demand, without reliance on specialised high-performance computing clusters.

The platform includes integrated pipelines for data ingestion, preprocessing, simulation execution, and post-processing. Outputs are automatically aggregated into epidemiological and economic metrics, including incidence, prevalence, costs, and healthcare resource use, ready for decision analysis. A user-facing interface abstracts technical complexity, allowing domain experts to configure scenarios and execute simulations without interacting directly with the code or infrastructure. The entire platform is securely hosted in the cloud, allowing easy setup and access anywhere in the world; it comprises cross-cloud components that allow it to be hosted on any of the major cloud providers.

These advances represent a fundamental shift in capability: complex simulations once requiring hours or days can now be completed in seconds, enabling real-time exploration of uncertainty and rapid scenario iteration to expedite decision-making. Microsimulation can therefore operate at the scale and speed demanded by modern policy, reimbursement, and investment strategies, amid growing chronic disease complexity and multimorbidity.
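The natural parallelism described above can be sketched language-agnostically. The following Python sketch (the production engine is C++; the two-state disease model and the 2% annual mortality hazard here are invented for illustration) shows the two properties the abstract relies on: a per-chunk RNG makes results independent of thread scheduling, and aggregation is a simple reduction because trajectories never interact.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def simulate_chunk(seed: int, n_patients: int, n_years: int) -> dict:
    """Simulate one chunk of independent patient life courses with its own RNG."""
    rng = random.Random(seed)  # per-chunk RNG: deterministic regardless of scheduling
    deaths = 0
    life_years = 0
    for _ in range(n_patients):
        for _ in range(n_years):
            life_years += 1
            if rng.random() < 0.02:  # invented annual mortality hazard
                deaths += 1
                break
    return {"deaths": deaths, "life_years": life_years}

def run_parallel(n_patients: int, n_chunks: int, n_years: int) -> dict:
    per_chunk = n_patients // n_chunks
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        results = list(pool.map(
            lambda s: simulate_chunk(s, per_chunk, n_years), range(n_chunks)))
    # Aggregation is a plain sum because individual trajectories are independent.
    return {k: sum(r[k] for r in results) for k in results[0]}

totals = run_parallel(n_patients=40_000, n_chunks=4, n_years=50)
```

In Python the global interpreter lock limits the speedup from threads; the point of the sketch is the structure (independent chunks, order-insensitive reduction), which in compiled C++ with real threads yields the near-linear core scaling the abstract describes.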
MIDAS DE – A LIAM2-based dynamic microsimulation of German pension incomes using linked RV–SOEP data
We introduce MIDAS DE, a LIAM2-based microsimulation model for analysing German pension incomes under current law and counterfactual policy scenarios. The model reproduces the statutory formula for earnings-point accrual, access and type factors, and the current pension value, and is designed to evaluate distributional, gender, and adequacy effects of reforms such as pension splitting and survivor-benefit adjustments within a unified framework. MIDAS DE is implemented in LIAM2 using discrete-time processes over entities (individuals, households), typed fields (e.g., insured status, pension points), and explicit links (spouse/partner, parent–child) necessary for survivor pensions and splitting eligibility. The model combines SOEP-RV administrative insurance records with SOEP survey microdata. Linkage relies on rv_id (SOEP-RV↔SOEP) and pid to reconstruct households and partnerships from ppath/ppathl, household matrices from pbrutto/pl, and family histories from biofam/biomars. This enables (i) identification of spouses; (ii) retrieval of pension-relevant histories for groups under-represented in DRV (e.g., civil servants, the self-employed) via biowork/biojob; and (iii) construction of household attributes needed for survivor-benefit means tests. To harmonise labour income for accrual, we estimate gender- and occupation-specific Heckman selection models for three groups (salaried employees, the self-employed, and civil servants), ensuring segment-specific participation mechanisms and wage processes. Predictions are selection-corrected (inverse Mills ratio) and back-transformed with lognormal adjustments; observed wages replace predictions when available. This captures institutional heterogeneity (e.g., civil-service pay scales, self-employment volatility) and mitigates bias from missing or misreported earnings, feeding consistent contributory bases into earnings-point calculations.
Robustness checks consider exclusion restrictions (household composition and partner status), outlier trimming, and alternative retransformation (smearing). Technically, MIDAS DE shows how LIAM2 can host a law-consistent German pension engine calibrated on linked RV–SOEP microdata with explicit household links, enabling faithful simulation of survivor pensions, pension splitting, and income offsets. Substantively, the model structures policy scenarios along current-law splitting, VersAusglG-style variants (with and without 25-year conditions and cross-pillar coverage), and a universal splitting regime, providing outcomes on the gender pension gap, poverty at retirement, and fiscal effects.
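The selection-corrected wage prediction described above has the standard two-step Heckman form; the notation below is ours, simplified to one labour-market segment s, and is a sketch of the stated procedure rather than the authors' exact specification.

```latex
% Step 1: probit participation equation and inverse Mills ratio
\Pr(d_i = 1 \mid z_i) = \Phi(z_i'\gamma_s), \qquad
\hat\lambda_i = \frac{\phi(z_i'\hat\gamma_s)}{\Phi(z_i'\hat\gamma_s)}

% Step 2: log-wage equation on participants, selection-corrected
\ln w_i = x_i'\beta_s + \theta_s \hat\lambda_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma_s^2)

% Back-transformation to wage levels with the lognormal adjustment
\hat w_i = \exp\!\left(x_i'\hat\beta_s + \hat\theta_s \hat\lambda_i\right)
\cdot \exp\!\left(\hat\sigma_s^2 / 2\right)
```

The smearing retransformation used as a robustness check replaces the lognormal factor \(\exp(\hat\sigma_s^2/2)\) with Duan's nonparametric estimator \(\tfrac{1}{n}\sum_j \exp(\hat\varepsilon_j)\), which does not require normality of the log-wage errors.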
Properties of alignment methods in discrete time dynamic microsimulation models
Alignment is a critical calibration technique in microsimulation, ensuring that individual-level transitions aggregate to known macro targets. While alignment is indispensable for updating populations to match demographic projections or macroeconomic forecasts, the statistical properties of the various alignment algorithms remain under-researched. This paper provides a systematic evaluation of alignment methods for discrete-time models to guide researchers in method selection.
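One member of the family of methods evaluated here is alignment by sorting: rank individuals by their predicted transition probability plus a random shock, then transition exactly as many as the external target requires. The sketch below is a minimal illustration under that method (the probabilities and target are invented); adding logistic noise to the logit preserves each individual's relative risk while forcing the aggregate count to match.

```python
import math
import random

def align_by_sorting(probs: list[float], target: int, seed: int = 0) -> list[bool]:
    """Select exactly `target` transitions, favouring high-probability individuals.

    Each individual's score is logit(p) plus an independent logistic shock,
    so selection respects relative risks while hitting the macro target exactly.
    """
    rng = random.Random(seed)

    def score(p: float) -> float:
        u = min(max(rng.random(), 1e-12), 1 - 1e-12)  # guard against log(0)
        return math.log(p / (1 - p)) + math.log(u / (1 - u))

    ranked = sorted(range(len(probs)), key=lambda i: score(probs[i]), reverse=True)
    chosen = set(ranked[:target])
    return [i in chosen for i in range(len(probs))]

probs = [0.05, 0.10, 0.20, 0.40, 0.70, 0.90]
outcomes = align_by_sorting(probs, target=2)
```

Without alignment, independent Bernoulli draws from these probabilities would hit the target of 2 only on average; the sorting step guarantees it in every run, which is exactly the property whose statistical side effects the paper investigates.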
Tutorial session: Analysing tax-benefit reform impacts with PolicyEngine
This hands-on tutorial introduces participants to PolicyEngine, a free, open-source microsimulation platform for analysing tax and benefit policy reforms in the US and UK. Participants will learn to use PolicyEngine's web interface (policyengine.org) to: (1) model a tax or benefit reform by adjusting policy parameters, (2) compute household-level impacts showing how the reform affects a hypothetical household's taxes, benefits, and net income, (3) run population-level microsimulation analysis to estimate budgetary cost or revenue, distributional effects across income deciles, poverty impacts, and winner/loser breakdowns, and (4) use PolicyEngine's AI assistant (a Claude Code plugin) to conduct policy analysis from natural-language prompts, including generating charts, policy briefs, and congressional district or constituency-level breakdowns. The session will use live examples relevant to current policy debates in both the US and UK.