Given Luxembourg's rapidly evolving socio-demographic structure (ageing, migration, cross-border households) and the social needs it induces, the funding of the Luxembourg social security system is high on the agenda. This concern could become even more pressing over time, given the country's specificity in several respects. According to the 2024 Ageing Report of the EPC's Ageing Working Group, Luxembourg might face an increase in age-related expenditure from 17.2% of GDP in 2022 to 27.9% in 2070, mostly due to pensions (+8.3% of GDP over the period). This is the largest increase expected among EU countries, a first particularity for Luxembourg. A second specificity is the importance of cross-border commuters in the Luxembourg economy: they represented 43% of total employment in 2022. In this context, the country's social players are looking for avenues of reflection towards future concrete proposals. This paper aims to contribute to that debate. It examines the day-after impact of hypothetical parametric changes in social contributions and personal income taxes (the "alternatives") on the distribution of household disposable income and on total public receipts from these sources in Luxembourg. Moreover, given the importance of cross-border commuters for the country, we use a EUROMOD-based microsimulation model covering both resident and cross-border commuter households, the latter population being an essential and innovative extension of previous assessments. We highlight the structural discrepancies between resident and cross-border households in terms of socio-economic status as well as gross labor and taxable income. We then show that total receipts from residents are greater than those from cross-border households, even when controlling for population size. Next, we examine 42 alternatives based on the concerns of a key Luxembourg social partner in the context of an ongoing public debate.
This examination takes into account the values achieved for a triplet of standard indicators, chosen for their simplicity and acceptability to a broad public within the framework of an independent external expertise: total revenues (cross-border households included), the Gini inequality coefficient and the poverty rate (the latter two for the resident population only). We then complete our detailed overview with an evaluation of each alternative along all selected dimensions jointly. Finally, we introduce a basic "Global Performance Index" which may complement conclusions derived from a one- or two-dimensional analysis. Although the analysis is specific to Luxembourg, the policy alternatives and methodology considered here may also be relevant for other countries with comparable socio-economic fundamentals, particularly in the context of the EU-wide concern about the fiscal implications of population ageing.
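For readers less familiar with the latter two indicators, a minimal unweighted sketch is given below; it omits survey weights, income equivalisation and the exact Eurostat definitions used in the analysis, and all names are illustrative.

```python
def gini(incomes):
    # Gini coefficient: mean absolute difference between all pairs of
    # incomes, divided by twice the mean (unweighted sketch).
    n = len(incomes)
    mean = sum(incomes) / n
    mad = sum(abs(x - y) for x in incomes for y in incomes) / (n * n)
    return mad / (2 * mean)

def poverty_rate(incomes, share=0.6):
    # At-risk-of-poverty rate: share of people below 60% of the median.
    ordered = sorted(incomes)
    n = len(ordered)
    median = ordered[n // 2] if n % 2 else (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    return sum(1 for x in incomes if x < share * median) / n
```

For a perfectly equal distribution `gini` returns 0; it approaches 1 as one person holds all income.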
In the framework of the BEAMM project (BElgian Arithmetic Micro-simulation Model), we propose several methods to address data issues. The core of this project is to develop a tax-benefit microsimulation model for Belgium accessible online, requiring intensive data handling. Our challenges are to create a unified data set containing variables from different surveys and to develop a completely synthetic database for the online development of the BEAMM platform.
Indeed, in the BEAMM context, we use a large number of variables available in different databases. We thus need to analyze data from different sources; the observations, which only share a subset of the variables, cannot always be paired to detect common individuals. This is the case, for example, when the information required to study a certain phenomenon comes from different sample surveys. Statistical matching is a common practice to combine these data sets. In this talk, we investigate three methods, based on Kernel Canonical Correlation Analysis (KCCA; [6]), the Super-Organizing Map (Super-OM; [1]) and Autoencoders-Canonical Correlation Analysis (ACCA; [7]), and extend them to statistical matching. These methods are designed to deal with various variable types, sampling weights and incompatibilities among categorical variables ([2, 3, 5]). We additionally implement methods for recalculating the sampling weights.
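As context for these extensions, the classical baseline they improve upon, distance hot-deck matching on the shared variables, can be sketched as follows (an illustrative toy implementation, not the KCCA, Super-OM or ACCA methods themselves; all names are illustrative):

```python
def distance_hot_deck(recipients, donors, common, donated):
    # For each recipient record, find the donor closest on the shared
    # variables and copy over the donor-only variable ("donated").
    def dist(a, b):
        return sum((a[k] - b[k]) ** 2 for k in common)

    matched = []
    for r in recipients:
        best = min(donors, key=lambda d: dist(r, d))
        fused = dict(r)           # keep the recipient's own variables
        fused[donated] = best[donated]
        matched.append(fused)
    return matched
```

The three methods above replace this raw Euclidean distance with learned (kernel or neural) representations in which the match is performed.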
In our context, data privacy and anonymization are important. Under these circumstances, the need arises for synthetic databases that replicate the characteristics of the population while preserving privacy. In this presentation, we also investigate how a range of data generation approaches, drawing on various advancements in the Wasserstein Generative Adversarial Network (WGAN) literature, can be employed to create survey databases. WGANs were introduced by Arjovsky et al. (2017) [8] in the context of image synthesis. Our algorithms have been adjusted to account for sampling weights ([4, 5]). Moreover, survey and administrative data have the specificity of mixing continuous and categorical variables, which should be taken into account in the architecture of the WGANs.
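One simple way to account for sampling weights in generative training, shown here only as an illustrative sketch and not necessarily the adjustment used in [4, 5], is to draw minibatches proportionally to the weights, so the critic and generator see the weighted (population-level) distribution rather than the raw sample:

```python
import random

def weighted_minibatch(records, weights, batch_size, rng=random):
    # Sample a training minibatch with probability proportional to the
    # survey sampling weights (with replacement).
    return rng.choices(records, weights=weights, k=batch_size)
```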
References
[1] Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43(1), 59–69.
[2] Annoye, H., Beretta, A. and Heuchenne, C. (2024). Statistical matching using kernel canonical correlation analysis and super-organizing map. Expert Systems with Applications, 246, 123–134.
[3] Annoye, H., Beretta, A. and Heuchenne, C. (2025). Statistical matching using autoencoders-canonical correlation analysis, kernel canonical correlation analysis and multi-output multilayer perceptron. Knowledge-Based Systems, 330, 114626.
[4] Annoye, H. and Heuchenne, C. (2025). Generating survey databases with Wasserstein Generative Adversarial Networks. Applied Intelligence, 55(17), 1–17.
[5] Annoye, H. (2024). Statistical matching and data generation. PhD thesis (supervisor: Heuchenne, C.).
[6] Lai, P. L. and Fyfe, C. (2000). Kernel and nonlinear canonical correlation analysis. International Journal of Neural Systems, 10(05), 365–377.
[7] Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge: MIT Press, 318–362.
[8] Arjovsky, M., Chintala, S. and Bottou, L. (2017). Wasserstein generative adversarial networks. Proceedings of the 34th International Conference on Machine Learning.
This paper introduces BIMic+, the labor supply extension of BIMic, the tax and benefit microsimulation model of the Bank of Italy (Curci, Savegnago and Cioffi, 2017). The model follows the random utility approach (McFadden, 1974; Aaberge, Dagsvik and Strøm, 1995; Van Soest, 1995). It focuses on the labor supply behavior of wage earners and imputes wages for workers who are not employed through a two-step Heckman estimation procedure. The utility function departs from the quadratic functional form common in this literature in order to avoid utility that decreases in disposable income, which would violate a critical assumption of consumer theory, one that underlies all redistributive analyses and is crucial for computing equivalent variations. The main arguments of the utility function are hours worked and disposable income; the latter is calculated through the static module, BIMic, for each counterfactual hours option. With respect to the literature, we innovate by: (i) imposing the observed distribution of hours as a constraint in the optimization problem to avoid overfitting issues (as opposed to the usual approach of drawing taste shocks until the estimated hours match the observed ones), in a way that also matches the distribution of labor income from aggregated tax returns; (ii) organizing the output of the model according to a strand of the public finance literature theoretically connected to optimal taxation. For each policy, we characterize the willingness to pay of beneficiaries and the net government cost, taking into account behavioral responses to the policy. We also propose to use these quantities to compute the marginal value of public funds spent on such a policy (Hendren and Sprung-Keyser, 2020; Bourguignon and Landais, 2022).
In the last section of our paper, we simulate the labor supply effects of a policy reform as an illustration of how to use our model and its output; specifically, we focus on a cut in social security contributions for mothers with at least two children introduced in Italy in 2024.
Synthetic populations are essential for modeling complex systems that require individual-level data. However, they are typically limited to a single country. Creating a synthetic population across multiple countries is challenging because the data available from national statistical institutes are inconsistent: the variables available differ, and for shared variables, the categories may not align. Fortunately, Eurostat provides access to a large amount of aggregated socio-economic data that is consistent across EU countries. Although these data are less detailed than what can be obtained from national statistical institutes, they provide a solid basis for generating synthetic populations.
This work is part of MMUST+, an Interreg project developing a multimodal mobility model for the Luxembourg cross-border area using the synthetic population as input. We propose a multi-stage framework that combines iterative proportional fitting (IPF) (Deming and Stephan 1940) and stochastic synthetic reconstruction (Lenormand and Deffuant 2013) to generate synthetic populations that are statistically and structurally realistic.
The first stage consists of generating several entities: individuals, family nuclei, households, and dwellings. For each entity, we generate attributes covering socio-demographics, employment, education, household structure, dwelling information and spatial location. To generate these entities, we chose the IPF method for its efficiency and low algorithmic complexity. We ran a separate IPF instance for each entity using as many Eurostat marginals as possible. Since no survey includes all variables, we used a uniform seed with some structural zeros, but the multi-variable marginals preserve most dependency information. In the future, if survey data or microdata from statistical institutes become available, these could be used as seed for IPF to improve accuracy. Finally, we applied truncate-replicate-sample (TRS) integerisation (Lovelace and Ballas 2013) to obtain integer counts.
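The two building blocks of this stage can be sketched as follows. This is a minimal two-variable illustration, with a deterministic largest-remainder stand-in for TRS's random sampling step; names and dimensions are illustrative, not those of the MMUST+ implementation.

```python
def ipf(seed, row_margins, col_margins, iters=50):
    # Iterative proportional fitting: alternately rescale the rows and
    # columns of the seed table until both sets of marginals are matched.
    table = [row[:] for row in seed]
    for _ in range(iters):
        for i, target in enumerate(row_margins):
            s = sum(table[i])
            table[i] = [x * target / s for x in table[i]]
        for j, target in enumerate(col_margins):
            s = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / s
    return table

def integerise(weights):
    # Largest-remainder variant of truncate-replicate-sample (TRS):
    # keep the integer parts ("truncate/replicate"), then hand the
    # leftover units to the cells with the largest fractional parts
    # (TRS proper samples these cells at random, proportional to the
    # fractional parts).
    ints = [int(w) for w in weights]
    fracs = [w - i for w, i in zip(weights, ints)]
    leftover = round(sum(weights)) - sum(ints)
    for k in sorted(range(len(weights)), key=lambda k: -fracs[k])[:leftover]:
        ints[k] += 1
    return ints
```

With a uniform seed, as used here in the absence of survey microdata, IPF converges to the product of the marginals; an informative seed would additionally carry over dependency structure.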
In the second stage, dwelling attributes are assigned to households by probabilistically drawing dwellings for each household, with probabilities derived from IPF weights. If a dwelling and household are incompatible (different locations or mismatched number of occupants), the probability is set to zero.
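A minimal sketch of this masked probabilistic draw follows; field names ("zone", "capacity", "size") are illustrative, not those of the actual data model.

```python
import random

def draw_dwelling(household, dwellings, weights, rng):
    # Incompatible dwellings (wrong location or mismatched number of
    # occupants) get probability zero; the remaining ones are drawn
    # with probability proportional to their IPF weights.
    compatible = [(d, w) for d, w in zip(dwellings, weights)
                  if d["zone"] == household["zone"]
                  and d["capacity"] == household["size"] and w > 0]
    if not compatible:
        return None  # no compatible dwelling available
    total = sum(w for _, w in compatible)
    r = rng.random() * total
    acc = 0.0
    for d, w in compatible:
        acc += w
        if r <= acc:
            return d
    return compatible[-1][0]
```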
The third stage addresses the assignment of individuals to family nuclei and the grouping of isolated individuals and family nuclei into households. We implemented the stochastic sample-free synthetic reconstruction algorithm described by Lenormand and Deffuant (2013). The probabilities required by this method were computed using age-gap distributions derived from data available from Eurostat and the Human Fertility Database, in order to guide realistic relationships between partners and between parents and children. Hard constraints (e.g. maximum two parents per nucleus) were imposed by setting the corresponding probabilities to zero.
Preliminary validation demonstrates that the population reproduces key aggregate statistics, household structures, and family compositions across the cross-border region. While the current approach relies on aggregated data, future integration of survey microdata from national institutes could further improve accuracy.
Overall, this method offers a practical and flexible approach for generating synthetic populations. By combining IPF-based synthesis, TRS integerisation and stochastic synthetic reconstruction, it produces populations that are consistent with aggregate statistics, household- and individual-level structures.
We simulate the distributional effects of a €45/tCO2 carbon price on Belgian households’ heating and transport fuels using microdata from the 2016 Household Budget Survey. Without compensation, the policy is regressive and increases energy poverty, with especially large burdens for singles, seniors, and households heating with oil. We compare three revenue-recycling designs: equal transfers per household, equal transfers per capita, and a fuel-type-differentiated scheme that provides larger supplements to fossil-heated households. Per-household recycling protects vulnerable households better than per-capita recycling, which tends to undercompensate small households. Differentiating transfers by heating fuel further reduces large losses and within-income-group dispersion, and it prevents an increase in energy poverty while preserving overall progressivity of the reform.
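The arithmetic behind comparing the two lump-sum recycling schemes can be sketched as follows; the numbers in the usage note are purely illustrative, whereas the paper's incidence calculations rest on the full Household Budget Survey microdata.

```python
def net_gain(carbon_cost, hh_size, revenue, n_households, n_persons, scheme):
    # Net gain of one household under lump-sum revenue recycling:
    # the transfer received minus the household's extra carbon cost.
    if scheme == "per_household":
        transfer = revenue / n_households
    elif scheme == "per_capita":
        transfer = (revenue / n_persons) * hh_size
    else:
        raise ValueError(scheme)
    return transfer - carbon_cost
```

With two households (a single and a family of three) each bearing the same carbon cost, per-household recycling compensates both equally, while per-capita recycling shifts money from the single to the larger household, illustrating why small households tend to be undercompensated under the per-capita design.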
Traditional monetary poverty metrics used in policy analysis have well-known limitations: small changes in thresholds or methodology can markedly alter who is counted as poor. Subjective poverty indicators—based on individuals’ own assessments—offer a complementary lens by capturing perceived deprivation.
This study uses Ecuador’s ENEMDU household survey (2009–2022), combining repeated cross-sections with a two-period panel spanning a major reform of the Bono de Desarrollo Humano (BDH) cash transfer program. In 2013–2014, a sharp tightening of the welfare index cutoff increased benefits for households below the threshold while making those just above it ineligible, generating an abrupt loss of transfers for some near-cutoff households. The panel allows us to track poverty dynamics around this shock.
We compare two objective poverty measures (income-based, using official poverty lines) with two subjective measures (self-reported poverty status and a minimum-income-based “subjective poverty line”). First, we document trends over time and test basic coherence, including whether higher income is associated with lower subjective poverty and how the subjective poverty line evolves. Second, exploiting the BDH reform as a quasi-experiment, we compare households just below and just above the eligibility cutoff to estimate—via a regression discontinuity design—how losing the transfer affects each poverty metric.
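The core of the design can be sketched with a local-means comparison around the cutoff. The actual analysis would rely on a local-linear fit with a data-driven bandwidth, so the snippet below is only an illustration, and all variable names are hypothetical.

```python
def rd_effect(running, outcome, cutoff, bandwidth):
    # Sharp regression discontinuity, local-means version: the jump in
    # mean outcome between observations just below and just above the
    # cutoff, within a fixed bandwidth of the running variable.
    below = [y for x, y in zip(running, outcome)
             if cutoff - bandwidth <= x < cutoff]
    above = [y for x, y in zip(running, outcome)
             if cutoff <= x <= cutoff + bandwidth]
    return sum(below) / len(below) - sum(above) / len(above)
```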
We expect objective and subjective measures to diverge in informative ways: some households above the monetary poverty line may still feel poor, while some income-poor households may not self-identify as such, reflecting adaptation and social comparison. We also hypothesize that subjective poverty is more responsive to the transfer loss than income poverty status. The results will clarify what each metric captures, whether different subjective measures behave similarly, and inform poverty targeting and social policy design by combining “poverty on paper” with perceived economic vulnerability.
Comparative pension microsimulation supports coherent cross-country policy analysis by using harmonised inputs such as EU-SILC. Yet institutional heterogeneity and limited life-course information in standardised surveys constrain how credibly national rules, accrual mechanisms, and retirement pathways can be represented within a portable framework. Portability is feasible, but for most countries the feasibility frontier is set by what can be inferred from harmonised data and how much institutional detail can be included without sacrificing comparability. We explore the validity domain of comparative modelling by benchmarking a comparative model against a detailed national model in a controlled "sister-model" setting. Our comparative model is microWELT, designed for multi-country applications and therefore built around portable representations of labour-market and retirement transitions and simplified pension benefit calculations. Retirement timing follows parsimonious rules centred on statutory ages, and benefits are approximated via mappings from pre-retirement earnings. microWELT is parameterised for eight European countries and serves as a uniform base for refining national applications. We compare microWELT to microDEMS, a closely related detailed Austrian model built on longitudinal administrative data. microDEMS reconstructs employment and insurance careers and implements Austrian pension law at a granular level, including pathway-specific eligibility and benefit calculations that depend on accumulated insurance periods and full contribution histories. This pairing allows us to attribute projection differences directly to data richness and institutional detail, rather than to unrelated modelling choices. Empirically, we use Austria's reform harmonising women's statutory retirement age with men's as a demanding test case.
Although the reform is simple to describe, it is difficult to capture with stylised comparative rules because it is phased in over time and interacts with multiple exit routes from the labour market whose availability depends on career histories. We run matched baseline and reform scenarios focussing on retirement transitions over the next decade under harmonised demographic and macro assumptions and compare age- and cohort-specific retirement and employment profiles as well as the timing of pension claiming and aggregate expenditure trajectories.
Dynamic microsimulation requires generating coherent multivariate micro-trajectories over time across multiple outcomes (e.g., employment, income, hours), while handling panel gaps and propagating uncertainty into downstream indicators. Common approaches—transition models, chained regressions, and hot-deck imputation—often yield a single deterministic completion and can struggle to preserve high-dimensional joint structure, especially under block nonresponse.
We propose a conditional diffusion approach for dynamic microsimulation in which each unit is represented as a multivariate monthly trajectory, conditioned on static covariates and an observation mask. The method borrows the same core mechanism that made diffusion models widely known through text-to-image systems such as Stable Diffusion: a sample is generated by starting from noise and repeatedly denoising until realistic structure emerges—images are simply a particularly dramatic, high-dimensional domain where this iterative reversal is easy to appreciate. Here we apply the same denoising principle to structured longitudinal microdata: a one-dimensional residual convolutional denoiser with timestep and month positional embeddings learns to reverse a gradual Gaussian corruption process on trajectories, so observed months are pinned while missing months (or, via one-sided masking, future months) are generated as multiple plausible, jointly coherent completions consistent with the observed history and covariates.
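Stripped of the neural denoiser, the two mechanics described above, gradual Gaussian corruption and mask pinning, can be sketched as follows; this is a minimal illustration of the conditioning scheme, not the model itself.

```python
import math
import random

def corrupt(x0, alpha_bar, rng):
    # Forward (noising) process at one diffusion step:
    # x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise,
    # where alpha_bar in (0, 1] shrinks as the step index grows.
    return [math.sqrt(alpha_bar) * v
            + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
            for v in x0]

def pin_observed(x, known, mask):
    # Inpainting-style conditioning: after every reverse (denoising)
    # step, reset the observed months to their known values so that
    # only the masked months are actually generated.
    return [k if m else v for v, k, m in zip(x, known, mask)]
```

Repeatedly denoising from pure noise while pinning the observed months yields completions that are consistent with the observed history; one-sided masks (all future months missing) turn the same machinery into a forecaster.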
We demonstrate the method on a large household panel, the Survey of Income and Program Participation (SIPP), as a case study. We evaluate the model under both random missingness and contiguous block gaps, and assess not only point error but dynamic and joint realism, including unemployment spell-length distributions, employment transition matrices, and income distributions conditional on employment status. A key benefit is multiple imputation: repeated sampling yields a distribution over plausible completions and uncertainty bands for downstream statistics, allowing the model to correctly express uncertainty instead of returning a single deterministic trajectory. The width of these bands reflects how identifiable missing months are given observed history and covariates; poor empirical coverage provides a diagnostic for missing predictors or miscalibration. We further demonstrate one-sided masking as a forecasting/nowcasting use case, and scenario-style constrained sampling for stress-testing counterfactual assumptions (e.g., income floors or top-ups), while noting that causal policy inference requires additional identification assumptions.
Overall, conditional diffusion offers a flexible, uncertainty-aware generative layer for microsimulation that can preserve multivariate temporal structure and support robust uncertainty propagation.
Microsimulation techniques are widely used to study the impact of (reforms in) fiscal and social policies in mature welfare states, providing interesting insights into how today's policies reduce poverty, redistribute resources among population groups and shape incentives, for instance to work or save. While a lot has been written on the historical emergence and expansion of welfare states on an institutional level, hardly anything is known about the impact early-stage welfare states had on the lives of ordinary people. Most European countries saw the emergence and rapid expansion of their welfare states in the three decades after the Second World War, usually referred to as the Golden Age. It is often inferred that this must have been a period in which everyone was better off, in large part attributed to strong social protection. Yet, the empirical evidence is lacking, as the literature typically focuses on the top 10% and the role of top marginal tax rates.
In this paper we present for the first time preliminary results from a project that develops a historical microsimulation model for the Netherlands, a particularly interesting case as it went from being one of the lowest to one of the highest spending welfare states. In practice, we simulate cash social transfers, social insurance contributions and the personal income tax for several years between 1950 and 1975 based on official legislative documents. For the present paper the model will be combined with hypothetical household data in order to analyse key policy indicators for household types differing along socio-demographic characteristics and income levels. Such an analysis provides valuable insights into the intentions of historical policymakers. In a later stage the aim is to combine the microsimulation model with representative microdata of the Dutch population in order to study policy outcomes such as redistribution and poverty reduction and to contrast outcomes with intentions.
This paper examines the interplay between child-contingent income support and out-of-pocket (OOP) childcare costs in four European countries—Belgium, Poland, Spain, and Sweden. While existing research has extensively analysed cash benefits and early childhood education and care (ECEC) services separately, considerably less is known about how these policies jointly shape families’ income adequacy and labour market participation. Using EUROMOD, enriched with detailed information on childcare fee legislation, we introduce a novel indicator—the compensation ratio—which captures the degree to which child-contingent benefits offset OOP childcare expenses.
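In its simplest form the indicator is a ratio of benefits to fees; the sketch below is a minimal reading of it, and the convention for zero-fee households is our own illustrative assumption rather than the paper's definition.

```python
def compensation_ratio(child_benefits, oop_childcare_fees):
    # Degree to which child-contingent benefits offset out-of-pocket
    # childcare fees; a value above one means fees are fully offset.
    if oop_childcare_fees == 0:
        # Illustrative convention: free childcare counts as fully
        # compensated (infinite if any benefits are received).
        return float("inf") if child_benefits > 0 else 1.0
    return child_benefits / oop_childcare_fees
```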
Across countries, the compensation ratio reveals distinct income-related patterns. In Poland and Sweden, benefits generally exceed OOP childcare costs across most of the income distribution, reflecting strong low-income targeting. In Belgium, the compensation ratio is above one only for lower-income families, declining sharply with income as childcare fees increase more steeply than benefits. Spain shows a similar but more moderate pattern, with low-income families roughly compensated and higher-income families receiving insufficient support relative to childcare costs.
Overall, our findings demonstrate that the interaction between childcare fees and child-related income support substantially shapes the affordability of childrearing and, by extension, families’ capacity to make employment transitions. As the compensation ratio declines with income in several countries, our results suggest that these policy designs may inadvertently create labour market disincentives. The analysis underscores the need for a conjoint, rather than isolated, assessment of family policy measures in European welfare states.