Book of Abstracts

Ralf Münnich ( University of Trier ) — “Digital twins: challenges, pitfalls, and opportunities”

July 1, 2026, 9:00 am P61 Workshop on synthetic data

Li and O’Donoghue (2013) emphasized that microsimulation covers two areas: the microsimulations per se, which address what-if questions, and synthetic data generation, which provides an important basis for performing microsimulations. A growing range of methods is being applied, including data fusion of different surveys, prediction methods, and modern ML approaches. However, modelling strategies need to be adjusted accordingly, in particular depending on whether the application is cross-sectional or longitudinal. Further, increasing attention is being paid to the granularity of the modelling. All in all, little attention is paid to the accuracy of the generated data or to the assumptions and implicit decisions of the developers of microsimulation models. The presentation focuses on different aspects of synthetic data generation and so-called digital twins. Special attention will be paid to temporal and regional granularity as well as to unobserved heterogeneities in the simulations, including the uncertainties of the entire modelling process. Additionally, specific data situations and disclosure limitations will be addressed.

Cédric Heuchenne ( CAPE - UCLouvain Saint-Louis ) — “Enhanced data fusion and anonymization for microsimulation systems”

July 1, 2026, 9:00 am P61 Workshop on synthetic data

Workshop • Synthetic data

The fusion and anonymization of multiple heterogeneous data sources remain major challenges in applied statistics. In this work, we consider the joint use of demographic and fiscal census data together with several sample surveys. The objective is to integrate these sources in order to obtain a coherent representation of the overall population and to enable the evaluation of policy changes, such as reforms of the fiscal system, while ensuring that the resulting data are fully synthetic and thus completely anonymized. We present a modeling framework for merging data sets that share the same type of statistical units (e.g., households), and we show how this framework can be enhanced by incorporating information from data sets defined on different units (e.g., individuals). We also address the issue of harmonizing surveys that rely on distinct sampling designs. The proposed approach leads to a fully anonymized synthetic data set that preserves the main statistical properties of the original data and can be directly used for analysis by end users.

Pierre-Olivier Vandanjon ( Université Gustave Eiffel ) — “Generating Synthetic Populations for Transportation: A Variational Autoencoder Approach” (joint work with: Abdoul Razac Sané; Pierre Hankach; Rachid Belaroussi; Pascal Gastineau)

July 1, 2026, 9:00 am P61 Workshop on synthetic data

Workshop • Synthetic data

Synthetic populations are commonly used in transportation analysis to feed traffic simulators. Recently, they have also been used to assess the sensitivity of a territory to factors such as construction noise. However, traditional methods for generating synthetic populations, such as Iterative Proportional Fitting (IPF), which rely on sampling and calibration to aggregated data, have limitations. Indeed, they can only generate individuals similar to those in the initial sample. Machine Learning and Statistical Learning methods, such as Variational Autoencoders (VAEs), offer a promising alternative. VAEs have already demonstrated their effectiveness in generating realistic images. We present how to use VAEs to generate synthetic populations, allowing for more varied representations of a territory.
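As an illustration of the limitation mentioned above, the following is a minimal sketch of two-dimensional IPF, not the authors' implementation. The function name `ipf`, the seed table, and the target margins are hypothetical values chosen for the example. It assumes that every row and column of the seed has a positive sum and that the targets are compatible with the seed's zero pattern.

```python
import numpy as np

def ipf(seed, row_targets, col_targets, max_iter=1000, tol=1e-9):
    """Rescale a seed contingency table until its margins match the targets.

    Assumes every row and column of the seed has a positive sum.
    """
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Rescale each row so that the row sums match the row targets.
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Rescale each column so that the column sums match the column targets.
        table *= (col_targets / table.sum(axis=0))[None, :]
        # Column margins are now exact; stop once the row margins also agree.
        if np.abs(table.sum(axis=1) - row_targets).max() < tol:
            break
    return table

# Hypothetical seed: counts from a small sample, with one combination never observed.
seed = np.array([[10.0, 20.0],
                 [30.0, 0.0]])
fitted = ipf(seed, row_targets=[60.0, 40.0], col_targets=[50.0, 50.0])
print(fitted)
```

Because IPF only multiplies existing cells by positive factors, the combination with a zero count in the seed stays at zero in the fitted table, whatever the target margins are. This is one way to see why the method reproduces only the kinds of individuals already present in the initial sample.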
