Graphical representations with ggplot2

From theory to practice on microdata from the Study of Family and Intergenerational Relations (ERFI1) survey

For now, this training kit is available in French only.

This kit is intended for researchers, students, doctoral students, and research support staff who wish to acquire basic skills in graphical representation with ggplot2.

Users should have a basic understanding of the R language and the RStudio interface, including the manipulation of objects, variables, and data using tidyverse packages, as well as the calculation of simple statistics. These skills can be acquired by following the educational kit Introduction to survey data analysis with the R language - ERFI1.

The kit lasts 6 hours and can be used independently for self-training or as a support tool for guided training.

The training includes a theoretical introduction to the fundamental principles of data visualisation, a presentation of the key concepts of ggplot2, and a practical application consisting of reproducing certain graphs from a scientific article, using an anonymised dataset from the ERFI-1 survey provided by the INED Survey Department.

This kit was developed by the LifeObs Training Department.

Learning objectives

  • Learn the fundamentals of graphical representation in order to create clear and informative visualisations: taking into account the type of variables to be represented (quantitative or qualitative), applying best practices in graphical semiotics (colours, shapes, sizes, layouts), and incorporating essential elements such as legends, sources, titles, and the choice of axes and scales.
  • Learn how to use the R package ggplot2, part of the tidyverse and a powerful tool for designing reproducible and aesthetic graphics using a consistent and unified syntax.
  • Illustrate the teaching and put the skills acquired into practice using microdata from a real survey, the ‘Study of Family and Intergenerational Relations’ (ERFI-1) survey, conducted in 2005 by INED (National Institute for Demographic Studies) and INSEE (National Institute of Statistics and Economic Studies). The proposed exercise consists of reproducing certain graphs from a scientific article by Arnaud Régnier-Loilier, published in 2006 in the journal Population et Sociétés: ‘How often do we see our parents?’. This exercise also aims to encourage the use of data from the new ERFI-2 survey cycle, which has now been completed as part of the LifeObs project coordinated by INED. The forthcoming availability of ERFI-2 data via the Quetelet Progedo-Diffusion application opens up many opportunities for analysing recent family behaviours and how they have changed since the first survey cycle (ERFI-1).

Resources mobilised

The kit draws on several resources:

  • a simplified and anonymised dataset from the ERFI-1 survey, prepared for training purposes by the Survey Department (SES) of INED (National Institute for Demographic Studies). It contains a selection of original responses from the ERFI-1 survey, some of which have been recoded or modified for anonymisation purposes.
  • a training resource (designed in RStudio with Quarto) combining theoretical presentations (key principles of graphical representation, presentation of the ggplot2 package) and a practical case study detailing the various operations to be carried out (R instructions and the results of their execution) in order to reproduce the first graphs from an article by Arnaud Régnier-Loilier published in 2006 in the journal Population et Sociétés: ‘How often do we see our parents?’.
  • other documents useful for training: documentation on the anonymised dataset and the original ERFI-1 survey data, the Population et Sociétés article whose results are partly replicated, the dictionary of variables for the anonymised ERFI-1 file, etc.

Find out more about the ERFI survey

The Study of Family and Intergenerational Relations (ERFI) is the French version of the Generations and Gender Programme (GGP), a set of international longitudinal surveys launched by the UN in the early 2000s.

ERFI targets people aged 18 to 79. Its general aim is to describe the dynamics of family construction (fertility, unions, break-ups, family recomposition) and to explain the mechanisms involved, in particular by studying the role played by relations between men and women and by intergenerational relations. Data are collected in over twenty countries (mainly in Europe), using a standardised questionnaire.

In France, INSEE and INED carried out the first round of surveys (ERFI-1) in three waves (2005, 2008, 2011). A second round of surveys (ERFI-2), based on a very similar methodology, began in France in 2023.

Survey website: https://erfi.site.ined.fr/

LifeObs training kit - ggplot2 - ERFI1

Training material: link

Publication: link