Visualizing Innovation

Introduction

Dr. Matthias Schnetzer

October 9, 2023

Course info


About this course

“A picture is worth a thousand words”

This course focuses on data visualization techniques with reference to contemporary issues of economic policy. Students will acquire data visualization skills in in-class coding sessions and home assignments.

You will gain:

  • an overview of contemporary debates in economic policy
  • a basic understanding of principles of data visualization
  • knowledge of how to enrich academic publications with informative graphs

Cooperation with Austrian Patent Office

As a special feature, the Austrian Patent Office will provide internal data exclusively for this course. We will use it to create data visualizations that illustrate the process of private innovation in Austria.

Experts from the Patent Office will run two units on innovation and patents, and will introduce the data and topics for the final assignment.

Who are you?

What do you expect of this course? Do you already have some coding experience in R?

Schedule

Date          Time         Room     Content                         Assignment
Oct 09, 2023  10:00-12:00  TC.3.10  Data
Oct 16, 2023  10:00-12:00  TC.3.10  Visualization
Oct 23, 2023  10:00-12:00  TC.3.10  Geometries
Oct 30, 2023  10:00-12:00  Online   Colors
Nov 06, 2023  10:00-12:00  TC.3.10  Labels
Nov 13, 2023  10:00-12:00  TC.3.10  Patents
Nov 20, 2023  10:00-12:00  TC.3.10  Patent Data
Nov 27, 2023  10:00-12:00  TC.3.10  Themes
Dec 04, 2023  10:00-12:00  TC.3.08  Maps
Dec 11, 2023  10:00-12:00  TC.3.06  Presentations (1-4, 13, 15-20)
Dec 18, 2023  10:00-12:00  TC.3.06  Presentations (5-12, 14)

Assignments

Assignment 1 provides the setup of the R infrastructure that is required in this course. There are no points for this assignment.

Assignments 2 to 4 are recreations of exemplary figures that are related to figures discussed in class. Students should try to reproduce the plots at home to improve their individual coding skills.

The raw data for the figures are available as CSV files. The finished charts must be uploaded to the learning platform before 9 a.m. on the day of the deadline.

Visualization & Report

Chart presentation

  • Research question [What do you aim to show with the visualization?]
  • Data
  • Chart [What is the main takeaway?]

Deadline: Date of the presentation

RMarkdown report

  • Title
  • Author
  • Introduction
  • Research question
  • Data
  • Result
  • Conclusion
  • Code

Deadline: January 31, 2024

Grading


Assignments: 30% (0-10 points for each visualization)

Chart presentation: 30% (0-20 points for the quality of the presentation, 0-10 for the preliminary chart)

Written report: 40% (0-40 points for the report and the final chart)

Feedback, cooperation and help

Let me know your feedback on the course at any time. Where possible, I will incorporate it immediately; at the very least, I will consider it for future courses. At the end of the semester, there will be a student evaluation.

As some of you might already have advanced coding skills in R, please support each other and collaborate. This does not mean that one person does all the coding and shares it with everyone else. Students should be intrinsically motivated to improve their own coding skills, while cooperating to learn from each other.

There is a forum on the learning platform for exchange among students. Please also consult support platforms like Stack Overflow or take a look at the cheatsheets.

Data

The era of evidence in economics

The figure traces the evolution of the economics literature, based on text mining of the 500 most-cited titles in top journals by decade. It shows a shift from advancing theory towards empirical evidence.

The rise of empirical articles

Evidence-based economic policy

  • Data collection: Collection of relevant and high-quality data (administrative data, surveys, interviews, observations, etc.). Researchers should be aware of the qualities but also of the flaws of the data.
  • Data analysis: The design and type of analysis depend on the question being asked and the resources available. The methods range from qualitative to quantitative analysis. The choice of application might unwittingly involve normative reflections by the researcher.
  • Policy suggestions: A major goal of empirical economics is to serve and improve policy making. The findings should, however, be carefully interpreted with regard to the limitations of empirical analysis. Economic policy, even if it’s evidence-based, is affected by norms, beliefs, etc.

Mind the (data) gap!

The limits of data

  • Data is never a perfect reflection of the world!
  • It’s only a subset: not crime but reported crime
  • Information is collected by humans and processed by machines: imprecisions and errors are inevitable!
  • Be aware of potential (cognitive and statistical) biases!

Invisible women

Invisible rich

Discuss with your neighbour

What other potential flaws and challenges of data collection come to your mind?

How could these flaws be tackled by the researcher?

Be aware of differences between data sources!

Income data in EU-SILC

Individual level:

  • employee cash or near cash income
  • cash benefits or losses from self-employment
  • pension from individual private plans
  • unemployment benefits
  • old-age benefits
  • survivor benefits
  • sickness benefits
  • disability benefits
  • education-related allowances

Household level:

  • income from rental of a property or land
  • family/children related allowances
  • social exclusion not elsewhere classified
  • housing allowances
  • regular inter-household cash transfers received
  • alimonies received
  • interest, dividends, profit from capital investments in unincorporated business
  • income received by people aged under 16

Administrative versus survey data

Impact on response behavior:

  • Social desirability
  • Sociodemographic characteristics
  • Survey design
  • Learning effect


Mean reverting errors

How do we explain the mismatch?

Why should we plot data?

Anscombe’s quartet

              I            II           III            IV
Obs.       X      Y      X      Y      X      Y      X      Y
1      10.00   8.04  10.00   9.14  10.00   7.46   8.00   6.58
2       8.00   6.95   8.00   8.14   8.00   6.77   8.00   5.76
3      13.00   7.58  13.00   8.74  13.00  12.74   8.00   7.71
4       9.00   8.81   9.00   8.77   9.00   7.11   8.00   8.84
5      11.00   8.33  11.00   9.26  11.00   7.81   8.00   8.47
6      14.00   9.96  14.00   8.10  14.00   8.84   8.00   7.04
7       6.00   7.24   6.00   6.13   6.00   6.08   8.00   5.25
8       4.00   4.26   4.00   3.10   4.00   5.39  19.00  12.50
9      12.00  10.84  12.00   9.13  12.00   8.15   8.00   5.56
10      7.00   4.82   7.00   7.26   7.00   6.42   8.00   7.91
11      5.00   5.68   5.00   4.74   5.00   5.73   8.00   6.89
Mean    9.00   7.50   9.00   7.50   9.00   7.50   9.00   7.50
SD      3.32   2.03   3.32   2.03   3.32   2.03   3.32   2.03
Corr           0.82          0.82          0.82          0.82
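
The summary rows of the table can be checked in R. The following is a minimal sketch, not part of the original slides: it relies on the `anscombe` data frame that ships with base R (columns `x1` to `x4` and `y1` to `y4`), and the helper name `quartet_stats` is chosen here purely for illustration.

```r
# Anscombe's quartet is bundled with base R as the data frame `anscombe`,
# so no download or extra package is needed.
data(anscombe)

# For each of the four sets, compute the mean and standard deviation of
# X and Y as well as the Pearson correlation between them.
quartet_stats <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  data.frame(
    set    = as.character(as.roman(i)),  # label the sets I to IV
    mean_x = mean(x),
    mean_y = mean(y),
    sd_x   = sd(x),
    sd_y   = sd(y),
    corr   = cor(x, y)
  )
}))

# Round to two decimals for comparison with the table.
print(cbind(quartet_stats["set"], round(quartet_stats[-1], 2)))
```

Rounded to two decimals, the four rows should agree with the Mean, SD and Corr rows above, even though the underlying observations differ.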

What do we learn when plotting the data?
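
One way to explore this question is to draw the four sets side by side. The sketch below is an illustration rather than the course's own code: it uses base R graphics and the built-in `anscombe` data frame to avoid package dependencies, and the object names (`fits`, `coefs`) are hypothetical.

```r
data(anscombe)

# Fit a simple linear regression of Y on X for each of the four sets
# and collect intercept and slope in a 4 x 2 matrix.
fits <- lapply(1:4, function(i) {
  lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]])
})
coefs <- t(sapply(fits, coef))
colnames(coefs) <- c("intercept", "slope")

# Draw the four scatter plots in a 2 x 2 grid on common axes,
# each with its fitted regression line.
old_par <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlim = c(3, 20), ylim = c(2, 14),
       xlab = "X", ylab = "Y",
       main = as.character(as.roman(i)), pch = 19)
  abline(a = coefs[i, "intercept"], b = coefs[i, "slope"], col = "red")
}
par(old_par)  # restore the previous graphics settings

print(round(coefs, 2))
```

The fitted lines are nearly identical across the four panels, while the point clouds look very different. This is exactly what summary statistics alone cannot reveal.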

Do you see correlation?

Correlation: -0.07

Same same but different

Let’s start coding with the penguins

The Palmer Penguins

The data were collected from 2007 to 2009 by Dr. Kristen Gorman with the Palmer Station Long Term Ecological Research Program. The dataset contains data for 344 penguins of 3 different species, collected from 3 islands in the Palmer Archipelago, Antarctica.
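
A first look at the data could be sketched as follows. This assumes that the `palmerpenguins` package from CRAN is installed, which the slides do not state explicitly; the variable names `n_penguins`, `species_counts` and `island_counts` are illustrative only.

```r
# Assumption: the package has been installed once beforehand with
# install.packages("palmerpenguins")
library(palmerpenguins)

# The package provides the data frame `penguins`, one row per bird.
n_penguins     <- nrow(penguins)
species_counts <- table(penguins$species)  # observations per species
island_counts  <- table(penguins$island)   # observations per island

print(n_penguins)
print(species_counts)
print(island_counts)

# Overview of all variables and their types.
str(penguins)
```

The counts can be compared against the description above: the number of rows, species and islands should match the figures given there.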

Bibliography

Angel, Stefan/Disslbacher, Franziska/Humer, Stefan/Schnetzer, Matthias (2019). What did you really earn last year?: Explaining measurement error in survey income data. Journal of the Royal Statistical Society: Series A (Statistics in Society). DOI: 10.1111/rssa.12463
Angrist, Joshua/Azoulay, Pierre/Ellison, Glenn/Hill, Ryan/Lu, Susan Feng (2017). Economic research evolves: Fields and styles. American Economic Review, 107(5), 293–297. DOI: 10.1257/aer.p20171117
Brice, Brandon D./Montesinos-Yufa, Hugo M. (2019). The era of empirical evidence. Mimeo.
Disslbacher, Franziska/Ertl, Michael/List, Emanuel/Mokre, Patrick/Schnetzer, Matthias (2020). On top of the top - adjusting wealth distributions using national rich lists (Working Paper Series No. 20). INEQ.
Gorman, Kristen B./Williams, Tony D./Fraser, William R. (2014). Ecological sexual dimorphism and environmental variability within a community of Antarctic penguins (genus Pygoscelis). PLoS ONE, 9(3), e90081. DOI: 10.1371/journal.pone.0090081