Book review: HOW TO TALK ABOUT DATA
Build your data fluency
Genres:
- Information Management
- Data Science
Review posted on:
02.04.2024
The number of pages:
272 pages
Book rating:
4/5
Year the book was published:
First edition published 2022
Who should read this book:
- Data analysts, data scientists, and managers who want to understand and communicate better about data.
Why did I pick up this book and what did I expect to get out of it:
I found this book in my library while casually checking out the data science section. The title got my attention, and after reading the back cover and the table of contents I decided to pick it up. From the cover and the table of contents, I expected to read less about the data itself and all the technicalities that go with it, and more about how to think about data and what questions you should ask, both as a data analyst and as someone who is presented with insights from an analysis. There are some chapters on storytelling and visualization, so I wondered how the authors intended to incorporate storytelling into presenting insights from the analyzed data.
My thoughts about the book:
Let’s start with the question of whether “How to Talk about Data” delivered what I expected. The short answer is mostly yes. I was pleasantly surprised by how methodically the authors take you on the journey of a data analyst and cover all aspects of data analysis and presenting insights. I like how the authors structured each chapter. The chapter starts with a section called “What you’ll learn”, followed by a “Data conversation”, a real-life scenario of a problem that can arise around the chapter's subject. Then comes the main content of the chapter, followed by another “Data conversation” that shows how the earlier problem gets solved. Each chapter closes with key takeaways and some of the traps to avoid.
The authors also did a very good job of working storytelling into the presentation of findings and insights. Many books that I have read about data analysis and presenting data talk about storytelling but provide examples that are mediocre at best, while Martin and Fabienne did a great job with their storytelling canvas. I also liked that they touched on delivering bad news, how to do that, and how to handle disagreements. If a company has had no real in-depth reporting until now, there will be a lot of disagreements with stakeholders who did their own reporting. I also liked that the authors guide you on what to do if you are at the other end of the analysis, as the receiver. In that chapter, they provide the questions you should ask your data analyst about the data and the analysis methods that produced the insights. As I mentioned before, the authors paint a full picture of what you can expect to encounter as a data analyst or as someone who is receiving insights from a new analysis.
So if you find yourself in the role of a data analyst, or you are someone who receives reports, and you want to be better at understanding and communicating about the whole process and its findings, you should definitely pick up “How to Talk about Data”. If nothing else, the questions you should ask your data analyst after the presentation are worth it, as they get everyone in the meeting more involved. And just to rant a little bit, the part on statistics was not my favorite, but I understand it was a must to get the whole picture. All in all, Martin J. Eppler and Fabienne Bünzli have done a great job writing this book.
If you picked up this book, please let me know what you think about it in the comment section.
A short summary of the book:
In the first chapter, the authors talk about why many people shy away from data and statistical analysis. They share the seven drivers that lead people to be afraid of data. In the second chapter, you get to read about the basic concepts you need to know to talk about data, and you also get an overview of current trends. In the third chapter, you will learn about patterns and what they mean. You will also learn about predictive analysis and how to infer generalizations from your data. Building on this, the fourth chapter covers relationships in your data and how to draw the right conclusions. In the fifth chapter, you will learn about machine learning and, step by step, how to segment or group elements such as customers or products. One such method you can use is called cluster analysis. The sixth chapter is about data distortions, where the authors talk about biases in your data collection, analysis, or communication. You can find more on this in my notes below.
In the seventh chapter, you learn about the questions you need to ask whenever data is presented to you. In the eighth chapter, you will learn how to visualize data in a concise and fitting format. The authors provide the DESIGN guidelines for quality charts, with examples and pointers to further resources. You can read a bit more about them in the notes section below. Chapter nine was a nice read, where you will learn how to effectively work storytelling into your presentation. Based on this chapter, you will know how to connect your data to your audience and how to sequence the presentation to keep your audience engaged. In chapter ten, the authors talk about working with analytics software instead of slides, as drilling into the data live is more engaging and keeps your audience on their toes. In chapters eleven and twelve, the authors talk about delivering bad news and handling disagreements about data and the findings you are presenting. These two chapters are very important, especially in companies where the main recipients of the reports were doing the reports themselves until now and the new findings might actually shock them. Chapter thirteen looks into the future, and the authors shed light on important trends in data analytics.
My notes from the book:
- There cannot be high-quality data analytics without high-quality conversations. It is through conversations that query results and data reports are shared, scrutinized, put into a wider context, and ultimately made applicable.
- An easy and quick way to get an overview of your data and a feel for trends is to construct frequency distributions. Once you have an overview of how your values are distributed, you may want to find out where the center of the distribution lies. The three most commonly used measures are the mode, the median, and the mean.
- In the scatter plot, you should use the horizontal (x) axis for the variable that is considered the predictor, and the vertical (y) axis for the variable that is considered the outcome.
- Misconceptions about statistical concepts can severely impact the discussion quality and inhibit getting into a state of flow and creativity. To avoid misunderstandings, spell out what things in your analysis really mean.
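To make the note on frequency distributions and the center of a distribution more concrete, here is a minimal Python sketch using only the standard library. It is my own illustration, not an example from the book, and the `tickets_per_day` values are made up.

```python
from collections import Counter
from statistics import mean, median, multimode

# Hypothetical sample (made-up values): support tickets received per day
tickets_per_day = [3, 5, 4, 5, 2, 5, 4, 3, 12, 6]

# Frequency distribution: how often each value occurs
frequency = Counter(tickets_per_day)
for value in sorted(frequency):
    # Print a simple text histogram, one '#' per occurrence
    print(f"{value:>2} | {'#' * frequency[value]}")

# Three common measures of the center of the distribution
modes = multimode(tickets_per_day)        # most frequent value(s)
center_median = median(tickets_per_day)   # middle value of the sorted data
center_mean = mean(tickets_per_day)       # arithmetic average

print("mode(s):", modes)
print("median:", center_median)
print("mean:", center_mean)
```

In this made-up sample the single high day (12 tickets) pulls the mean above the median, which is one reason it helps to look at more than one measure.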
- As a guideline, a good presentation or report should answer the following key questions:
  - What was done and what can we learn from it?
  - How was it done?
  - Why was it done?
  - Who did it?
  - When was it done?
- Correlation only means that there is a mathematical relationship between two variables, but it does not mean that one causes the other, let alone that the relationship makes sense.
- Regression analysis is just the fancy term to say: "We use one thing to predict another thing using a linear equation." This happens through minimizing the error of the model. The error is the difference between the data you collected and the prediction by your model (the regression line). The bigger the difference, the worse the model, and the other way around.
- To conduct a regression analysis, we have to define a predictor variable and an outcome variable. The p-value answers the question "Is there a real difference?", whereas the effect size tells you "How big is this difference?".
- Data Gathering Biases:
  - Selection bias: Using only conveniently available data instead of representative data.
  - Survivor bias: Focusing only on the data that came through and ignoring what has not.
  - Confirmation bias: Only looking at data that confirms your opinion.
- Data Analysis Biases:
  - Confounding variables: Not taking variables into account that affect the association between two things.
  - Neglecting outliers: Not acknowledging outliers.
  - Normality bias: Not taking the actual distribution of the sample into account.
  - Overfitting: Playing with models so that they fit the data we have but not beyond.
- Data Communication and Usage Biases:
  - Curse of knowledge: Analysts fail to adequately communicate their analyses to managers because of the complexity of their procedures.
  - Dunning-Kruger effect: Managers overestimate their grasp of statistics and are unaware of their wrong data interpretation or use.
  - Causation bias: Believing that one factor causes another, simply because the two are correlated.
- To overcome the "Curse of Knowledge", try to find out how much your audience actually already knows about analytics, statistics, or the data you are sharing with them.
- After a presentation, you should ask the presenter about the data sources and data quality, the process of the data analysis, the key findings, which results could be easily misinterpreted or misused, who else should hear about these insights, whether there is any other way to exploit this data and get value out of it, and what they would do differently if they could start the analysis over.
- The six design principles for great data charts:
  - Declutter: Get rid of borders, grid lines, unnecessary details, 3D effects, colors, shades, and other decorations.
  - Emphasize: If you want to enable comparisons, choose vertical bars. If you want to enable a ranking from smallest to largest, choose horizontal bars. If you want to emphasize deviations from a goal, choose upward and downward vertical bars.
  - Storify: Storytelling is all about sequencing a set of charts or animating and enriching a single chart. This requires you to split your charts into a trilogy by first setting the scene with an overview chart that clarifies the situation. Then you show the complications with one or several charts that give more details about the problem. And finally, you provide a resolution with a chart that shows opportunities for action.
  - Involve: In an interactive chart, you can involve your data users by letting them select areas of interest, zoom in to more details, explore different data aspects, filter out elements, and so on. In a live presentation, you can involve the audience by letting them guess a result.
  - Give meaning: You can give meaning by linking data directly to possible actions, providing self-explanatory labels, and explaining the reason behind outliers or other strange patterns. You should also provide reference points that show whether a value is good or bad.
  - No distortions: Avoid pie/doughnut/arch charts or charts that mix units and have two different y-axes in a single image.
- When presenting data in the form of storytelling, first tell your audience why they should care about the data. Connect with your audience by relating the data to something that they already know (like a recent event) so that you can build on common ground when introducing new elements.
- Remember to make use of emotional triggers, such as the surprise of counterintuitive data or estimation questions asked before revealing the answer from the data. Also think about how you can use your data to shock, to cause worry or pride, to intrigue, or to amuse. Avoid jargon, explain technical terms, and provide accessible examples.