We all know how important data is becoming to modern businesses. Modern companies hold so much data that a wide range of insights can be gained just by analyzing what is already present within your company. However, what is easier to overlook is the value of external, public data to your business.
We know that, generally speaking, more data improves a data analysis project. You can obtain more insights with more data, or you can make an existing analysis more robust. Even in the worst case, adding relevant, well-collected data should not make your analysis worse. Despite that, there can be doubt about whether this applies to public data sources. Maybe you think it's too difficult to combine external data with your own company's data. Maybe there's an innate skepticism towards this modern trend of giving away valuable things for free.
In this post, we'll discuss common sources of public data and aim to show you how it can add value to your analytics.
In recent years, governments, national and international statistical bodies, and various other public bodies around the world have made great efforts to develop an open data culture. Open data simply refers to any data that is made available for anyone to use or share. Many public organisations now make their data accessible to the general public so that you can download and analyse it.
Public data is collected in various ways and by many different bodies. Often, data is collected by asking for it directly, such as through surveys or interviews. This is usually the case when the data of interest is some characteristic of the individuals involved, such as education or income levels. Alternatively, data can be collected by observing processes in the real world. This is more common where the data of interest is an action that people take, such as commuting patterns.
From a business point of view, public data has positive and negative attributes. On the plus side, public data is generally freely available for you to use. It can serve as a useful supplement to your own data, perhaps providing information that you could not collect yourself. On the minus side, you cannot control how the data is collected, so it might not exactly match what you need. As the data is public, it is also accessible to all your competitors, so you may not gain the sort of edge in the market that you could gain from analysing your own data.
One of the largest sources of public data in any country is the census. In case you don't know, a census is simply a count of every single person in a specific country at a particular point in time. Most countries conduct a census once every 5 or 10 years. Census data is generally used to plan and co-ordinate the provision of public services, to create political constituencies, and for various other purposes.
A census gathers a large amount of information on all the people in a country. Generally it covers simple demographic information, such as names, ages and addresses, but it can also cover other information, such as ethnicity or education levels. Often a census body will supplement the census with more frequent surveys and research that can provide additional information, such as income levels.
The primary advantage of census data, especially compared to other surveys, is that it is comprehensive. Other surveys only include a sample of the population of interest. Even with the best efforts of the people conducting such a survey, it is possible that these samples will not reflect the population of interest. If this is the case, then using this data could lead you to incorrect conclusions.
By contrast, a census aims to record data on every single person in a country, so you can be more confident that the data is accurate for your needs. Of course, in reality a census isn't perfect. Certain people or groups of people are often under-counted for various reasons. However, this is often known to the census body and should be less of an issue than the issues associated with surveys.
US Census Data
One of the most important public data sources is that provided by the US Census Bureau. Their website can be found at census.gov. The US Census Bureau provides many different data sets for analysis by the public. In addition to the census itself, which is conducted every 10 years, they also produce the American Community Survey, which is an ongoing survey that is updated annually and provides a lot of information about the American populace.
The website allows you to download data on a vast range of topics, including economic indicators, employment statistics, educational attainment, and income statistics. Most data sets can be downloaded in Excel or CSV format, so you can be confident they will integrate with whatever tools you are using for data analysis. That said, some files can require a bit of work before they are ready to be analysed.
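To give a feel for the kind of work a downloaded file might need, here is a minimal sketch in Python. The file contents, column names and the "(X)" placeholder are all invented for illustration and are not taken from any real Census Bureau download. The sketch assumes a CSV with a few descriptive rows above the header and numbers stored as formatted text.

```python
import csv
import io

# Hypothetical extract, invented for illustration; real downloads will differ.
RAW = """Table: Population by county (illustrative data)
Source: hypothetical export

County,Population,Median income
Adams,"12,345","$51,200"
Baker,"8,901","$47,850"
Clark,(X),"$39,000"
"""


def to_number(text):
    """Turn text such as '12,345' or '$51,200' into an int.

    Returns None when the cell does not hold a plain number,
    for example a placeholder such as '(X)'.
    """
    cleaned = text.replace(",", "").replace("$", "").strip()
    return int(cleaned) if cleaned.isdigit() else None


def load_table(raw, header_first_col="County"):
    """Skip the descriptive rows above the header, then parse the table."""
    lines = raw.splitlines()
    # Find the header row by looking for its (assumed) first column name.
    start = next(
        i for i, line in enumerate(lines)
        if line.startswith(header_first_col + ",")
    )
    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])))
    return [
        {
            "county": row["County"],
            "population": to_number(row["Population"]),
            "median_income": to_number(row["Median income"]),
        }
        for row in reader
    ]


rows = load_table(RAW)
for row in rows:
    print(row)
```

Missing values are kept as None rather than silently turned into zero, so that later steps can decide how to treat them.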
In this post, we have learned about some of the major sources of public data that you could use in your business. To close, consider this. If you were analysing the geographical distribution of sales for your company, and you had access to some internal database of customer data, you would almost certainly incorporate that data into your analysis. You might want to know if your customers are concentrated in particular areas, as this would affect the distribution of your sales significantly.
If you don't have such a customer database, then public data could provide one for you, enabling you to get the same level of insights without having to collect a large amount of data yourself. In this way, open and public data sources make more data accessible to a wider group of people and companies. Even if you have your own data that does a similar job, you can compare your data to the public data to validate your own data set. In any event, you should be able to derive value for your business from public data.
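As a rough illustration of the sales example above, the sketch below combines sales totals with public population counts for the same regions. All region names and figures are made up, and the helper function is hypothetical. It assumes both data sets already use matching region names, which often takes some preparation in practice.

```python
# Hypothetical figures, invented for illustration only.
sales_by_region = {"North": 120_000, "South": 45_000, "East": 80_000}
population_by_region = {
    "North": 400_000,
    "South": 90_000,
    "East": 320_000,
    "West": 150_000,
}


def sales_per_thousand(sales, population):
    """Return sales per 1,000 residents for every region in the public data.

    Regions with no recorded sales are kept with a rate of 0, since they
    may point to areas the business has not reached yet.
    """
    result = {}
    for region, residents in population.items():
        total = sales.get(region, 0)
        result[region] = round(total * 1000 / residents, 2)
    return result


rates = sales_per_thousand(sales_by_region, population_by_region)

# List regions from the highest to the lowest rate.
for region, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{region}: {rate} in sales per 1,000 residents")
```

With these invented numbers, the region with the lowest total sales has the highest sales per resident, which is the kind of pattern that raw sales totals alone would hide.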