By now, almost every business has figured out that data and analytics represent a big opportunity to generate compelling insights into the company, and thereby improve outcomes. However, some businesses put themselves at risk when they fail to ask where that data should come from.
Companies are probably aware of the value of foraging through their company to identify any data they can find that might be of interest, maybe even in the process uncovering that lost database on a floppy disk that fell down the back of the sofa in 1997. However, it’s easy to forget that you’re not limited by what data can be found within the four walls of your company. Many external organisations, especially governments and public bodies, make their vast quantities of data available for anyone to download and use. If your business isn’t taking advantage of this, then you’re probably not maximizing your analytics capabilities.
Public Data and its Value
So where do you get public data and why is it so valuable to businesses? Well, we have a post here that talks about public data, and in particular census data as a source of insights for your business. Here, let’s consider an example.
Let’s say your business sells its products nationally or internationally to individual consumers. You may collect some amount of data on your customers as part of the sales process, especially if your product is sold online. However, you almost certainly don’t know everything about them.
For example, you might like to know what the income of your customers is. This would help you understand which segments of the market your product is most popular with. The problem is that the number of products where it’s legitimate to ask customers for their income is minuscule.
Even if you don’t know your customers’ incomes, someone else who makes data available publicly probably does. The most likely example here would be a census organisation or a government statistics body. By integrating their data with yours, you can uncover insights into your customers’ incomes. To be clear, you won’t be able to identify the income of individual customers this way, but you can still derive insights indirectly. For example, you probably know your customers’ addresses. If a lot of your customers live in areas that the census tells you tend to have high incomes, then you may be able to assume that your customers have a similar profile. Obviously, there’s a little bit more to it than that, but you can see the principle that applies.
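As a rough illustration of the address-to-area idea above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the customer records, the area codes, the income figures (made up, not real census values) and the band thresholds. It attaches an area-level median income to each customer and counts customers by the income band of the area they live in.

```python
from collections import Counter

# Hypothetical customer records: only the area code from the address is used.
customers = [
    {"id": 1, "area": "A"},
    {"id": 2, "area": "A"},
    {"id": 3, "area": "B"},
    {"id": 4, "area": "C"},
    {"id": 5, "area": "A"},
]

# Made-up median household income per area, standing in for a census table.
area_median_income = {"A": 95000, "B": 52000, "C": 31000}


def income_band(income):
    """Bucket an area's median income into a coarse band (arbitrary thresholds)."""
    if income >= 75000:
        return "high"
    if income >= 40000:
        return "middle"
    return "low"


def customers_by_area_band(customers, area_income):
    """Count customers by the income band of the area they live in."""
    counts = Counter()
    for customer in customers:
        income = area_income.get(customer["area"])
        # Areas missing from the public table are counted separately.
        band = income_band(income) if income is not None else "unknown"
        counts[band] += 1
    return counts


band_counts = customers_by_area_band(customers, area_median_income)
print(dict(band_counts))
```

Note that this describes the areas customers live in, not the customers themselves, which is exactly the indirect kind of insight discussed above.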
Let’s consider another example. Unless you’re Facebook or someone similar, the chances are you only collect information on people who actually are your customers. If someone doesn’t buy your product, you may know they exist, but you don’t really know who they are, or what characteristics they have that set them apart from the people who are your customers.
Without wanting to get into a statistics lesson, what you have is data on a sample of people. The chances are this sample is not representative of the overall population of people who could be your customers. Public data like a census does contain information on everyone, so incorporating it into your analysis lets you compare characteristics of people who do or don’t buy your product. This makes it much easier to know what characteristics make people buy your product than it would be if you were only using your own data.
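One simple way to make that comparison concrete is to divide your customer counts by the population counts for each group. The sketch below assumes you can group both your customers and the census population by the same characteristic (age group here); the group labels and all the numbers are invented for illustration.

```python
# Made-up counts by age group: your customers vs. the whole population (census).
customers_by_group = {"18-34": 300, "35-54": 150, "55+": 50}
population_by_group = {"18-34": 10000, "35-54": 15000, "55+": 25000}


def penetration(customers, population):
    """Return customers per 1,000 people for each population group."""
    return {
        group: 1000 * customers.get(group, 0) / population[group]
        for group in population
    }


rates = penetration(customers_by_group, population_by_group)
for group, rate in rates.items():
    print(f"{group}: {rate:.1f} customers per 1,000 people")
```

Looking only at your own data, you would see that most customers are aged 18-34. The rates add the missing context: how large that group is in the population as a whole.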
Where to get public data from
There are numerous sources of public data available. Most of the organisations that make their data available are governments or government bodies or agencies. However, a number of private companies also make their data available, especially in the technology sector. We’ve compiled a long list of organisations that make their data available here.
Getting public data
Once you’ve identified what data you want to obtain, and where you can get it from, your next step is to get the data, and probably after that to transform it into a usable format.
This can often be the most frustrating part of the process. You may know of the idea that a data analyst spends most of their time transforming data into the right format, and much less time actually analysing the data. This is the stage of the process where that idea gets put to the test, as you’ll often find files that are badly formatted, contain information you don’t need, or are just saved in outdated file formats.
One example I found interesting is within the files produced by the US Census Bureau. Most of their files are designed to be opened in Microsoft Excel, but some of them use the old .xls file format, which was replaced as the Excel standard in 2007 and consequently causes problems for some applications that are newer than that. In this case the data was from this decade; it wasn’t an old dataset being saved in an old format.
The chances are that you’ll encounter similar frustrations as you go along. It’s annoying, but it’s all part of the job. All that said, it’s easier than ever these days to clean up strange datasets using tools like the Query Editor in Power BI.
Using public data with your company’s data
Once you’ve got the external data into your software of choice and into the format you need, it’s simply a matter of analyzing the two datasets together.
As an example, consider a company that sells its product across the USA and studies the distribution of sales by state. The company wants to know whether the states where it has the most sales are states where the product is actually popular, or simply states that have a large population.
In a post working through this example, we add population data from the US Census Bureau to the company’s sales data to find out which states are performing best for the company after population is taken into account. As we find out, many of the states that have the most sales are actually not the most popular when we allow for populations. Analysing data from multiple sources lets us identify insights that we could not have obtained looking at the company’s data alone.
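The population adjustment itself is just a division. Here is a minimal Python sketch of the idea, not the analysis from the post: the state names are placeholders and both the sales and population figures are made up rather than taken from the Census Bureau.

```python
# Made-up unit sales and populations for three placeholder states.
sales = {"State A": 50000, "State B": 20000, "State C": 9000}
population = {"State A": 25_000_000, "State B": 5_000_000, "State C": 1_000_000}


def sales_per_100k(sales, population):
    """Return sales per 100,000 residents for states present in both datasets."""
    return {
        state: 100_000 * sales[state] / population[state]
        for state in sales
        if state in population
    }


per_capita = sales_per_100k(sales, population)

# Rank the states two ways: by raw sales and by population-adjusted sales.
rank_by_total = sorted(sales, key=sales.get, reverse=True)
rank_by_rate = sorted(per_capita, key=per_capita.get, reverse=True)

print("By total sales:", rank_by_total)
print("By sales per 100k residents:", rank_by_rate)
```

With these invented figures the two rankings are reversed: the state with the most sales has the lowest sales per resident, which is the kind of result that only appears once the two datasets are combined.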
Open data is a growing trend in data analytics. You might be tempted to think that it doesn’t fit in with the corporate world, but that’s not the case at all. While it can take a bit of effort to find the data you need, and to get it into a suitable format for analysis, the effort allows you to obtain greater insights into your company than you could using only your own data. For that reason, open data should be an area you pay attention to now and in the future.