Chapter 1 | Creating Value with Data Guidebook

Introduction to Creating Value with Data

"Monetize Data!" we're told. But what does that mean?

There are many different ways to monetize data, either directly or indirectly, and there are many more opportunities beyond simply selling data. Instead of 'data monetization', let's think of it as 'creating value with data.'

The Economist and countless others have used oil as a metaphor for data. It's not perfect, but it does give us a good way to think about how data works.

The crude version is mined (or extracted) and processed into a refined version (aggregated and cleaned). That refined version is the basis for commodified products like plastics, which in turn go into advanced products like smartphones.

However, like oil, just because we can extract data doesn't mean we should.

Like oil, sometimes data use makes things worse rather than better.

And, like oil, there is a big gap between the crude version and something useful to the average human.

Similarly, oil producers often look for problems to solve with oil (the proverbial hammer in search of a nail), which can over-emphasize certain kinds of problem-solving. Data dealers and data scientists often look for things to solve with data and don't always work their way back from the problems data can solve to actual user needs.

“Data is the new oil”
— The Economist and others have used this analogy. Just because you can mine it doesn't mean you should.

There are many ways to create value with data, many of which are combined in practice.

Datasets: directly providing data from one or more sources and with varying degrees of structure and refinement
Insights: making insights from an analyzed dataset available
Algorithms: making the machine learning models you have built and trained available to others
Optimization: running and optimizing your organization's work using insights from data
Products: using some combination of one or more strategies above to make a data-centric product or service
Investments: expending time or other resources into financial plans, property, equity, or other venture based on the data and analytical models you have access to
Personalization: customizing offerings or other experiences with individuals' or firms' data
Relationships: creating and deepening relationships, either between yourself and others, or between third parties

In short, each of these approaches involves:

Datasets: providing collections of raw or lightly-processed data points
Insights: deriving and providing new information from data
Algorithms: creating and training steps for machines to use in analyzing other data
Optimization: improving an organization's operation or a user's experience
Products & Offerings: improving an existing offering, such as a product or service, or creating a new offering
Investments: making decisions about what to apply time, money and focus towards
Personalization: customizing a user's experience with their own data, third-party data and/or recommendations
Relationships: creating or improving human relationships through data


There are several kinds of data.

First, there is fundamental data (something directly and intentionally measured, like a list of stock investments or an image file on a camera). Second, there are facets or features of the fundamental data, often known as metadata (data about data). This might record, for example, the purchase price of one of the stocks or the date the camera took the photo.

Some secondary datasets may come from 'data exhaust'—something a digital system records or logs without a specific intent in mind.

Monetized datasets can provide either or both types of data. For example, a video stream from a security camera could be considered fundamental data, and logs of web traffic might be secondary data or metadata. What makes such logs data exhaust is that they are collected automatically, often for unknown future purposes.

Whether fundamental or secondary, these datasets may be monetized as raw, unprocessed data without structure or labeling, or they might be aggregated and/or processed into a larger, more 'packaged' dataset.

For more information, read the first three modules of the Data Supply Chain (Acquire, Store, and Aggregate).

Sometimes datasets are distinguished by the terms 'traditional data' and 'alternative data' - a division often discussed in financial services. There, and in parallel industries, traditional data means measurements that directly describe fundamental things about an asset or other item of interest. These tend to be absolute and past-based.

Sometimes an alternative dataset can be used to infer something about a ‘traditional’ dataset. Parallel or secondary data can build on traditional data to infer additional detail or provide predictions.

For example, retail stores traditionally report their sales results in the quarter after the holidays, leaving investors to wonder about performance for a time. However, many retail stores have parking lots that can be monitored by security cameras or by satellite. Parking lot occupancy can be cross-referenced to nearby stores to guess what earnings will be.

If we ask the basic question “What will a retail store’s holiday sales numbers be?” we might have to wait until the next quarter for the report. To get some indication of these results sooner, we can use computational thinking: decomposing the result (sales) into the steps leading up to it (number of potential customers in stores, which can be inferred by how full parking lots are).  

By pushing ourselves to see the root cause of or correlations to those sales, like an increased number of shoppers, we can get some indication of the outcome sooner.

Such a mindset lets us get an idea of what sales volume we might expect before we get the quarterly report that happens well after the holidays.
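The parking-lot reasoning above can be sketched as a simple least-squares fit. This is only an illustration: every occupancy and sales figure below is invented, and `fit_line` and `predict` are hypothetical helper names rather than part of any tool mentioned in this guidebook. A real analysis would need far more data and would have to account for other factors, such as weather or store size.

```python
# Illustrative sketch only: all numbers below are invented for demonstration.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Apply the fitted line to a new observation."""
    return slope * x + intercept


# Hypothetical history: average holiday parking-lot occupancy (fraction full)
# and the sales later reported for the same period (in millions).
past_occupancy = [0.40, 0.50, 0.60, 0.70]
past_sales = [8.0, 10.0, 12.0, 14.0]

slope, intercept = fit_line(past_occupancy, past_sales)

# This season's occupancy, observed before any earnings report is out.
estimate = predict(slope, intercept, 0.65)
print(f"Estimated holiday sales: {estimate:.1f} million")
```

The estimate is only as good as the assumed relationship between occupancy and sales, which is why alternative data of this kind is usually treated as an early indication rather than a substitute for the quarterly report.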

Traditional Data

  • Directly describes an asset’s market position or fundamentals
  • Broadly accessible, obvious, usually from within financial markets
  • Tends to be ‘now’ or ‘after the fact’
  • Tends to be free or low-cost
  • Often has a long, consistent history

Alternative Data

  • Can be used to infer fundamentals or something affecting fundamentals
  • Is ‘discovered’ or ‘mapped,’ sometimes not obvious, and usually from outside financial markets
  • May be used to predict the future
  • Tends to be expensive
  • May have a shorter or less consistent history

Alternative Data ethics risks

Alternative Data offers many opportunities but is also full of ethical considerations.

Using datasets for purposes other than those the disclosers consented to won't just damage a brand's reputation; it may also be illegal.

For example, selling datasets such as satellite imagery could result in de-anonymizing people or otherwise disclosing sensitive information if not handled correctly. It may be harmless to sell a few cropped satellite images at low resolution, but if those were shared with malevolent parties via resale or a data breach, they could be re-constituted and end up disclosing personally identifiable information, national security vulnerabilities, or even just basic security risks like showing the location of service doors on the roof of a mall.

You can read more about ethical considerations for alternative data in the Data Ethics guidebook.

A wide variety of datasets are directly monetized. For example:

  • Amazon Web Services Data Marketplace
  • Clearbit (contact data)
  • Salesforce’s
  • Bloomberg (finance data)
  • Crunchbase (company info)
  • Telecoms (locations and searches)



Crunchbase collects data on companies, such as their structure, basic financials, leaders, investors, and key employees. By aggregating, normalizing, and verifying the information, Crunchbase makes it possible for other companies to build products that need reliable company information, such as an investing app that provides additional context to its users about the companies listed on a stock exchange.


AWS Marketplace

Data is often accessible via third-party marketplaces. The Amazon Web Services Marketplace offers many elements useful for data-centric innovation, including both broad and highly focused datasets. For example, users can source data on:

  • COVID-19 cases
  • Real estate
  • Satellite imagery
  • Healthcare claims
  • Traffic

And many other topics. Similarly, organizations can list their own data. Amazon's role is to provide a marketplace for many parties to lease data to each other.

A screenshot from the AWS marketplace with different services offered.



Sometimes, companies both consume and produce data.

Nexar, which makes an advanced dashcam, probably had to access maps and other datasets to build its product.

In turn, it sells the images captured by users' cameras. In this case, Nexar markets street sign data that might otherwise be too complex or expensive to gather.

Suppose another company was attempting to make maps of particularly pedestrian-friendly areas. In that case, they might use Nexar's data to determine which roads had good 'pedestrian awareness' signs for drivers. At the same time, another hypothetical firm could find the most truck-friendly routes by avoiding streets with poor traffic controls.



Clearbit acquires, aggregates, and normalizes data about individuals, especially professionals. It does this by offering a free service to individuals in a two-way model: In exchange for sharing their address books with the company, members' address books are updated with the most up-to-date contact, title, and employment history.

Clearbit then monetizes that data with a premium, one-way version available to larger companies who get enriched data without having to provide data in turn.

A screenshot of a form by clearbit with a pop-up screen with data and code.



Salesforce's comprehensive suite of products all leverage data. However, Salesforce also operates a service they call, which allows companies to buy prospect lists (contacts at other companies for sales).

A vast amount of data is accessible via public or private APIs across almost every industry. It can be hard to choose which data sets to access, but it helps to think of data as domains and records: What domain (or industry) is that data in? And what kind of records might it have? For example, Bloomberg provides data in the 'Finance' domain and offers record types like news stories and stock statistics.
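One way to make the 'domains and records' idea concrete is a small catalog that can be filtered by either question. The sketch below is an illustration under assumptions: `DataSource`, `by_domain`, and `with_record_type` are hypothetical names, and the entries paraphrase examples from this chapter rather than any provider's actual API or product list.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    name: str
    domain: str  # the industry or subject area the data belongs to
    record_types: list = field(default_factory=list)


# Entries paraphrase examples from this chapter; they are not an official list.
catalog = [
    DataSource("Bloomberg", "Finance", ["news stories", "stock statistics"]),
    DataSource("Crunchbase", "Company info",
               ["financials", "leaders", "investors", "key employees"]),
]


def by_domain(sources, domain):
    """Return the sources whose domain matches (case-insensitive)."""
    return [s for s in sources if s.domain.lower() == domain.lower()]


def with_record_type(sources, record_type):
    """Return the sources that offer a given kind of record."""
    return [s for s in sources if record_type in s.record_types]


finance_sources = by_domain(catalog, "finance")
investor_sources = with_record_type(catalog, "investors")
print([s.name for s in finance_sources])
print([s.name for s in investor_sources])
```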

To get started, try using the Data Sources Explorer to explore generic data source types.

When it's time to evaluate actual data types to monetize (or to acquire), you can reference Attributes of Data.


Rather than selling complete data sets, some firms create value by providing insights based on data such as analyses of market activity, research or trends.

'Insights' companies include market research firms like Gartner and Forrester, as well as specific insight products, such as investment bank Credit Suisse's insight into the value of pharmaceuticals for industry benchmarking.

Market research firm Community Marketing Insights specializes in research about LGBTQ people consuming mass-market products.

These firms are not differentiated simply through the data they acquire and/or aggregate but through their unique analysis of it.

Their value propositions may include the underlying data or only the outcomes of their analysis.

Some of the insights for sale are basic mathematical analyses of statistics, while other insights firms construct qualitative frameworks (like Gartner's Hype Cycle), taxonomies, timelines, cause-and-effect studies, and/or integrate editorial perspectives.


Algorithms (mathematical models used for analyzing data) are another way to create value. A lot of energy goes into developing strong algorithms, and the differences in quality are enormous. While some organizations may create data science and machine learning teams, many firms just need access to a good algorithm to analyze their own datasets.

Monetization of algorithms generally falls into two categories: direct sale or licensing of an algorithm, or application programming interface (API)-based access to it. The direct approach usually accelerates existing work on machine learning or operates inside a highly-secured environment, while the API approach provides a commodity service.

Perhaps one of the best-known examples of an API-monetized algorithm is Google Image Recognition. Users of the service do not have direct access to Google's algorithm and need nearly no technical infrastructure to use it. They simply submit an image via an API call, and Google returns a weighted list of likely keywords. The service can recognize text, everyday objects, and brand logos. Causeit itself does this: we pass images of business cards we've received through Google to quickly identify companies and other relevant information for our internal customer relationship management tool.
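From the caller's side, API-based access to an algorithm usually amounts to sending input and interpreting a scored response. The sketch below does not call Google or any real service: `sample_response` is an invented stand-in for the kind of weighted keyword list described above, and `top_labels` and its 0.6 threshold are hypothetical choices made for illustration.

```python
# Invented stand-in for a response from an image-labeling API.
# Real services define their own response formats; check their documentation.
sample_response = [
    {"label": "paper", "score": 0.42},
    {"label": "business card", "score": 0.97},
    {"label": "logo", "score": 0.81},
]


def top_labels(response, min_score=0.6):
    """Keep labels at or above min_score, highest score first."""
    kept = [item for item in response if item["score"] >= min_score]
    kept.sort(key=lambda item: item["score"], reverse=True)
    return [item["label"] for item in kept]


labels = top_labels(sample_response)
print(labels)
```

The caller never sees how the scores were produced, which is the point of the API model: the provider keeps the algorithm, and the user only decides how much confidence is enough.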

Algorithms can be sold as a completely standalone service, like Google Image Recognition, or a core part of a more significant value proposition.

WeGlot, for example, helps website creators overlay their primary website with translated (or 'localized') editions and edit those editions quickly. An English-language website may have a Spanish edition, for instance. That functionality is useful on its own, but the real value of WeGlot is that it provides machine translation of your website using an aggregate of three different translation algorithms, along with a way to blend that machine translation with professional human translation where needed. WeGlot did not need to build its own translation algorithm, which would have been expensive and unnecessary. Instead, it was able to monetize existing algorithms in a distinctive way.


AWS Marketplace

Many marketplaces offer algorithms as a service. This means that necessary parts of digital value propositions (or just data cleanup) can be browsed and implemented quickly.

Many algorithms are focused on cleaning up and/or labeling raw data, which are time-consuming but essential tasks for most data projects. Others are for functions like processing speech into text or recognizing images. And some are highly specialized, like models for forecasting hospital capacity.

Amazon Web Services' marketplace offers algorithms as a service and is one of the most visible venues for purchasing or licensing algorithms. Other firms like Algorithmia/DataRobot provide similar marketplaces, while GenesisAI is creating a marketplace for connecting various AIs.

To find out more, search for "algorithm API marketplace" plus your preferred domain.

A screenshot of the AWS Algorithm marketplace with different algorithm offerings.


Data can create value by helping you optimize your existing operations. Often, decisions inside organizations are made based on intuition or a limited view of what customers and employees needed in the past. Data can be used to help make decisions inside a business and dynamically optimize stages of production.

Data is often used for optimization in:

  • Web traffic analytics software like Google Analytics, which helps writers and designers optimize their content for search engines and ease of use
  • Customer Relationship Management (CRM) software like Salesforce, which helps companies find and connect data points about customers, and forecast user needs and sales
  • Communications systems like Slack and Microsoft Teams, which can be analyzed to see hotspots and trends in communication
  • Support systems like ZenDesk, which auto-suggest helpful articles
  • Financial systems like Expensify, which uses machine learning for expense categorization and spending trends analysis

In these situations, simple data-sharing can be powerful in its own right, such as sharing customer contact information across a company. More advanced implementations of machine learning and recommendation engines can help users operate more efficiently, freeing them to do more valuable parts of their jobs or entirely new ones.



Salesforce has a suite of offerings designed to augment human work through a customer's journey with a company. They have monetized data and related technologies in many ways, among them:

  • Organizing companies' customer data in CRM software
  • Providing access to unique data processing capabilities through a developer ecosystem and app-building tools
  • Enabling companies to tailor their messages with customers' data through specialized marketing tools
  • Offering sales optimization tools such as knowledge-sharing, reusable templates, and forecasting
  • Supporting customers throughout their entire journey with a company using Customer Service tools

Salesforce strikes a balance between providing data-driven products & services and making it possible for customers to create their own value with data. Salesforce is an important example of data-centric value: a tool for an existing business function like sales contact management can also underpin new offerings.

At the same time, Salesforce's business model is designed to keep their products deeply integrated with the critical business functions of their clients. Salesforce's clients may feel 'locked in' to that ecosystem and dependent on—or even at the mercy of—a company with comparatively high prices and a proprietary mindset. It exposes a critical set of choices for leaders with fledgling data initiatives: Should they focus on what they want to do for their customers and accept the high cost of Salesforce in exchange for a rapid path to market? Or use less-integrated or internally-built components in their own firm?


Data-centric products take many forms. Here, 'data-centric product' means any offering which uses data to create value that could not otherwise be created, or where using data substantially improves the offering. For example:

  • Personal finance apps which connect to a user's own bank account to help them compare their spending to their budget
  • Streaming or other content services recommending new content that might be of interest
  • Research content applications helping users map their hypotheses to existing experiments
  • Fitness devices and apps reflecting users' physical activity to them and encouraging them with 'nudges'


One of the ways data-centric products create value is through data-informed personalization. Traditional business wisdom says that an organization needs to choose between personalization and scale. A bank might have high-quality, personal financial advice for their biggest clients but only offer cursory, generic accounts to their broader customer base. Using a combination of big data (trends from their entire user base) and little data (specific data points about an individual user), wise firms can vastly increase their service's actual and perceived value while continuing to operate at digital scale.

Whether it's movie recommendations from Netflix, Google's customized slide shows based on location and contact data, or a calendar app that knows when you need to leave to get to your next appointment on time, the best digital tools already personalize things for us. If you look more deeply at their strategies, you'll find that most balance big data and little data elements.

For example, Apple Watches ask users for simple goals around physical activity and then 'nudge' users to achieve those goals with tailored advice and encouragement. The watch might say, "you only need to take a brisk twelve-minute walk to reach your activity goal!" or "you're usually more active by this point in the day, but there's still time." This 'nudge' mentality is based in behavioral psychology and works because it isn't the same reminder at the same time and in the same way for every user. Generally, more personalized strategies like 'reminders that you don't set' justify an app's request for personal information and reinforce goodwill between the user and their app or device.
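A rough sketch of this kind of nudge logic follows. It is not Apple's implementation: `choose_nudge`, its parameters, the 15-minute threshold, and the message wording (adapted from the examples above) are all assumptions made for illustration. The point is that the message depends on one person's own goal and habits rather than a fixed schedule.

```python
def choose_nudge(minutes_done, goal_minutes, usual_minutes_by_now):
    """Pick a message based on one user's goal and typical activity.

    All inputs are minutes of activity; every threshold is illustrative.
    """
    remaining = goal_minutes - minutes_done
    if remaining <= 0:
        return "You've reached your activity goal!"
    if remaining <= 15:
        # Close to the goal: name the small, specific step left.
        return (f"You only need a brisk {remaining}-minute walk "
                "to reach your activity goal!")
    if minutes_done < usual_minutes_by_now:
        # Behind this user's own typical pace for this time of day.
        return ("You're usually more active by this point in the day, "
                "but there's still time.")
    return "You're on track. Keep it up!"


# Hypothetical user: 30-minute goal, 18 minutes done, usually at 20 by now.
message = choose_nudge(minutes_done=18, goal_minutes=30, usual_minutes_by_now=20)
print(message)
```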

Users even gladly participate with firms who are clearly attempting to offer them financial products like credit cards if the strategy is reciprocal enough. Services like Nerdwallet, Mint, and Credit Karma all ask for sensitive, personal information from users (often through secure APIs like Plaid) in order to first feed helpful information back to users, such as tips for improving their credit score or saving money. In that context, recommendations of credit cards and other financial products are more appropriate, better tailored, and more welcome than traditional broadcast ads for credit cards.



Businesses can also offer other companies personalized services:

  • Market research firms can create custom 'news feeds' for their customers.
  • Professional services firms can create interactive proposals for their customers to adjust without lots of back-and-forth communication.
  • Lawyers can create digital, adaptive documents drawing from their customer's data, allowing non-lawyers to create contracts without always needing to use a professional.

Many of these personalization strategies help improve the value and reduce the friction of existing experiences while reserving face-to-face human time for truly complex and high-value issues.


Faced with lots of options but limited resources, it can be challenging to know what to invest in. Data can inform these decisions, whether you're working at a financial institution and participating in public marketplaces or inside a small firm and prioritizing your team's time. Inside an enterprise, business modeling tools can help leaders determine which products and services to expand, pivot, or discontinue, and which markets to expand into or withdraw from.



For example, individuals can use data when investing their savings. Automated investing apps like Wealthfront help people select stocks and other opportunities by matching the stock's performance to their users' tolerance for risk and timeline for expected return. At a basic level, so do 'drip-investing' apps like Acorns.


When used well, data can build or improve relationships.

The most apparent application might be relationships between you and your customers or users, but you can also use data to enable third parties to connect, as online social networks do. The most basic data can provide incremental lifts to your relationships with customers, as with online shops that offer a special discount to customers on their birthday.

The real opportunity is to use more advanced data about your users to help them learn both about themselves and what you can do for them. Instead of focusing on transactions and sales, focus on personalization strategies, match-making with other users, recommending helpful content, or other valuable 'a-ha!' moments that delight them.

Just like a good friend or colleague, aim to be generous and helpful, anticipate their needs, and assist them in thinking through decisions. Use data to listen to your customers rather than target them.

  • United Airlines uses basic data about its customers to express gratitude: phone agents thank callers for how long they've been flying with the company.
  • Facebook applies its advanced data capabilities not just to advertising but also to the real needs of its users, like suggesting connections with old friends and fostering good memories by prompting users to revisit old content in which they and their friends are 'tagged' and which expressed a positive sentiment.
  • Apple strengthens connections between its users by prompting friends and family to congratulate each other on workouts or challenge each other to friendly competitions.

Bringing it All Together

There are many strategies to create value with data. Whether you are evaluating your own opportunities or products on the market, it can be helpful to think through the elements of a digital value proposition. There are a few examples of digital value propositions below, based on the following ad lib.

1. Our [initiative or offering]

2. help(s) [customer group]

3. who want to [jobs to be done]

4. by using data from [sources]

5. to [reduce verb + customer pain]

6. and [increase verb + customer gain]

7. unlike [competing value propositions].


Sometimes the best way to think about creating value with data is by considering how you will describe an offering or initiative. We've put together this digital value proposition ad-lib focused on data to help you tell the story of how you'll create value.

Data Value Proposition Ad Lib

1. Our [offering]

  • This is where you can put a quick description of your offering. If it has a catchy, branded name, it's okay to put it here along with a generic name. You're about to explain what it really means in a moment. For example, "Our content recommendation tool, ResearchGuiderApp™."

2. help(s) [customer or user group]

  • Indicate who you are helping. Try to be specific; you may wish to go through a user persona exercise to clarify the various groups you are serving. For example, it "helps our advanced research customers."

3. who want to [jobs to be done]

  • Pull from one of the eight ways to create value with data described earlier or one of your own. For example, indicate what matters to the people you serve "…who need to find scientifically sound content."

4. by using data from [sources]

  • Indicate which data sources you are drawing upon so that others have a sense of how your offering works. For example, "by using data from other researchers' searches to make recommendations."

5. to [reduce verb + customer pain]

  • Explain how you make your user's job less difficult. For example, "to save time when searching"

6. and [increase verb + customer gain]

  • Explain how you make your user's job more effective or satisfying. For example, "and help them discover new ways of thinking about what they're researching."

7. unlike [competing value propositions].

  • Differentiate your value proposition by comparing it to other ways people might attempt to get the same job done. There are three kinds of competitors: direct competitors (which solve the same problem the same way), indirect competitors (which solve the same problem in a different and perhaps less effective way), and phantom competitors (barriers that prevent the problem from being solved). For example:
  • Direct: "…unlike AcmeCo's service, which doesn't have a long history of searches to draw from."
  • Indirect: "…unlike hiring a third-party research firm, which can be expensive and cause delays."
  • Phantom: "…instead of skipping in-depth research because it is too expensive for a small firm."

When all of these elements have been referenced, you have constructed a basic value proposition which you can present to others to gauge interest and discuss feasibility.
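If you want to draft several value propositions quickly, the ad lib can be treated as a fill-in template. This is a minimal sketch: `TEMPLATE`, `build_value_proposition`, and the field names are hypothetical, and the sample values are the examples used in the seven steps above.

```python
# The seven parts of the ad lib, expressed as one fill-in sentence.
TEMPLATE = (
    "Our {offering} helps {customers} who want to {job} "
    "by using data from {sources} to {pain_relief} "
    "and {gain}, unlike {alternative}."
)


def build_value_proposition(**fields):
    """Fill the seven-part ad lib; raises KeyError if a part is missing."""
    return TEMPLATE.format(**fields)


statement = build_value_proposition(
    offering="content recommendation tool",
    customers="our advanced research customers",
    job="find scientifically sound content",
    sources="other researchers' searches",
    pain_relief="save time when searching",
    gain="discover new ways of thinking about what they're researching",
    alternative="hiring a third-party research firm",
)
print(statement)
```

Leaving a part out raises an error instead of producing a half-finished sentence, which mirrors the advice above: the value proposition is only ready to present once every element has been addressed.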


Retail Traffic

Stores traditionally report their sales results in the quarter after the holidays, leaving investors to wonder about performance for an uncomfortable length of time. However, many retail stores have parking lots. Parking lot occupancy seen from satellites or security cameras can be cross-referenced to nearby stores to guess future earnings results.

If we ask the basic question of "What will a retail store's holiday sales numbers be?" we might have to wait until a past-looking report is issued. What if we went further?

  • Our Retail Oracle offering
  • help(s) investors in consumer retail stores
  • who want to predict future earnings
  • by using data from satellite images of parking lots, Twitter post sentiment, job posts, Slickdeals and Instagram posts
  • to lessen the cost and effort of trendcasting
  • and make forecasting revenue and profitability fundamentals easier,
  • unlike waiting until earnings reports have been released.


Responsible Governance Predictor

In the finance world, focus on responsible investing has resulted in "Environmental, Social and Governance" stocks, or "ESG" stocks. However, it's not always easy to know which companies are doing the right thing, especially in the future. So, someone could create a score to help predict how well-governed a company might be, using company statistics like founders, investors, and press releases to guess at how well a firm might behave and perform in the future:

  • Our governance predictor
  • help(s) ESG investors
  • who want to predict companies’ future ESG scores
  • by using data from Crunchbase, Angellist, Glassdoor, LinkedIn, and PRweb
  • to reduce the effort and time needed to assess companies’ governance
  • and better predict future ESG scores
  • unlike waiting until third-party assessments have occurred.

Discussion Prompts

Which data monetization routes are most exponential?

Which data monetization routes are easiest for you right now?

How is data monetized in a practical way?



Start With Your Team

There are several roles to consider when creating value with data. This is not a strict set of roles but a prompt to think through the 'thinking capabilities' of your team. Make sure these perspectives (and functions) are present at some point in your development process:

  • Data & Business Strategist (business modeling, hypothesis creation)
  • Data Analyst (infer directly from data, validate, normalize)
  • Data Scientist (create/expand algorithms, machine learning models, test hypotheses)
  • Data Scout (find sources of potential data, evaluate)
  • Data Engineer (build systems, tools)
  • Head of Data (contextualize and network datasets, monetize)
  • …whoever applies the data, in whatever way


Making Data Work For Our Customers

List a few types of data you know your organization has access to. Then think of at least one use case where that data could create value for your customers.

Hint: How could this data be used to make your product or service easier to access or more useful for your customers?