Glossary

Internet of Things
Data Ethics and Data Politics

A set of common protocols used to enable many different devices to communicate both with each other and with their human owners.

EULA
Data Ethics and Data Politics

End-User Licence Agreement: The terms and conditions a user must agree to in order to use a specific app or piece of software.

Insights
Creating Value with Data

Understandings derived from analyzing data; crucially separate from the data itself.

Data Exhaust
Creating Value with Data

Something recorded or logged without a specific intent in mind.

Fundamental Data
Creating Value with Data

Something directly and intentionally measured, like a list of stock investments or an image file on a camera.

Alternative Data
The Data Supply Chain

Sometimes an adjacent dataset can be used to infer something about a ‘traditional’ dataset. This is a specific category of use that is ripe with opportunities but also full of ethical considerations.

SDK
The Data Supply Chain

Software Development Kit: interoperable building blocks for developers that can be combined in many different ways. Used to speed up and streamline app development by ensuring developers aren't re-creating existing technologies.

API
The Data Supply Chain

Application Programming Interface: a way to standardize data and commands to facilitate communication between different systems that otherwise would not be able to interact meaningfully.

Feedback Loops
The Data Supply Chain

A way that humans can help steer the direction of machine algorithms. For example, you might see this in the form of a recommendation from Amazon coupled with the question, “Did we recommend the right product?”

Natural Language Processing
The Data Supply Chain

A set of algorithms designed to help machines understand natural human speech. Siri, Google Assistant, and Alexa all rely heavily on natural language processing.

Predictive Analytics
The Data Supply Chain

Using data, statistical algorithms, and machine learning to calculate the likelihood of future outcomes. This is different from traditional analytics, which focuses only on what happened in the past.
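As a rough illustration of looking forward rather than back, the sketch below fits a straight line to past values and extends it one step into the future. The monthly sales figures and the function name `fit_line` are made up for this example, and a straight-line trend is only one of many possible prediction methods.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: sales recorded for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [100, 110, 120, 130, 140]

slope, intercept = fit_line(months, sales)

# Traditional analytics would stop at describing months 1 to 5.
# Predictive analytics extends the pattern to month 6.
forecast = slope * 6 + intercept
print(f"Forecast for month 6: {forecast:.1f}")  # Forecast for month 6: 150.0
```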

Machine Learning
The Data Supply Chain

Some advanced algorithms can 'learn' and update their decision trees using models that adapt as they are exposed to more and more data over time.

Algorithms
The Data Supply Chain

A step-by-step method for solving a problem, expressed as a series of decisions, like a flow chart or decision tree.
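A minimal sketch of an algorithm written as a series of decisions, much like following the branches of a flow chart. The shipping scenario, the thresholds, and the name `choose_shipping` are invented purely for illustration.

```python
def choose_shipping(weight_kg: float, is_express: bool) -> str:
    """Walk a tiny decision tree, one question at a time."""
    # Decision 1: is the parcel too heavy for the regular services?
    if weight_kg > 30:
        return "freight"
    # Decision 2: did the customer ask for express delivery?
    if is_express:
        return "courier"
    # No special case applies, so take the default branch.
    return "standard post"

print(choose_shipping(2.5, is_express=True))   # courier
print(choose_shipping(45, is_express=True))    # freight
print(choose_shipping(1, is_express=False))    # standard post
```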

Data Analysis
The Data Supply Chain

Examining and transforming data to extract information and discover new insights. Essentially, where data becomes information.

Data Taxonomy & Folksonomy
The Data Supply Chain

Classification systems for your data. They allow you to provide specific categories for each record within your dataset. A well-designed taxonomy helps you and your organization rigorously track what data you have, or could have, and also helps organize your metadata. Examples of taxonomies include the Dewey Decimal System used to organize topics in libraries and research; the North American Industry Classification System (NAICS); or the World Health Organization's International Classification of Diseases (ICD).

Data Hygiene
The Data Supply Chain

The practice of checking, correcting, labeling, and normalizing data. Common activities related to data hygiene include:
- Checking for accuracy
- Ensuring formats are the same in each dataset (such as the format for date & time)
- Determining or creating a unique identifier (such as an email address or phone number, which allows combining one dataset with others)
The more complex the data, the more specific the taxonomy needs to be, or you might end up with a mountain of un-sortable, un-verifiable data.
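The hygiene activities described here can be sketched in a few lines. This example assumes a made-up list of sign-up records, assumes dates arrive in only two formats, and treats a lowercased email address as the unique identifier; real datasets usually need many more rules.

```python
from datetime import datetime

# Assumed input formats for this sketch: ISO (2021-02-03) and day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def normalize_date(text: str) -> str:
    """Return the date as YYYY-MM-DD, trying each known format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

def clean_records(records):
    """Normalize every record and keep one record per email address."""
    cleaned = {}
    for record in records:
        email = record["email"].strip().lower()  # the unique identifier
        cleaned[email] = {"email": email, "signup": normalize_date(record["signup"])}
    return list(cleaned.values())

raw = [
    {"email": "Ada@Example.com ", "signup": "03/02/2021"},
    {"email": "ada@example.com", "signup": "2021-02-03"},
    {"email": "bob@example.com", "signup": "05/03/2021"},
]
result = clean_records(raw)
print(result)
```

The two "Ada" rows collapse into one because, once cleaned, they share the same identifier and the same date.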

Data Lakes
The Data Supply Chain

Aggregations of data, gathered in one place so that it can be found more easily and analyzed as a whole. For example, Facebook's 'Social Graph' collects and cross-references all data generated by every user, for future analysis.

Data Rivers
The Data Supply Chain

Flows of data from a lot of different places; you can also think of them as streams or pipelines.

Little Data
The Data Supply Chain

Specific data points about an individual. Little data sometimes involves specialized tools for building profiles of people, called social graphs, that represent and organize the many facets of their identity, behavior, and social networks.

Big Data
The Data Supply Chain

The aggregation of data points into large datasets, followed by analyzing those datasets to find patterns. It's called 'big' because this strategy involves merging many different kinds and sources of data to run machine learning processes. Big data's default strategy is to bring together as many data points as possible to help the company do things better, faster, and/or cheaper.

Data Sovereignty
The Data Supply Chain

Where, geographically, is your data (or your users' data) stored? What are the legal jurisdictions of the systems it passes through? Different rights might apply to data depending on where it was gathered, manipulated, and/or consumed.

Data Architecture
The Data Supply Chain

The standards that govern what data is collected, how it is stored, and how it is integrated into the organization's information systems.

Data In Motion
The Data Supply Chain

Dynamic, 'live', and changing in real time. Water is a good metaphor for data in motion: a ‘stream’ like a live video feed or a ‘flow’ of stock market data.

Data At Rest
The Data Supply Chain

A file in a filing cabinet, or a row in a spreadsheet: Static, only available in one place, staying the same until a user modifies it.

Cloud Storage
The Data Supply Chain

Files and folders stored online, rather than on a local hard drive. Google Drive, Dropbox, and iCloud are all examples of Cloud Storage providers.

Metadata
The Data Supply Chain

Data about another piece of data, used to understand, sort, and validate datasets to increase their usefulness. Common examples of metadata include the send and receive dates of emails, the unique address of a server, or info about which app was used to post a particular message to Twitter. Other examples of metadata are the time a computer file was created or modified, the number of times a post has been viewed on social media, or the number of times a song has been played on Spotify.
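To make the distinction between data and metadata concrete, the sketch below writes a small file and then reads only the facts about it, never its contents. The helper name `describe_file` is hypothetical; `os.stat` is the standard Python call for file metadata.

```python
import os
import tempfile
from datetime import datetime, timezone

def describe_file(path: str) -> dict:
    """Collect metadata about a file without reading what is inside it."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("hello")  # the data itself: five characters
    path = handle.name

metadata = describe_file(path)
print(metadata)  # name, size, and modification time: all data about the data
os.remove(path)
```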

Data Marketplaces
The Data Supply Chain

Spaces designed for parties to buy, sell, and lease data to each other, including both broad and highly-focused datasets. For example, on Amazon Web Services' marketplace, users can source and sell data on COVID-19, real estate, satellite imagery, healthcare claims, traffic, and many other topics.

Data Catalogues
The Data Supply Chain

Exhaustive (or at least comprehensive) lists of what datasets are available from an organization or other source. For example, scientific researchers might need a list of all the medical statistics datasets they could access, a company might need various lists of customer information, or an app developer might need data about users.

Raw Data
The Data Supply Chain

Data that has not yet been manipulated, processed, or sorted. Raw data is rarely of use to humans.

Informed Consent
The Data Supply Chain

Ensuring that users who give us permission to use their data know where and how that data will be used, and what this might mean for them in the future.

Computational Thinking
The Data Supply Chain

A mindset that allows machines and humans to work together to solve real-world problems. Computational thinking comprises several stages:
- Decomposition: breaking down a large, complex problem into several smaller, simpler ones
- Abstraction: creating a model of a system that leaves out unnecessary parts, while allowing us to see how the different pieces fit together
- Finding patterns that are reusable in various contexts, like building blocks, and using these to create algorithms
- Algorithms: series of specific instructions telling a computer how to process data, make decisions, or solve problems
- Programs (or applications): created by combining algorithms so they can work together to process data in useful ways
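One way to picture these stages is a tiny program that finds the most frequent word in a sentence. The task and the function names are chosen only for illustration: the problem is decomposed into small steps, each step is a short algorithm, and the final function combines them into a program.

```python
from collections import Counter

def split_into_words(text):
    """Decomposition: one small job, turning text into lowercase words."""
    return [word.strip(".,!?").lower() for word in text.split()]

def count_words(words):
    """Pattern: counting items works for any list, not only for words."""
    return Counter(word for word in words if word)

def most_common_word(text):
    """Program: combine the smaller algorithms into something useful."""
    return count_words(split_into_words(text)).most_common(1)[0][0]

print(most_common_word("The cat saw the dog. The dog ran."))  # the
```

Abstraction shows up in what the code ignores: punctuation and capitalization are stripped away because they do not matter for counting.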
