Application Programming Interface: a way to standardize data and commands to facilitate communication between different systems that otherwise would not be able to interact meaningfully.
A step-by-step method for solving a problem, expressed as a series of decisions, like a flow chart or decision tree.
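As a minimal sketch of "a series of decisions", the Python function below walks through a small decision tree one question at a time. The scenario (choosing a shipping method), the function name `shipping_method`, and the weight thresholds are all hypothetical, chosen only for illustration.

```python
def shipping_method(weight_kg: float, is_express: bool) -> str:
    """Pick a shipping method by answering one question at a time.

    The categories and thresholds are illustrative assumptions,
    not real postal rules.
    """
    # Decision 1: does the customer need it quickly?
    if is_express:
        return "courier"
    # Decision 2: is it light enough to go as a letter?
    if weight_kg <= 2:
        return "letter post"
    # Decision 3: is it light enough to go as a parcel?
    if weight_kg <= 30:
        return "parcel post"
    # Otherwise, fall through to the last remaining option.
    return "freight"


print(shipping_method(1.5, is_express=False))
print(shipping_method(12, is_express=False))
```

Each `if` corresponds to one branch point in a flow chart, and the same inputs always lead to the same outcome.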
Sometimes an adjacent dataset can be used to infer something about a ‘traditional’ dataset. This is a specific category of use that is ripe with opportunities but also full of ethical considerations.
The aggregation of data points into large datasets, followed by analyzing those datasets to find patterns. It's called 'big' because this strategy involves merging many different kinds and sources of data to run machine learning processes. Big data's default strategy is to bring together as many data points as possible to help the company do things better, faster, and/or cheaper.
Files and folders stored online, rather than on a local hard drive. Google Drive, Dropbox, and iCloud are all examples of cloud storage providers.
A mindset that allows machines and humans to work together to solve real-world problems. Computational thinking comprises several stages:
- Decomposition: breaking down a large, complex problem into several smaller, simpler ones
- Abstraction: creating a model of a system that leaves out unnecessary parts, while allowing us to see how the different pieces fit together
- Finding patterns that are reusable in various contexts, like building blocks, and using these to create algorithms
- Algorithms: series of specific instructions telling a computer how to process data, make decisions, or solve problems
Programs (or applications) are created by combining algorithms so they can work together to process data in useful ways.
Examining and transforming data to extract information and discover new insights. Essentially, this is where data becomes information.
The standards that govern what data is collected, how it is stored, and how it is then integrated into the organization's information systems.
A file in a filing cabinet, or a row in a spreadsheet: static, available in only one place, and unchanged until a user modifies it.
Exhaustive (or at least comprehensive) lists of what datasets are available from an organization or other source. For example, scientific researchers might need a list of all the medical statistic datasets they could access; a company might need various lists of customer information, or an app developer might need data about users.
Something recorded or logged without a specific intent in mind.
The practice of checking, correcting, labeling, and normalizing data. Common activities related to data hygiene include:
- Checking for accuracy
- Ensuring formats are the same in each dataset (such as the format for date & time)
- Determining or creating a unique identifier (such as an email address or phone number, which allows combining one dataset with others)
The more complex the data, the more specific the taxonomy needs to be, or you might end up with a mountain of un-sortable, un-verifiable data.
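Two of these activities, making date formats consistent and using an email address as a unique identifier to combine datasets, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the list of date formats, the field names (`email`, `name`, `last_order`), and the helper names. Note also that a date such as 05/03/2021 is ambiguous; this sketch assumes day/month/year.

```python
from datetime import datetime

# Hypothetical date formats that might appear across datasets.
# "%d/%m/%Y" assumes day-first dates; adjust for your own data.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # Try the next known format.
    return None  # Flag for manual review rather than guessing.


def normalize_email(text):
    """Trim whitespace and lowercase so the same address always matches."""
    return text.strip().lower()


def merge_by_email(*datasets):
    """Combine records from several datasets, keyed by normalized email."""
    merged = {}
    for dataset in datasets:
        for record in dataset:
            key = normalize_email(record["email"])
            merged.setdefault(key, {}).update(record)
            merged[key]["email"] = key  # Store the cleaned identifier.
    return merged


contacts = [{"email": "Ana@Example.com ", "name": "Ana"}]
orders = [{"email": "ana@example.com", "last_order": "05/03/2021"}]

people = merge_by_email(contacts, orders)
for person in people.values():
    person["last_order"] = normalize_date(person["last_order"])
print(people)
```

Returning `None` for an unrecognized date, instead of guessing, keeps the accuracy check separate from the formatting step.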
Dynamic, 'live', changing in real time. Water is a good metaphor for data in motion: a ‘stream’ like a live video feed or a ‘flow’ of stock market data.
Aggregations of data that make it easier to find and to analyze as a whole. For example, Facebook's 'Social Graph' collects and cross-references all data generated by every user, for future analysis.
Spaces designed for parties to buy, sell, and lease data to each other, including both broad and highly-focused datasets. For example, on Amazon Web Services' marketplace, users can source and sell data on COVID-19, real estate, satellite imagery, healthcare claims, traffic, and many other topics.
Flows of data from a lot of different places; you can also think of them as streams or pipelines.
Where, geographically, is your data (or your users' data) stored? What are the legal jurisdictions of the systems it passes through? Different rights might apply to data depending on where it was gathered, manipulated, and/or consumed.
Classification systems for your data. They allow you to provide specific categories for each record within your dataset. A well-designed taxonomy helps you and your organization rigorously track what data you have, or could have, and also helps organize your metadata. Examples of taxonomies include the Dewey Decimal System used to organize topics in libraries and research; the North American Industry Classification System (NAICS); or the World Health Organization's International Classification of Diseases (ICD).
End-User License Agreement: the terms and conditions a user must agree to in order to use a specific app or piece of software.
A way that humans can help steer the direction of machine algorithms. For example, you might see this in the form of a recommendation from Amazon coupled with the question, “Did we recommend the right product?”
Something directly and intentionally measured, like a list of stock investments or an image file on a camera.
Ensuring that users who give us permission to use their data know where and how that data will be used, and what this might mean for them in the future.
Understandings derived from analyzing data; crucially separate from the data itself.
A set of common protocols used to enable many different devices to communicate both with each other and with their human owners.
Specific data points about an individual. Working with little data sometimes involves specialized tools that build profiles of people, called social graphs, which represent and organize the many facets of their identity, behavior, and social networks.
Some advanced algorithms can 'learn' and update their decision trees using models that adapt as they are exposed to more and more data over time.
Data about another piece of data, used to understand, sort, and validate datasets to increase their usefulness. Common examples of metadata include the send and receive dates of emails, the unique address of a server, or info about which app was used to post a particular message to Twitter. Other examples include the time a computer file was created or modified, and the number of times a post has been viewed on social media or a song has been played on Spotify.
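File metadata is easy to inspect directly. The sketch below uses Python's standard `os.stat` call to read a file's size and last-modified time without opening its contents. The helper name `file_metadata` and the throwaway demo file are assumptions for illustration only.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_metadata(path):
    """Return a few facts *about* a file, not the data inside it."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        # st_mtime is the last-modified time in seconds since the epoch.
        "modified": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }


# Demo: create a throwaway file, then read its metadata.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "note.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("hello")
    print(file_metadata(demo_path))
```

The file's content ("hello") is the data; its name, size, and modification time are metadata that can be sorted and checked without reading that content.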
A set of algorithms designed to help machines understand natural human speech. Siri, Google Assistant, and Alexa all rely heavily on natural language processing.
Using data, statistical algorithms, and machine learning to calculate the likelihood of future outcomes. This is different from traditional analytics, which focuses only on what happened in the past.
Data that has not yet been manipulated, processed, or sorted. Raw data is rarely of use to humans.
Software Development Kit: interoperable building blocks for developers that can be combined in many different ways. Used to speed up and streamline app development by ensuring developers aren't re-creating existing technologies.