Database Dictionary

A

ACID (Atomicity, Consistency, Isolation, Durability)

ACID refers to the four essential properties of database transactions: Atomicity, Consistency, Isolation, and Durability, ensuring reliable and predictable data operations.

Aggregation

Aggregation in databases is the process of summarizing data, such as calculating totals or averages, using functions like SUM, AVG, or COUNT.

Alias

An alias in databases is a temporary name given to a table or column to make queries easier to read or write.

An anchored window function is a type of window function in SQL that calculates values over a specified range of rows, often defined relative to the current row, such as using RANGE or ROWS to create specific boundaries.

Anti Join

An anti join is a type of join that returns rows from one table that do not have matching rows in another table, effectively filtering out the matches.

B

Backup

A backup is a copy of data from a database or system, created to protect against data loss or corruption.

Batch Processing

Batch processing is the execution of a series of jobs or tasks in a group, without user interaction, typically used for processing large volumes of data at scheduled times.

B+ Tree Index

A B+ Tree index is a balanced tree data structure used in databases for fast searching, sorting, and range queries, where all values are stored in the leaf nodes and internal nodes act as guides to locate them.

Binary Large Object (BLOB)

A Binary Large Object (BLOB) is a data type used to store large amounts of binary data, such as images, audio, or video files, in a database.

Bitmap Index

A Bitmap index is a database index that uses bitmaps (bit arrays) to represent the presence or absence of values, providing fast query performance, especially for columns with a low cardinality.

Buffer Pool

A buffer pool is a memory area in a database system that stores frequently accessed data pages to reduce disk I/O and improve query performance.

C

Candidate Key

A candidate key is a minimal set of columns in a table that uniquely identifies each row, with no unnecessary attributes. There can be multiple candidate keys in a table.

Cardinality

Cardinality refers to the number of unique values in a column or the number of rows in a table, often used to describe relationships between tables in a database.

Cascade

A referential action in databases that automatically updates or deletes related rows in child tables when changes are made to parent rows, typically used with foreign keys.

Checkpoint

A process in database management that saves the current state of the database to disk, ensuring durability and helping with recovery in case of failure.

Clustered Index

An index that determines the physical order of data rows in a table. There can only be one clustered index per table, as the data is stored in the order of the index.

Column

A vertical set of values in a table, representing a specific attribute or field in a database.

Commit

A database operation that saves all changes made during a transaction, making them permanent in the database.

Concurrency Control

A mechanism used to manage simultaneous database transactions to ensure data integrity and avoid conflicts, like using locks or isolation levels.

Constraint

A rule enforced on data in a database, such as uniqueness, referential integrity, or value restrictions, to maintain data accuracy and consistency.

Cursor

A database object used to retrieve, manipulate, and navigate through a result set row by row, often used in procedural SQL operations.

CUBE and ROLLUP Operators

Both are SQL operators used for multidimensional aggregation:

CUBE generates aggregates for all combinations of dimensions.
ROLLUP generates aggregates in a hierarchical manner, summarizing data from the most detailed to the most general level.

D

Data Definition Language (DDL)

A subset of SQL used to define and manage database structures, including commands like CREATE, ALTER, and DROP to define tables, indexes, and constraints.

Data Manipulation Language (DML)

A subset of SQL used to manage and manipulate data in a database, including commands like INSERT, UPDATE, DELETE, and SELECT.

Data Query Language (DQL)

A subset of SQL used specifically for querying and retrieving data from a database, typically involving the SELECT statement.

Data Control Language (DCL)

A subset of SQL used to control access to data, including commands like GRANT and REVOKE to manage permissions and security.

Database Management System (DBMS)

Software that provides an interface for interacting with a database, including tools for storing, retrieving, and managing data, as well as ensuring data integrity, security, and performance.

Deadlock

A situation in database management where two or more transactions are waiting for each other to release resources, resulting in a standstill where no transaction can proceed.

Denormalization

The process of deliberately introducing redundancy into a database by merging tables or copying data to optimize query performance, often at the cost of storage efficiency.

Derived Table

A temporary table created within a query, usually in a FROM clause, to hold intermediate results, which can then be referenced as a regular table in subsequent operations.

Dimension Table

A table in a data warehouse used to store descriptive attributes (or dimensions) related to the fact data, such as time, product, or customer.

Dirty Read

A situation in database transactions where one transaction reads data that has been modified by another transaction but not yet committed, leading to potential inconsistencies.

Distribution

Refers to the method of storing data across multiple locations or systems, especially in distributed databases, to improve performance, reliability, and scalability.

Document Store

A type of NoSQL database that stores, retrieves, and manages semi-structured data as documents, often in formats like JSON, BSON, or XML, and is commonly used in applications with flexible schema requirements.

Duplicate Elimination

The process of removing duplicate rows from query results, often done using the DISTINCT keyword in SQL to ensure that only unique values are returned.

E

Entity

An object or concept in a database that is distinguishable from other objects, typically represented as a table, and often corresponds to a real-world item like a customer, product, or order.

Entity-Relationship Diagram (ERD)

A visual representation of the entities in a database and the relationships between them, used to model the structure of a database. It includes entities, attributes, and their relationships.

Eventual Consistency

A consistency model used in distributed databases where updates to data will eventually propagate to all nodes, but the system may temporarily allow inconsistencies between replicas before all updates converge.

EXPLAIN Plan

A command in SQL used to display the execution plan of a query, showing how the database engine plans to execute the query, including the order of operations and indexes used, which helps in optimizing performance.

F

Fact Table

A table in a data warehouse that stores quantitative data for analysis, typically consisting of numeric measures (like sales or revenue) and foreign keys linking to dimension tables.

Foreign Key

A column or a set of columns in one table that links to the primary key of another table, establishing a relationship between the two tables to ensure referential integrity.

Full Outer Join

A type of join in SQL that returns all rows from both tables, with matching rows from both sides where available. If there is no match, the result will contain NULLs for non-matching rows from either table.

Function-Based Index

An index in a database that is created based on the result of a function or expression, rather than just the column values, to optimize queries that involve the function or calculation.

Functional Dependency

A relationship between two sets of attributes in a database where one attribute (or set of attributes) uniquely determines another. For example, in a table, the value of EmployeeID might determine EmployeeName.

G

Granularity

Refers to the level of detail or precision of data in a database. Finer granularity means more detailed data, while coarser granularity means summarized or aggregated data.

Group By

A SQL clause used to group rows that have the same values in specified columns into summary rows, often used with aggregate functions like COUNT, SUM, AVG, etc., to perform operations on each group.

Gather Statistics

The process of collecting data about database objects, such as tables and indexes, to help the database engine optimize query performance by using up-to-date information about data distribution and index usage.

H

Hash Join

A join algorithm used in databases that hashes the join key values of one table and uses the hash table to quickly find matching rows in the other table, optimizing join performance for large datasets.

Horizontal Partitioning

The process of dividing a database table into smaller, more manageable pieces (partitions) by rows, typically based on a specific range or key, to improve performance and scalability.

Histogram

A graphical representation or statistical tool used to show the distribution of data values in a column, often used in database optimization to help the query planner understand data distribution for better indexing and query execution plans.

I

Index

A database object that improves the speed of data retrieval operations on a table, by creating a sorted structure based on one or more columns.

Inner Join

A type of SQL join that returns only the rows with matching values in both tables being joined, excluding rows that do not have a match in either table.

Insert Statement

A SQL command used to add new rows of data into a table. It specifies the table and the values to be inserted into each column.

Isolation Level

A setting that controls the visibility of changes made by one transaction to other concurrent transactions in a database, affecting how issues like dirty reads, non-repeatable reads, and phantom reads are handled.

Incremental Backup

A type of database backup that only saves the changes made since the last backup (either full or incremental), reducing storage and backup time compared to full backups.

In-Memory Database

A database that stores data primarily in the system’s main memory (RAM) instead of on disk, allowing for faster data retrieval and processing but typically with lower durability and higher volatility.

J

Join

A SQL operation used to combine rows from two or more tables based on a related column, allowing data to be queried across multiple tables simultaneously.

JSON Data Type

A data type in databases that stores data in JavaScript Object Notation (JSON) format, enabling the storage and manipulation of semi-structured data with nested arrays and objects.

K

Key

In a database, a key is an attribute or a set of attributes that uniquely identifies a record in a table. Examples include primary keys, foreign keys, and candidate keys.

Kafka Connector

A component in the Apache Kafka ecosystem that facilitates the integration of Kafka with other systems, allowing data to be read from or written to external sources like databases, data lakes, or other messaging systems.

L

Left Outer Join

A type of SQL join that returns all rows from the left table, and the matching rows from the right table. If there is no match, the result will include NULLs for columns from the right table.

Load Balancing

A technique used to distribute incoming network or database traffic across multiple servers or resources to optimize resource use, reduce server overload, and ensure high availability.

Lock

A mechanism used in databases to control access to data, ensuring that multiple transactions do not interfere with each other, typically used to prevent race conditions and ensure data consistency.

Logical Volume Manager (LVM)

A disk management system in Linux that provides flexible volume management, allowing users to create, resize, and manage disk partitions (logical volumes) more easily compared to traditional partitioning methods.

M

Materialized View

A database object that stores the results of a query physically, which can be refreshed periodically or on-demand. Unlike regular views, which are computed at runtime, materialized views provide faster query performance at the cost of storage and refresh time.

Metadata

Data that describes other data, such as the structure, relationships, and properties of database objects (tables, columns, indexes, etc.), providing context and helping with data management.

Multi-Version Concurrency Control (MVCC)

A database technique that allows multiple transactions to access the same data concurrently without conflicting by keeping multiple versions of a record, ensuring consistency without locking.

Memory-Optimized Table

A type of table in a database designed for faster access by storing data in memory rather than on disk, often used in performance-critical applications to reduce I/O latency.

Mergesort

A comparison-based sorting algorithm that divides the data into smaller chunks, sorts them, and then merges them back together in sorted order. It’s efficient with a time complexity of O(n log n), often used for sorting large datasets.

N

Normalization

The process of organizing a database to reduce redundancy and dependency by dividing large tables into smaller ones and using relationships (such as foreign keys) to link them. This helps to eliminate anomalies during data operations.

NoSQL

A category of databases that are designed for large-scale, distributed data storage and are optimized for unstructured or semi-structured data. NoSQL databases include document stores, key-value stores, column-family stores, and graph databases.

Nested Loop Join

A type of join algorithm where, for each row in one table (the outer table), the database searches for matching rows in another table (the inner table). It's simple but can be inefficient for large datasets.

Nesting

The process of embedding one query or operation inside another, such as using subqueries in SQL, or having loops within loops. In SQL, nesting is used to execute multiple operations in sequence or in a hierarchical manner.

O

Object-Oriented Database

A type of database that stores data in the form of objects, as used in object-oriented programming. These databases support complex data types and relationships, allowing for more flexible data modeling and manipulation.

OLAP (Online Analytical Processing)

A category of data processing that enables fast querying and analysis of multidimensional data, often used in data warehousing and business intelligence applications for complex calculations and trend analysis.

OLTP (Online Transaction Processing)

A class of database applications focused on managing transactional data, such as sales, orders, or payments. OLTP systems are optimized for high transaction volume, ensuring fast query processing and data consistency.

Optimization

The process of improving the performance of a database, query, or system by reducing resource usage, such as CPU time, memory, or I/O operations. Database optimization typically involves indexing, query rewriting, and schema adjustments.

Outer Join

A type of SQL join that returns all rows from one table and the matching rows from the other table. If no match is found, the result will contain NULLs for the non-matching rows. Types of outer joins include Left Outer Join, Right Outer Join, and Full Outer Join.

Oracle PL/SQL

A procedural language extension for SQL developed by Oracle, used to write complex database applications. It allows for the use of variables, loops, conditionals, and exception handling within SQL queries, enhancing SQL's capability for stored procedures and functions.

P

Partition

The process of dividing a large database table into smaller, more manageable pieces (partitions) based on specific criteria, such as ranges or key values, to improve performance and ease of management.

Partition Pruning

A technique used by the database query optimizer to skip irrelevant partitions during query execution, improving query performance by reducing the number of partitions scanned.

Primary Key

A unique identifier for each record in a database table, ensuring that no two rows have the same value in the primary key column(s). It also serves as a reference for foreign keys in other tables.

Procedure

A stored set of SQL statements or commands that are grouped together to perform a specific task, such as inserting or updating data. Procedures can include parameters and be called by applications or users to execute predefined operations.

Query Optimization

The process of improving the performance of a SQL query by rewriting it, using appropriate indexes, or applying various optimization techniques to reduce query execution time and resource consumption.

Pipelined Function

A type of function in databases (often used in Oracle) that allows rows of data to be processed and returned to the query as they are produced, rather than waiting for all the rows to be computed before returning results, improving performance in certain scenarios.

Q

Query Execution Plan

A detailed roadmap generated by the database management system to describe how a SQL query will be executed. It outlines the sequence of operations, such as table scans, joins, and indexing, along with their costs, which helps in understanding query performance and optimization.

Query Optimizer

A component of the database management system that analyzes SQL queries and determines the most efficient execution plan. It chooses the best approach based on factors like indexes, available statistics, and query structure to minimize resource usage and improve query performance.

R

Range Scan

A type of index scan in which the database searches for a specific range of values within an indexed column, typically used for queries involving range conditions like BETWEEN or >, <, <=, etc.

Recovery

The process of restoring a database to a consistent state after a failure or crash, ensuring that committed transactions are preserved and uncommitted changes are rolled back.

Referential Integrity

A set of rules ensuring that relationships between tables remain consistent. It typically involves foreign key constraints, ensuring that a foreign key in one table corresponds to a primary key in another table, preventing orphaned records.

Replication

The process of copying and maintaining database objects (such as tables or entire databases) across multiple locations or systems to ensure data redundancy, availability, and fault tolerance.

Right Outer Join

A type of SQL join that returns all rows from the right table, and the matching rows from the left table. If no match is found, the result will include NULLs for columns from the left table.

Rollback

The operation that undoes the changes made by a transaction, reverting the database to its state before the transaction began. This is used to ensure consistency in the case of errors or conflicts.

Row

A single record in a database table, consisting of a set of values for the columns defined in the table's schema.

Row-Level Locking

A concurrency control method where individual rows in a table are locked during a transaction, allowing multiple transactions to work on different rows simultaneously without conflicting.

Recursive Common Table Expression (CTE)

A type of CTE in SQL that references itself to perform recursive queries, such as traversing hierarchical data structures like organizational charts or bill-of-materials.

Redo Log

A log that records all changes made to the database, used during recovery to redo or reapply changes after a crash, ensuring that committed transactions are preserved.

S

Schema

The structure that defines the organization of data in a database

S

Serialization

The process of converting data or objects into a format that can be easily stored, transmitted, or reconstructed, often used for storing or sending data between systems or components.

Snapshot

A read-only copy of the database at a specific point in time, typically used for backup, recovery, or auditing purposes. It captures the state of the database without locking the original data.

Spatial Database

A database that is optimized for storing, querying, and analyzing spatial data, such as geographical or geometric data (e.g., maps, locations, and distances). These databases support spatial indexes and functions.

Star Schema

A type of database schema used in data warehousing, where a central fact table is connected to multiple dimension tables, forming a star-like structure. It is optimized for query performance and simplifies analytical queries.

Stored Procedure

A precompiled collection of SQL statements and logic that can be executed on the database server. Stored procedures are often used to encapsulate business logic and improve performance by reducing the need for repetitive SQL code.

Subquery

A query nested inside another query, often used to retrieve a value or set of values that are used by the outer query. Subqueries can appear in the WHERE, FROM, or SELECT clauses.

Surrogate Key

A unique identifier used in a database, typically an integer, that has no business meaning and is used solely for identification purposes. Surrogate keys are often used in dimensional modeling (e.g., in star schemas).

Symmetric Multiprocessing (SMP)

A computer architecture where multiple processors share a common memory and work together on a task. SMP systems improve performance by allowing parallel processing and better resource utilization.

Scalar Function

A function that operates on a single value and returns a single result. Scalar functions are typically used in queries to perform calculations or transformations on individual data values (e.g., UPPER(), ROUND()).

Set Operations (UNION, INTERSECT, MINUS)

SQL operations that combine the results of two or more queries:

UNION: Combines results from multiple queries, removing duplicates.
INTERSECT: Returns only the rows common to all queries.
MINUS: Returns rows from the first query that are not present in the second.

Sequence

A database object used to generate unique numerical values, often used for creating primary key values in tables. Sequences are typically incremented automatically with each use.

Soft Delete

A method of deleting records where the data is not actually removed from the database, but marked as deleted (e.g., by setting a flag or status column), allowing for the possibility of recovery or auditing.

Sparse Index

An index in which only a subset of the values from the indexed column are stored, typically used for columns with many NULL values or for indexing non-contiguous data to save space and improve performance.

T

Table

A collection of rows and columns in a database that stores data. Each column represents an attribute, while each row represents a record or data entry.

Transaction

A logical unit of work in a database system that contains one or more operations (such as insert, update, or delete). Transactions are designed to be atomic, consistent, isolated, and durable (ACID properties) to ensure data integrity.

Trigger

A special type of stored procedure that is automatically executed in response to certain events on a table or view, such as inserts, updates, or deletes. Triggers are often used for enforcing business rules or auditing data changes.

Tuple

A single row of data in a relational database table. In mathematical terms, a tuple is an ordered set of values, where each value corresponds to an attribute in the database.

Temporal Table

A table that stores historical data along with the current state of data. Temporal tables support time-based queries, enabling users to track changes over time and retrieve data as it appeared at a specific point in time.

Text Search

A database feature that enables searching for specific words or phrases in textual data. It often involves creating full-text indexes to improve search performance and allows for sophisticated querying, such as wildcards, phrase matching, and relevance scoring.

Tree Structure

A hierarchical data structure consisting of nodes connected by edges, often used to represent relationships like parent-child or root-leaf. In databases, tree structures are commonly used to represent hierarchical data, such as organizational charts, file systems, or category trees.

U

Unique Constraint

A database constraint that ensures all values in a column (or a combination of columns) are unique, meaning no two rows can have the same value in those columns. It prevents duplicates and helps maintain data integrity.

Union

A set operation in SQL that combines the results of two or more SELECT queries into a single result set, removing duplicate rows. If you want to include duplicates, you can use UNION ALL.

Update Statement

A SQL command used to modify existing records in a database table. It allows for updating one or more columns in a table for rows that meet specific conditions.

Upsert

A combination of "update" and "insert" operations. It either updates an existing record if it exists or inserts a new record if no matching record is found. This is typically done using INSERT ON DUPLICATE KEY UPDATE (in MySQL) or MERGE (in SQL Server and Oracle).

User Defined Function (UDF)

A custom function created by the user to perform specific operations within SQL queries. UDFs allow for more complex logic and calculations than what is available through built-in SQL functions.

Unpivot Operation

A data transformation operation that converts columns into rows, typically used to normalize data or reshape tables for easier analysis. It is the inverse of a pivot operation, which converts rows into columns.

V

View

A virtual table in a database that is defined by a SQL query. It does not store data itself but displays data from one or more tables based on the query. Views can be used to simplify complex queries, present data in a particular format, or provide security by limiting access to certain columns or rows.

Vertical Partitioning

A technique used in database design where a table is split into multiple parts based on columns rather than rows. This is often done to optimize performance, especially for queries that only need a subset of columns, as it reduces the amount of data read.

Virtual Table

A table that is not physically stored in the database but is dynamically generated by a query or system process. Views are examples of virtual tables. They represent the result of a query and act like a table for purposes of querying, but they do not hold data themselves.

Versioning

The process of keeping multiple versions of a data record, often used in systems that track historical changes to records. Versioning allows for the retrieval of past states of a record, useful for auditing, tracking changes, and maintaining data consistency in the face of updates.

W

Warehouse

In the context of databases, a data warehouse is a large, centralized repository of integrated data from multiple sources, designed for reporting and data analysis. It is optimized for read-heavy operations and often uses techniques like OLAP (Online Analytical Processing) to support complex queries and business intelligence.

Write-Ahead Logging (WAL)

A technique used in database management systems to ensure data integrity. In WAL, before any changes are made to the database, the changes are first recorded in a log file. This ensures that in case of a system crash, the database can be restored to a consistent state by replaying the log and applying the changes.

Window Function

A type of SQL function that performs a calculation across a set of rows related to the current row, without collapsing the result into a single output row. Window functions are often used for running totals, ranking, or calculating moving averages, and they are defined using the OVER() clause.

X

XACT_ABORT

A setting in SQL Server that controls the behavior of transactions in the event of a runtime error. When XACT_ABORT is set to ON, any runtime error will automatically cause the transaction to be rolled back. If it is set to OFF, the transaction will not be rolled back unless explicitly instructed to do so by a ROLLBACK statement.

XML Data Type

A data type in databases that is used to store XML (Extensible Markup Language) documents. It allows for the storage, querying, and manipulation of structured XML data directly within the database. Many database systems provide functions and methods for querying XML data, validating it, and extracting information using XPath or XQuery.

Y

Yield

In database systems, "yield" refers to the amount of work or results produced by a query or operation. It can also describe the process of returning data or results during the execution of a function or query.

Z

Z-order Curve

A multidimensional indexing technique that maps multi-dimensional data to one dimension while preserving locality. It is commonly used in spatial databases for efficient querying and retrieval.

Zero-Padding

The practice of adding zeros to the left of a number or string to ensure it reaches a specified length. It's often used in data formatting, like in serial numbers or dates, to maintain consistent size.