What is Data Manipulation Language (DML)?
Data Manipulation Language (DML) is a subset of SQL (Structured Query Language) used to manage data within database objects. While DDL deals with the structure of the database, DML is concerned with the actual data stored in the database. DML statements allow users to retrieve, insert, modify, and delete data in tables and other database objects.
DML operations are the most frequently used database operations in typical applications, forming the foundation of data interactions in database-driven systems.
Purpose of DML
DML commands serve several essential purposes in database management:
- Data Retrieval: Fetch specific data from database tables
- Data Modification: Insert new records, update existing records, or delete records
- Transaction Processing: Enable ACID transactions through commit and rollback operations
- Data Analysis: Support complex queries for data analysis and reporting
- Application Integration: Provide standard interfaces for applications to interact with databases
Key DML Commands
1. SELECT
The SELECT statement retrieves data from one or more tables.
Basic SELECT Statement
Retrieves all columns from a table.
SELECT * FROM students;
Column Selection
Retrieves specific columns from a table.
SELECT student_id, first_name, last_name FROM students;
Filtering with WHERE
Retrieves rows that meet specific conditions.
SELECT * FROM students WHERE major = 'Computer Science';
-- Multiple conditions
SELECT * FROM students
WHERE major = 'Computer Science' AND enrollment_year = 2023;
-- Range conditions
SELECT * FROM students
WHERE enrollment_date BETWEEN '2022-01-01' AND '2022-12-31';
-- IN operator
SELECT * FROM students
WHERE major IN ('Computer Science', 'Data Science', 'Software Engineering');
-- Pattern matching
SELECT * FROM students
WHERE last_name LIKE 'Smith%';
-- NULL values
SELECT * FROM students
WHERE graduation_date IS NULL;
Sorting with ORDER BY
Sorts the result set.
-- Ascending order (default)
SELECT * FROM students ORDER BY last_name;
-- Descending order
SELECT * FROM students ORDER BY enrollment_date DESC;
-- Multiple sort columns
SELECT * FROM students ORDER BY major ASC, gpa DESC;
Grouping with GROUP BY
Groups rows that have the same values and often used with aggregate functions.
-- Count students by major
SELECT major, COUNT(*) AS student_count
FROM students
GROUP BY major;
-- Average GPA by enrollment year
SELECT enrollment_year, AVG(gpa) AS average_gpa
FROM students
GROUP BY enrollment_year;
Filtering Groups with HAVING
Filters the results of GROUP BY queries.
-- Majors with more than 100 students
SELECT major, COUNT(*) AS student_count
FROM students
GROUP BY major
HAVING COUNT(*) > 100;
-- Enrollment years with average GPA above 3.5
SELECT enrollment_year, AVG(gpa) AS average_gpa
FROM students
GROUP BY enrollment_year
HAVING AVG(gpa) > 3.5;
Joining Tables
Combines rows from two or more tables based on a related column.
-- Inner join
SELECT s.student_id, s.first_name, s.last_name, c.course_name
FROM students s
INNER JOIN enrollments e ON s.student_id = e.student_id
INNER JOIN courses c ON e.course_id = c.course_id;
-- Left join
SELECT s.student_id, s.first_name, s.last_name, e.course_id
FROM students s
LEFT JOIN enrollments e ON s.student_id = e.student_id;
-- Right join
SELECT c.course_id, c.course_name, p.first_name AS professor_first_name
FROM courses c
RIGHT JOIN professors p ON c.professor_id = p.professor_id;
-- Full join
SELECT s.student_id, s.first_name, s.last_name, e.course_id
FROM students s
FULL JOIN enrollments e ON s.student_id = e.student_id;
-- Self join
SELECT e1.employee_id, e1.first_name, e2.first_name AS manager_name
FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.employee_id;
-- Cross join
SELECT s.student_id, c.course_id
FROM students s
CROSS JOIN courses c;
Subqueries
Queries nested within another query.
-- Subquery in WHERE clause
SELECT first_name, last_name
FROM students
WHERE student_id IN (
SELECT student_id
FROM enrollments
WHERE course_id = 'CS101'
);
-- Subquery in FROM clause
SELECT avg_gpa_by_major.major, avg_gpa_by_major.avg_gpa
FROM (
SELECT major, AVG(gpa) AS avg_gpa
FROM students
GROUP BY major
) AS avg_gpa_by_major
WHERE avg_gpa_by_major.avg_gpa > 3.5;
-- Correlated subquery
SELECT s.first_name, s.last_name
FROM students s
WHERE EXISTS (
SELECT 1
FROM enrollments e
WHERE e.student_id = s.student_id
AND e.course_id = 'CS101'
);
Set Operations
Combines the results of two or more SELECT statements.
-- UNION: combines results and removes duplicates
SELECT student_id FROM current_students
UNION
SELECT student_id FROM alumni;
-- UNION ALL: combines results and keeps duplicates
SELECT student_id FROM current_students
UNION ALL
SELECT student_id FROM alumni;
-- INTERSECT: returns only the rows that appear in both result sets
SELECT student_id FROM honors_students
INTERSECT
SELECT student_id FROM scholarship_recipients;
-- EXCEPT/MINUS: returns rows from the first query that don't appear in the second
SELECT student_id FROM enrolled_students
EXCEPT
SELECT student_id FROM graduated_students;
Common Table Expressions (CTEs)
Temporary named result sets that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement.
WITH student_course_counts AS (
SELECT student_id, COUNT(*) AS course_count
FROM enrollments
GROUP BY student_id
)
SELECT s.first_name, s.last_name, scc.course_count
FROM students s
JOIN student_course_counts scc ON s.student_id = scc.student_id
WHERE scc.course_count > 5;
-- Recursive CTE
WITH RECURSIVE employee_hierarchy AS (
-- Base case: top-level employees with no manager
SELECT employee_id, first_name, last_name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: employees with managers
SELECT e.employee_id, e.first_name, e.last_name, e.manager_id, eh.level + 1
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy ORDER BY level, employee_id;
Window Functions
Perform calculations across a set of table rows related to the current row.
-- Row numbering
SELECT student_id, first_name, last_name, major,
ROW_NUMBER() OVER (PARTITION BY major ORDER BY gpa DESC) AS rank_in_major
FROM students;
-- Running total
SELECT enrollment_date, student_id, tuition_paid,
SUM(tuition_paid) OVER (ORDER BY enrollment_date) AS running_total
FROM tuition_payments;
-- Moving average
SELECT transaction_date, amount,
AVG(amount) OVER (
ORDER BY transaction_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS seven_day_avg
FROM daily_transactions;
-- Percentile
SELECT student_id, gpa,
NTILE(4) OVER (ORDER BY gpa DESC) AS quartile
FROM students;
2. INSERT
The INSERT statement adds new rows to a table.
Basic INSERT Statement
Inserts a single row with values for all columns in order.
INSERT INTO students
VALUES (1001, 'John', 'Doe', '2000-05-15', 'Computer Science', 3.75);
Specifying Columns
Inserts a single row with values for specified columns.
INSERT INTO students (student_id, first_name, last_name, major)
VALUES (1002, 'Jane', 'Smith', 'Data Science');
Multiple Row Insertion
Inserts multiple rows in a single statement.
INSERT INTO students (student_id, first_name, last_name, major)
VALUES
(1003, 'Alice', 'Johnson', 'Mathematics'),
(1004, 'Bob', 'Brown', 'Physics'),
(1005, 'Charlie', 'Davis', 'Chemistry');
INSERT with Subquery
Inserts rows based on a query result.
INSERT INTO alumni (student_id, first_name, last_name, graduation_date)
SELECT student_id, first_name, last_name, graduation_date
FROM students
WHERE graduation_date IS NOT NULL AND graduation_date <= CURRENT_DATE;
INSERT with DEFAULT Values
Inserts a row with default values.
INSERT INTO audit_log DEFAULT VALUES;
-- Or with some specified values
INSERT INTO students (student_id, first_name, last_name)
VALUES (1006, 'David', 'Wilson');
-- Other columns will use their default values
3. UPDATE
The UPDATE statement modifies existing rows in a table.
Basic UPDATE Statement
Updates all rows in a table.
UPDATE students SET gpa = 0.0;
Conditional UPDATE
Updates rows that meet specific conditions.
UPDATE students
SET major = 'Computer Science'
WHERE student_id = 1001;
-- Multiple columns
UPDATE students
SET major = 'Data Science', gpa = 3.8
WHERE student_id = 1002;
-- Arithmetic operations
UPDATE products
SET price = price * 1.1
WHERE category = 'Electronics';
UPDATE with Subquery
Updates rows based on a query result.
UPDATE students
SET status = 'Graduated'
WHERE student_id IN (
SELECT student_id
FROM enrollments
GROUP BY student_id
HAVING COUNT(DISTINCT course_id) >= 40
);
UPDATE with JOIN
Updates rows based on values from another table.
-- SQL Server, MySQL
UPDATE s
SET s.gpa = (s.gpa + e.course_gpa) / 2
FROM students s
JOIN extra_credits e ON s.student_id = e.student_id;
-- Oracle, PostgreSQL
UPDATE students s
SET gpa = (s.gpa + e.course_gpa) / 2
FROM extra_credits e
WHERE s.student_id = e.student_id;
CASE in UPDATE
Uses conditional logic in updates.
UPDATE students
SET status = CASE
WHEN credits >= 120 THEN 'Senior'
WHEN credits >= 90 THEN 'Junior'
WHEN credits >= 60 THEN 'Sophomore'
ELSE 'Freshman'
END;
4. DELETE
The DELETE statement removes rows from a table.
Basic DELETE Statement
Deletes all rows from a table.
DELETE FROM temporary_logs;
Conditional DELETE
Deletes rows that meet specific conditions.
DELETE FROM students
WHERE status = 'Withdrawn';
-- Multiple conditions
DELETE FROM enrollments
WHERE semester = 'Fall 2022' AND student_id = 1001;
DELETE with Subquery
Deletes rows based on a query result.
DELETE FROM students
WHERE student_id IN (
SELECT student_id
FROM academic_probation
WHERE violation_count > 3
);
DELETE with JOIN
Deletes rows based on a join condition.
-- SQL Server, MySQL
DELETE s
FROM students s
JOIN withdrawal_requests w ON s.student_id = w.student_id
WHERE w.status = 'Approved';
-- Oracle, PostgreSQL
DELETE FROM students
WHERE student_id IN (
SELECT s.student_id
FROM students s
JOIN withdrawal_requests w ON s.student_id = w.student_id
WHERE w.status = 'Approved'
);
5. MERGE
The MERGE statement (also known as UPSERT in some databases) performs INSERT, UPDATE, or DELETE operations in a single statement based on a condition.
-- SQL Server
MERGE INTO students AS target
USING incoming_students AS source
ON target.student_id = source.student_id
WHEN MATCHED THEN
UPDATE SET
target.first_name = source.first_name,
target.last_name = source.last_name,
target.major = source.major
WHEN NOT MATCHED THEN
INSERT (student_id, first_name, last_name, major)
VALUES (source.student_id, source.first_name, source.last_name, source.major);
-- PostgreSQL (using alternative syntax)
INSERT INTO students (student_id, first_name, last_name, major)
VALUES (1001, 'John', 'Doe', 'Computer Science')
ON CONFLICT (student_id) DO UPDATE
SET
first_name = EXCLUDED.first_name,
last_name = EXCLUDED.last_name,
major = EXCLUDED.major;
Transaction Control
Transactions group DML operations into logical units of work that can be committed or rolled back together.
Beginning a Transaction
-- SQL Server, PostgreSQL
BEGIN TRANSACTION;
-- Oracle
BEGIN;
-- MySQL
START TRANSACTION;
Committing a Transaction
Permanently saves changes to the database.
COMMIT;
Rolling Back a Transaction
Undoes changes made in the current transaction.
ROLLBACK;
Savepoints
Creates points within a transaction that can be rolled back to.
-- Create savepoint
SAVEPOINT before_delete;
-- Delete data
DELETE FROM students WHERE status = 'Inactive';
-- Roll back to savepoint
ROLLBACK TO SAVEPOINT before_delete;
Advanced DML Features
Batch Processing
Processes large amounts of data in smaller, manageable chunks.
-- Batched delete example
DECLARE @BatchSize INT = 1000;
DECLARE @RowsAffected INT = 1;
WHILE @RowsAffected > 0
BEGIN
DELETE TOP (@BatchSize) FROM large_log_table
WHERE log_date < '2022-01-01';
SET @RowsAffected = @@ROWCOUNT;
IF @RowsAffected > 0
COMMIT;
WAITFOR DELAY '00:00:01'; -- 1 second pause
END
RETURNING/OUTPUT Clause
Returns values from rows affected by DML operations.
-- PostgreSQL
INSERT INTO students (first_name, last_name, major)
VALUES ('Emma', 'Garcia', 'Physics')
RETURNING student_id, first_name, last_name;
-- SQL Server
INSERT INTO students (first_name, last_name, major)
OUTPUT INSERTED.student_id, INSERTED.first_name, INSERTED.last_name
VALUES ('Emma', 'Garcia', 'Physics');
-- Oracle
INSERT INTO students (first_name, last_name, major)
VALUES ('Emma', 'Garcia', 'Physics')
RETURNING student_id, first_name, last_name INTO :id, :first, :last;
Multi-Table Operations
Performs operations on multiple tables at once.
-- Multi-table INSERT (Oracle, PostgreSQL)
INSERT ALL
INTO current_students (student_id, status) VALUES (1001, 'Active')
INTO student_details (student_id, address) VALUES (1001, '123 College St')
INTO majors (student_id, major) VALUES (1001, 'Computer Science')
SELECT * FROM dual;
-- Multi-table UPDATE (MySQL)
UPDATE customers c, orders o
SET c.last_order_date = o.order_date,
o.status = 'Processed'
WHERE c.customer_id = o.customer_id
AND o.order_date = CURRENT_DATE;
Conditional Inserts
Inserts rows based on conditions.
-- Oracle
INSERT
WHEN enrollment_year >= 2023 THEN
INTO new_students (student_id, first_name, last_name)
VALUES (student_id, first_name, last_name)
WHEN enrollment_year < 2023 AND enrollment_year >= 2020 THEN
INTO current_students (student_id, first_name, last_name)
VALUES (student_id, first_name, last_name)
ELSE
INTO legacy_students (student_id, first_name, last_name)
VALUES (student_id, first_name, last_name)
SELECT student_id, first_name, last_name, enrollment_year
FROM incoming_students;
DML Best Practices
-
Use Transactions for Multi-Statement Operations
- Group related DML operations in transactions
- Ensure data consistency with proper error handling
BEGIN TRANSACTION; INSERT INTO orders (order_id, customer_id, order_date) VALUES (1001, 101, CURRENT_DATE); INSERT INTO order_items (order_id, product_id, quantity) VALUES (1001, 5001, 2), (1001, 5002, 1); UPDATE inventory SET stock = stock - 2 WHERE product_id = 5001; UPDATE inventory SET stock = stock - 1 WHERE product_id = 5002; COMMIT; -
Always Include WHERE Clauses for UPDATE and DELETE
- Prevent accidental modifications to all rows
- Consider adding a LIMIT/TOP clause for large operations
-- Bad practice (affects all rows) UPDATE students SET gpa = 4.0; -- Good practice (specific condition) UPDATE students SET gpa = 4.0 WHERE student_id = 1001; -
Use Parameterized Queries
- Prevent SQL injection attacks
- Improve performance through query plan caching
-- Bad practice (vulnerable to SQL injection) executeQuery("SELECT * FROM students WHERE student_id = " + studentId); -- Good practice (parameterized query) executeQuery("SELECT * FROM students WHERE student_id = ?", [studentId]); -
Batch Large Operations
- Process large datasets in smaller chunks
- Reduce transaction duration and lock contention
-- Process 1000 records at a time WHILE EXISTS (SELECT 1 FROM students WHERE processed = 0 LIMIT 1) BEGIN UPDATE students SET processed = 1 WHERE student_id IN ( SELECT student_id FROM students WHERE processed = 0 LIMIT 1000 ); COMMIT; END; -
Use Appropriate Indexes
- Ensure indexes exist for columns used in WHERE clauses
- Consider how DML operations affect indexes
-
Minimize the Use of Triggers for Business Logic
- Keep complex logic in application code when possible
- Use triggers primarily for data integrity and auditing
-
Consider Performance Impact of Joins
- Test complex joins with representative data volumes
- Monitor query execution plans for efficiency
-
Use MERGE for Upsert Operations
- Simplify insert-or-update logic
- Improve atomicity and performance
-
Validate Data Before DML Operations
- Implement application-level validation
- Use database constraints for data integrity
-
Monitor and Log DML Operations
- Track significant data changes
- Implement audit logging for sensitive operations
DML Examples for Common Database Tasks
Student Registration System
-- Register a new student (Transaction with multiple inserts)
BEGIN TRANSACTION;
INSERT INTO students (student_id, first_name, last_name, birth_date, enrollment_date)
VALUES (1001, 'John', 'Doe', '2000-05-15', CURRENT_DATE);
INSERT INTO student_contact (student_id, email, phone)
VALUES (1001, 'john.doe@example.edu', '555-123-4567');
INSERT INTO student_address (student_id, street, city, state, zip)
VALUES (1001, '123 Campus Dr', 'College Town', 'CA', '90210');
COMMIT;
-- Enroll a student in courses
INSERT INTO enrollments (student_id, course_id, semester, status)
VALUES
(1001, 'CS101', 'Fall 2023', 'Enrolled'),
(1001, 'MATH201', 'Fall 2023', 'Enrolled'),
(1001, 'ENG101', 'Fall 2023', 'Enrolled');
-- Update student information
UPDATE students
SET major = 'Computer Science', advisor_id = 101
WHERE student_id = 1001;
-- Record grades (with validation)
UPDATE enrollments
SET grade = CASE
WHEN final_score >= 90 THEN 'A'
WHEN final_score >= 80 THEN 'B'
WHEN final_score >= 70 THEN 'C'
WHEN final_score >= 60 THEN 'D'
ELSE 'F'
END
WHERE student_id = 1001 AND semester = 'Fall 2023';
-- Drop a course
DELETE FROM enrollments
WHERE student_id = 1001 AND course_id = 'ENG101' AND semester = 'Fall 2023';
-- Query for student transcript
SELECT s.student_id, s.first_name, s.last_name,
e.course_id, c.course_name, c.credits,
e.semester, e.grade
FROM students s
JOIN enrollments e ON s.student_id = e.student_id
JOIN courses c ON e.course_id = c.course_id
WHERE s.student_id = 1001
ORDER BY e.semester DESC, c.course_id;
-- Calculate GPA
SELECT student_id,
SUM(credits *
CASE
WHEN grade = 'A' THEN 4.0
WHEN grade = 'B' THEN 3.0
WHEN grade = 'C' THEN 2.0
WHEN grade = 'D' THEN 1.0
ELSE 0
END) / SUM(credits) AS gpa
FROM (
SELECT e.student_id, c.credits, e.grade
FROM enrollments e
JOIN courses c ON e.course_id = c.course_id
WHERE e.student_id = 1001 AND e.grade IS NOT NULL
) grade_points;
-- Graduate a student (multiple updates in transaction)
BEGIN TRANSACTION;
UPDATE students
SET status = 'Graduated', graduation_date = CURRENT_DATE
WHERE student_id = 1001 AND
(SELECT COUNT(*) FROM enrollments
WHERE student_id = 1001 AND grade IS NULL) = 0;
INSERT INTO alumni (student_id, graduation_date, degree)
SELECT student_id, graduation_date, major
FROM students
WHERE student_id = 1001 AND status = 'Graduated';
COMMIT;
E-commerce Order Processing
-- Create a new order (transaction with multiple operations)
BEGIN TRANSACTION;
-- Insert order header
INSERT INTO orders (order_id, customer_id, order_date, status)
VALUES (10001, 5001, CURRENT_TIMESTAMP, 'Pending');
-- Insert order lines
INSERT INTO order_items (order_id, product_id, quantity, unit_price)
SELECT 10001, product_id, quantity, current_price
FROM shopping_cart
WHERE customer_id = 5001;
-- Calculate order total
UPDATE orders
SET total_amount = (
SELECT SUM(quantity * unit_price)
FROM order_items
WHERE order_id = 10001
),
tax_amount = (
SELECT SUM(quantity * unit_price) * 0.08
FROM order_items
WHERE order_id = 10001
);
-- Update inventory
UPDATE products p
SET stock_quantity = p.stock_quantity - oi.quantity
FROM order_items oi
WHERE oi.order_id = 10001 AND p.product_id = oi.product_id;
-- Clear shopping cart
DELETE FROM shopping_cart
WHERE customer_id = 5001;
COMMIT;
-- Process payment
UPDATE orders
SET status = 'Paid', payment_date = CURRENT_TIMESTAMP
WHERE order_id = 10001;
-- Ship order
UPDATE orders
SET status = 'Shipped',
shipping_date = CURRENT_TIMESTAMP,
tracking_number = 'TRK' || order_id || 'XYZ'
WHERE order_id = 10001;
-- Update customer purchase history
INSERT INTO customer_activity (customer_id, activity_type, activity_date)
VALUES (5001, 'PURCHASE', CURRENT_TIMESTAMP);
UPDATE customers
SET
last_order_date = CURRENT_TIMESTAMP,
lifetime_value = lifetime_value + (
SELECT total_amount FROM orders WHERE order_id = 10001
)
WHERE customer_id = 5001;
-- Get sales report
SELECT
p.category,
SUM(oi.quantity) AS units_sold,
SUM(oi.quantity * oi.unit_price) AS total_sales
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date >= DATEADD(day, -30, CURRENT_DATE)
GROUP BY p.category
ORDER BY total_sales DESC;
DML provides the tools needed to manipulate and interact with the data stored in a database. Understanding DML commands and their proper use is essential for developers to build efficient, secure, and reliable database operations in their applications.