DML

What is Data Manipulation Language (DML)?

Data Manipulation Language (DML) is a subset of SQL (Structured Query Language) used to manage data within database objects. While DDL deals with the structure of the database, DML is concerned with the actual data stored in the database. DML statements allow users to retrieve, insert, modify, and delete data in tables and other database objects.

DML operations are the most frequently used database operations in typical applications, forming the foundation of data interactions in database-driven systems.

Purpose of DML

DML commands serve several essential purposes in database management:

  1. Data Retrieval: Fetch specific data from database tables
  2. Data Modification: Insert new records, update existing records, or delete records
  3. Transaction Processing: Enable ACID transactions through commit and rollback operations
  4. Data Analysis: Support complex queries for data analysis and reporting
  5. Application Integration: Provide standard interfaces for applications to interact with databases

Key DML Commands

1. SELECT

The SELECT statement retrieves data from one or more tables.

Basic SELECT Statement

Retrieves all columns from a table.

SELECT * FROM students;

Column Selection

Retrieves specific columns from a table.

SELECT student_id, first_name, last_name FROM students;

Filtering with WHERE

Retrieves rows that meet specific conditions.

SELECT * FROM students WHERE major = 'Computer Science';

-- Multiple conditions
SELECT * FROM students 
WHERE major = 'Computer Science' AND enrollment_year = 2023;

-- Range conditions
SELECT * FROM students 
WHERE enrollment_date BETWEEN '2022-01-01' AND '2022-12-31';

-- IN operator
SELECT * FROM students 
WHERE major IN ('Computer Science', 'Data Science', 'Software Engineering');

-- Pattern matching
SELECT * FROM students 
WHERE last_name LIKE 'Smith%';

-- NULL values
SELECT * FROM students 
WHERE graduation_date IS NULL;

Sorting with ORDER BY

Sorts the result set.

-- Ascending order (default)
SELECT * FROM students ORDER BY last_name;

-- Descending order
SELECT * FROM students ORDER BY enrollment_date DESC;

-- Multiple sort columns
SELECT * FROM students ORDER BY major ASC, gpa DESC;

Grouping with GROUP BY

Groups rows that have the same values and often used with aggregate functions.

-- Count students by major
SELECT major, COUNT(*) AS student_count 
FROM students 
GROUP BY major;

-- Average GPA by enrollment year
SELECT enrollment_year, AVG(gpa) AS average_gpa 
FROM students 
GROUP BY enrollment_year;

Filtering Groups with HAVING

Filters the results of GROUP BY queries.

-- Majors with more than 100 students
SELECT major, COUNT(*) AS student_count 
FROM students 
GROUP BY major 
HAVING COUNT(*) > 100;

-- Enrollment years with average GPA above 3.5
SELECT enrollment_year, AVG(gpa) AS average_gpa 
FROM students 
GROUP BY enrollment_year 
HAVING AVG(gpa) > 3.5;

Joining Tables

Combines rows from two or more tables based on a related column.

-- Inner join
SELECT s.student_id, s.first_name, s.last_name, c.course_name
FROM students s
INNER JOIN enrollments e ON s.student_id = e.student_id
INNER JOIN courses c ON e.course_id = c.course_id;

-- Left join
SELECT s.student_id, s.first_name, s.last_name, e.course_id
FROM students s
LEFT JOIN enrollments e ON s.student_id = e.student_id;

-- Right join
SELECT c.course_id, c.course_name, p.first_name AS professor_first_name
FROM courses c
RIGHT JOIN professors p ON c.professor_id = p.professor_id;

-- Full join
SELECT s.student_id, s.first_name, s.last_name, e.course_id
FROM students s
FULL JOIN enrollments e ON s.student_id = e.student_id;

-- Self join
SELECT e1.employee_id, e1.first_name, e2.first_name AS manager_name
FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.employee_id;

-- Cross join
SELECT s.student_id, c.course_id
FROM students s
CROSS JOIN courses c;

Subqueries

Queries nested within another query.

-- Subquery in WHERE clause
SELECT first_name, last_name
FROM students
WHERE student_id IN (
    SELECT student_id
    FROM enrollments
    WHERE course_id = 'CS101'
);

-- Subquery in FROM clause
SELECT avg_gpa_by_major.major, avg_gpa_by_major.avg_gpa
FROM (
    SELECT major, AVG(gpa) AS avg_gpa
    FROM students
    GROUP BY major
) AS avg_gpa_by_major
WHERE avg_gpa_by_major.avg_gpa > 3.5;

-- Correlated subquery
SELECT s.first_name, s.last_name
FROM students s
WHERE EXISTS (
    SELECT 1
    FROM enrollments e
    WHERE e.student_id = s.student_id
    AND e.course_id = 'CS101'
);

Set Operations

Combines the results of two or more SELECT statements.

-- UNION: combines results and removes duplicates
SELECT student_id FROM current_students
UNION
SELECT student_id FROM alumni;

-- UNION ALL: combines results and keeps duplicates
SELECT student_id FROM current_students
UNION ALL
SELECT student_id FROM alumni;

-- INTERSECT: returns only the rows that appear in both result sets
SELECT student_id FROM honors_students
INTERSECT
SELECT student_id FROM scholarship_recipients;

-- EXCEPT/MINUS: returns rows from the first query that don't appear in the second
SELECT student_id FROM enrolled_students
EXCEPT
SELECT student_id FROM graduated_students;

Common Table Expressions (CTEs)

Temporary named result sets that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement.

WITH student_course_counts AS (
    SELECT student_id, COUNT(*) AS course_count
    FROM enrollments
    GROUP BY student_id
)
SELECT s.first_name, s.last_name, scc.course_count
FROM students s
JOIN student_course_counts scc ON s.student_id = scc.student_id
WHERE scc.course_count > 5;

-- Recursive CTE
WITH RECURSIVE employee_hierarchy AS (
    -- Base case: top-level employees with no manager
    SELECT employee_id, first_name, last_name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive case: employees with managers
    SELECT e.employee_id, e.first_name, e.last_name, e.manager_id, eh.level + 1
    FROM employees e
    JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy ORDER BY level, employee_id;

Window Functions

Perform calculations across a set of table rows related to the current row.

-- Row numbering
SELECT student_id, first_name, last_name, major,
       ROW_NUMBER() OVER (PARTITION BY major ORDER BY gpa DESC) AS rank_in_major
FROM students;

-- Running total
SELECT enrollment_date, student_id, tuition_paid,
       SUM(tuition_paid) OVER (ORDER BY enrollment_date) AS running_total
FROM tuition_payments;

-- Moving average
SELECT transaction_date, amount,
       AVG(amount) OVER (
           ORDER BY transaction_date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS seven_day_avg
FROM daily_transactions;

-- Percentile
SELECT student_id, gpa,
       NTILE(4) OVER (ORDER BY gpa DESC) AS quartile
FROM students;

2. INSERT

The INSERT statement adds new rows to a table.

Basic INSERT Statement

Inserts a single row with values for all columns in order.

INSERT INTO students
VALUES (1001, 'John', 'Doe', '2000-05-15', 'Computer Science', 3.75);

Specifying Columns

Inserts a single row with values for specified columns.

INSERT INTO students (student_id, first_name, last_name, major)
VALUES (1002, 'Jane', 'Smith', 'Data Science');

Multiple Row Insertion

Inserts multiple rows in a single statement.

INSERT INTO students (student_id, first_name, last_name, major)
VALUES 
    (1003, 'Alice', 'Johnson', 'Mathematics'),
    (1004, 'Bob', 'Brown', 'Physics'),
    (1005, 'Charlie', 'Davis', 'Chemistry');

INSERT with Subquery

Inserts rows based on a query result.

INSERT INTO alumni (student_id, first_name, last_name, graduation_date)
SELECT student_id, first_name, last_name, graduation_date
FROM students
WHERE graduation_date IS NOT NULL AND graduation_date <= CURRENT_DATE;

INSERT with DEFAULT Values

Inserts a row with default values.

INSERT INTO audit_log DEFAULT VALUES;

-- Or with some specified values
INSERT INTO students (student_id, first_name, last_name)
VALUES (1006, 'David', 'Wilson');
-- Other columns will use their default values

3. UPDATE

The UPDATE statement modifies existing rows in a table.

Basic UPDATE Statement

Updates all rows in a table.

UPDATE students SET gpa = 0.0;

Conditional UPDATE

Updates rows that meet specific conditions.

UPDATE students
SET major = 'Computer Science'
WHERE student_id = 1001;

-- Multiple columns
UPDATE students
SET major = 'Data Science', gpa = 3.8
WHERE student_id = 1002;

-- Arithmetic operations
UPDATE products
SET price = price * 1.1
WHERE category = 'Electronics';

UPDATE with Subquery

Updates rows based on a query result.

UPDATE students
SET status = 'Graduated'
WHERE student_id IN (
    SELECT student_id
    FROM enrollments
    GROUP BY student_id
    HAVING COUNT(DISTINCT course_id) >= 40
);

UPDATE with JOIN

Updates rows based on values from another table.

-- SQL Server, MySQL
UPDATE s
SET s.gpa = (s.gpa + e.course_gpa) / 2
FROM students s
JOIN extra_credits e ON s.student_id = e.student_id;

-- Oracle, PostgreSQL
UPDATE students s
SET gpa = (s.gpa + e.course_gpa) / 2
FROM extra_credits e
WHERE s.student_id = e.student_id;

CASE in UPDATE

Uses conditional logic in updates.

UPDATE students
SET status = CASE
    WHEN credits >= 120 THEN 'Senior'
    WHEN credits >= 90 THEN 'Junior'
    WHEN credits >= 60 THEN 'Sophomore'
    ELSE 'Freshman'
END;

4. DELETE

The DELETE statement removes rows from a table.

Basic DELETE Statement

Deletes all rows from a table.

DELETE FROM temporary_logs;

Conditional DELETE

Deletes rows that meet specific conditions.

DELETE FROM students
WHERE status = 'Withdrawn';

-- Multiple conditions
DELETE FROM enrollments
WHERE semester = 'Fall 2022' AND student_id = 1001;

DELETE with Subquery

Deletes rows based on a query result.

DELETE FROM students
WHERE student_id IN (
    SELECT student_id
    FROM academic_probation
    WHERE violation_count > 3
);

DELETE with JOIN

Deletes rows based on a join condition.

-- SQL Server, MySQL
DELETE s
FROM students s
JOIN withdrawal_requests w ON s.student_id = w.student_id
WHERE w.status = 'Approved';

-- Oracle, PostgreSQL
DELETE FROM students
WHERE student_id IN (
    SELECT s.student_id
    FROM students s
    JOIN withdrawal_requests w ON s.student_id = w.student_id
    WHERE w.status = 'Approved'
);

5. MERGE

The MERGE statement (also known as UPSERT in some databases) performs INSERT, UPDATE, or DELETE operations in a single statement based on a condition.

-- SQL Server
MERGE INTO students AS target
USING incoming_students AS source
ON target.student_id = source.student_id
WHEN MATCHED THEN
    UPDATE SET
        target.first_name = source.first_name,
        target.last_name = source.last_name,
        target.major = source.major
WHEN NOT MATCHED THEN
    INSERT (student_id, first_name, last_name, major)
    VALUES (source.student_id, source.first_name, source.last_name, source.major);

-- PostgreSQL (using alternative syntax)
INSERT INTO students (student_id, first_name, last_name, major)
VALUES (1001, 'John', 'Doe', 'Computer Science')
ON CONFLICT (student_id) DO UPDATE
SET
    first_name = EXCLUDED.first_name,
    last_name = EXCLUDED.last_name,
    major = EXCLUDED.major;

Transaction Control

Transactions group DML operations into logical units of work that can be committed or rolled back together.

Beginning a Transaction

-- SQL Server, PostgreSQL
BEGIN TRANSACTION;

-- Oracle
BEGIN;

-- MySQL
START TRANSACTION;

Committing a Transaction

Permanently saves changes to the database.

COMMIT;

Rolling Back a Transaction

Undoes changes made in the current transaction.

ROLLBACK;

Savepoints

Creates points within a transaction that can be rolled back to.

-- Create savepoint
SAVEPOINT before_delete;

-- Delete data
DELETE FROM students WHERE status = 'Inactive';

-- Roll back to savepoint
ROLLBACK TO SAVEPOINT before_delete;

Advanced DML Features

Batch Processing

Processes large amounts of data in smaller, manageable chunks.

-- Batched delete example
DECLARE @BatchSize INT = 1000;
DECLARE @RowsAffected INT = 1;

WHILE @RowsAffected > 0
BEGIN
    DELETE TOP (@BatchSize) FROM large_log_table
    WHERE log_date < '2022-01-01';
    
    SET @RowsAffected = @@ROWCOUNT;
    
    IF @RowsAffected > 0
        COMMIT;
        WAITFOR DELAY '00:00:01'; -- 1 second pause
END

RETURNING/OUTPUT Clause

Returns values from rows affected by DML operations.

-- PostgreSQL
INSERT INTO students (first_name, last_name, major)
VALUES ('Emma', 'Garcia', 'Physics')
RETURNING student_id, first_name, last_name;

-- SQL Server
INSERT INTO students (first_name, last_name, major)
OUTPUT INSERTED.student_id, INSERTED.first_name, INSERTED.last_name
VALUES ('Emma', 'Garcia', 'Physics');

-- Oracle
INSERT INTO students (first_name, last_name, major)
VALUES ('Emma', 'Garcia', 'Physics')
RETURNING student_id, first_name, last_name INTO :id, :first, :last;

Multi-Table Operations

Performs operations on multiple tables at once.

-- Multi-table INSERT (Oracle, PostgreSQL)
INSERT ALL
    INTO current_students (student_id, status) VALUES (1001, 'Active')
    INTO student_details (student_id, address) VALUES (1001, '123 College St')
    INTO majors (student_id, major) VALUES (1001, 'Computer Science')
SELECT * FROM dual;

-- Multi-table UPDATE (MySQL)
UPDATE customers c, orders o
SET c.last_order_date = o.order_date,
    o.status = 'Processed'
WHERE c.customer_id = o.customer_id
AND o.order_date = CURRENT_DATE;

Conditional Inserts

Inserts rows based on conditions.

-- Oracle
INSERT
    WHEN enrollment_year >= 2023 THEN
        INTO new_students (student_id, first_name, last_name)
        VALUES (student_id, first_name, last_name)
    WHEN enrollment_year < 2023 AND enrollment_year >= 2020 THEN
        INTO current_students (student_id, first_name, last_name)
        VALUES (student_id, first_name, last_name)
    ELSE
        INTO legacy_students (student_id, first_name, last_name)
        VALUES (student_id, first_name, last_name)
SELECT student_id, first_name, last_name, enrollment_year
FROM incoming_students;

DML Best Practices

  1. Use Transactions for Multi-Statement Operations

    • Group related DML operations in transactions
    • Ensure data consistency with proper error handling
    BEGIN TRANSACTION;
    
    INSERT INTO orders (order_id, customer_id, order_date)
    VALUES (1001, 101, CURRENT_DATE);
    
    INSERT INTO order_items (order_id, product_id, quantity)
    VALUES (1001, 5001, 2), (1001, 5002, 1);
    
    UPDATE inventory SET stock = stock - 2 WHERE product_id = 5001;
    UPDATE inventory SET stock = stock - 1 WHERE product_id = 5002;
    
    COMMIT;
  2. Always Include WHERE Clauses for UPDATE and DELETE

    • Prevent accidental modifications to all rows
    • Consider adding a LIMIT/TOP clause for large operations
    -- Bad practice (affects all rows)
    UPDATE students SET gpa = 4.0;
    
    -- Good practice (specific condition)
    UPDATE students SET gpa = 4.0 WHERE student_id = 1001;
  3. Use Parameterized Queries

    • Prevent SQL injection attacks
    • Improve performance through query plan caching
    -- Bad practice (vulnerable to SQL injection)
    executeQuery("SELECT * FROM students WHERE student_id = " + studentId);
    
    -- Good practice (parameterized query)
    executeQuery("SELECT * FROM students WHERE student_id = ?", [studentId]);
  4. Batch Large Operations

    • Process large datasets in smaller chunks
    • Reduce transaction duration and lock contention
    -- Process 1000 records at a time
    WHILE EXISTS (SELECT 1 FROM students WHERE processed = 0 LIMIT 1)
    BEGIN
        UPDATE students
        SET processed = 1
        WHERE student_id IN (
            SELECT student_id FROM students
            WHERE processed = 0
            LIMIT 1000
        );
        
        COMMIT;
    END;
  5. Use Appropriate Indexes

    • Ensure indexes exist for columns used in WHERE clauses
    • Consider how DML operations affect indexes
  6. Minimize the Use of Triggers for Business Logic

    • Keep complex logic in application code when possible
    • Use triggers primarily for data integrity and auditing
  7. Consider Performance Impact of Joins

    • Test complex joins with representative data volumes
    • Monitor query execution plans for efficiency
  8. Use MERGE for Upsert Operations

    • Simplify insert-or-update logic
    • Improve atomicity and performance
  9. Validate Data Before DML Operations

    • Implement application-level validation
    • Use database constraints for data integrity
  10. Monitor and Log DML Operations

    • Track significant data changes
    • Implement audit logging for sensitive operations

DML Examples for Common Database Tasks

Student Registration System

-- Register a new student (Transaction with multiple inserts)
BEGIN TRANSACTION;

INSERT INTO students (student_id, first_name, last_name, birth_date, enrollment_date)
VALUES (1001, 'John', 'Doe', '2000-05-15', CURRENT_DATE);

INSERT INTO student_contact (student_id, email, phone)
VALUES (1001, 'john.doe@example.edu', '555-123-4567');

INSERT INTO student_address (student_id, street, city, state, zip)
VALUES (1001, '123 Campus Dr', 'College Town', 'CA', '90210');

COMMIT;

-- Enroll a student in courses
INSERT INTO enrollments (student_id, course_id, semester, status)
VALUES
    (1001, 'CS101', 'Fall 2023', 'Enrolled'),
    (1001, 'MATH201', 'Fall 2023', 'Enrolled'),
    (1001, 'ENG101', 'Fall 2023', 'Enrolled');

-- Update student information
UPDATE students
SET major = 'Computer Science', advisor_id = 101
WHERE student_id = 1001;

-- Record grades (with validation)
UPDATE enrollments
SET grade = CASE
    WHEN final_score >= 90 THEN 'A'
    WHEN final_score >= 80 THEN 'B'
    WHEN final_score >= 70 THEN 'C'
    WHEN final_score >= 60 THEN 'D'
    ELSE 'F'
END
WHERE student_id = 1001 AND semester = 'Fall 2023';

-- Drop a course
DELETE FROM enrollments
WHERE student_id = 1001 AND course_id = 'ENG101' AND semester = 'Fall 2023';

-- Query for student transcript
SELECT s.student_id, s.first_name, s.last_name,
       e.course_id, c.course_name, c.credits,
       e.semester, e.grade
FROM students s
JOIN enrollments e ON s.student_id = e.student_id
JOIN courses c ON e.course_id = c.course_id
WHERE s.student_id = 1001
ORDER BY e.semester DESC, c.course_id;

-- Calculate GPA
SELECT student_id, 
       SUM(credits * 
           CASE 
               WHEN grade = 'A' THEN 4.0
               WHEN grade = 'B' THEN 3.0
               WHEN grade = 'C' THEN 2.0
               WHEN grade = 'D' THEN 1.0
               ELSE 0
           END) / SUM(credits) AS gpa
FROM (
    SELECT e.student_id, c.credits, e.grade
    FROM enrollments e
    JOIN courses c ON e.course_id = c.course_id
    WHERE e.student_id = 1001 AND e.grade IS NOT NULL
) grade_points;

-- Graduate a student (multiple updates in transaction)
BEGIN TRANSACTION;

UPDATE students
SET status = 'Graduated', graduation_date = CURRENT_DATE
WHERE student_id = 1001 AND 
      (SELECT COUNT(*) FROM enrollments 
       WHERE student_id = 1001 AND grade IS NULL) = 0;

INSERT INTO alumni (student_id, graduation_date, degree)
SELECT student_id, graduation_date, major
FROM students
WHERE student_id = 1001 AND status = 'Graduated';

COMMIT;

E-commerce Order Processing

-- Create a new order (transaction with multiple operations)
BEGIN TRANSACTION;

-- Insert order header
INSERT INTO orders (order_id, customer_id, order_date, status)
VALUES (10001, 5001, CURRENT_TIMESTAMP, 'Pending');

-- Insert order lines
INSERT INTO order_items (order_id, product_id, quantity, unit_price)
SELECT 10001, product_id, quantity, current_price
FROM shopping_cart
WHERE customer_id = 5001;

-- Calculate order total
UPDATE orders
SET total_amount = (
    SELECT SUM(quantity * unit_price)
    FROM order_items
    WHERE order_id = 10001
),
    tax_amount = (
    SELECT SUM(quantity * unit_price) * 0.08
    FROM order_items
    WHERE order_id = 10001
);

-- Update inventory
UPDATE products p
SET stock_quantity = p.stock_quantity - oi.quantity
FROM order_items oi
WHERE oi.order_id = 10001 AND p.product_id = oi.product_id;

-- Clear shopping cart
DELETE FROM shopping_cart
WHERE customer_id = 5001;

COMMIT;

-- Process payment
UPDATE orders
SET status = 'Paid', payment_date = CURRENT_TIMESTAMP
WHERE order_id = 10001;

-- Ship order
UPDATE orders
SET status = 'Shipped', 
    shipping_date = CURRENT_TIMESTAMP,
    tracking_number = 'TRK' || order_id || 'XYZ'
WHERE order_id = 10001;

-- Update customer purchase history
INSERT INTO customer_activity (customer_id, activity_type, activity_date)
VALUES (5001, 'PURCHASE', CURRENT_TIMESTAMP);

UPDATE customers
SET 
    last_order_date = CURRENT_TIMESTAMP,
    lifetime_value = lifetime_value + (
        SELECT total_amount FROM orders WHERE order_id = 10001
    )
WHERE customer_id = 5001;

-- Get sales report
SELECT 
    p.category,
    SUM(oi.quantity) AS units_sold,
    SUM(oi.quantity * oi.unit_price) AS total_sales
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date >= DATEADD(day, -30, CURRENT_DATE)
GROUP BY p.category
ORDER BY total_sales DESC;

DML provides the tools needed to manipulate and interact with the data stored in a database. Understanding DML commands and their proper use is essential for developers to build efficient, secure, and reliable database operations in their applications.