What are Attributes?
Attributes are properties or characteristics that describe entities in a database. They represent the specific pieces of data that we want to store about each entity and form the basis for columns in database tables.
In simple terms, attributes are the details we need to know about the things (entities) we’re tracking in our database. For example, if we have a “Student” entity, attributes might include name, ID number, date of birth, and address.
Purpose of Attributes
Attributes serve several important purposes in database design:
- Data Storage: Define what specific information needs to be stored
- Entity Description: Provide detailed characteristics of entities
- Differentiation: Help distinguish one entity instance from another
- Search Criteria: Allow for querying and retrieving specific entities
- Business Rules: Capture constraints and validation rules
- Relationships: Sometimes used to establish connections between entities
Types of Attributes
Attributes can be classified into several types based on their characteristics:
1. Simple vs. Composite Attributes
Simple Attribute
- Cannot be divided further into smaller components
- Represents a single, atomic value
- Examples: Age, Gender, Employee ID, Price
Composite Attribute
- Can be divided into smaller constituent parts
- Made up of multiple simple attributes
- Represented in ER diagrams as an attribute with connected sub-attributes
- Examples:
- Address (consists of Street, City, State, ZIP)
- Name (consists of First Name, Middle Name, Last Name)
- Date (consists of Day, Month, Year)
2. Single-valued vs. Multi-valued Attributes
Single-valued Attribute
- Can hold only one value for each entity instance
- The most common type of attribute
- Examples: Date of Birth, Social Security Number, Employee ID
Multi-valued Attribute
- Can hold multiple values for the same entity instance
- Represented with a double oval in ER diagrams
- Usually implemented as separate tables in relational databases
- Examples:
- Phone Numbers (a person may have home, work, and mobile numbers)
- Skills (an employee may have multiple skills)
- Email Addresses (a user may have multiple email addresses)
3. Stored vs. Derived Attributes
Stored Attribute
- Physically stored in the database
- Basic attribute whose value is directly entered and maintained
- Examples: Date of Birth, Name, Salary
Derived Attribute
- Value calculated or derived from other attributes
- Not physically stored but computed when needed
- Represented with a dashed oval in ER diagrams
- Examples:
- Age (derived from Date of Birth and current date)
- Total Price (derived from Quantity and Unit Price)
- Full Name (derived from First Name and Last Name)
- Years of Service (derived from Hire Date)
4. Key vs. Non-key Attributes
Key Attribute
- Uniquely identifies each entity instance within an entity set
- Used as a primary key in the corresponding database table
- Underlined in ER diagrams
- Examples: Student ID, Employee ID, Product Code, Social Security Number
Non-key Attribute
- Does not uniquely identify an entity
- Regular descriptive attribute
- Examples: Name, Address, Color, Size
5. Required vs. Optional Attributes
Required Attribute
- Must have a value for every entity instance
- Cannot be null
- Examples: Employee ID, Student Name, Product Code
Optional Attribute
- May have no value for some entity instances
- Can be null
- Examples: Middle Name, Fax Number, Secondary Email
6. Domain-Specific Attributes
Domain-Specific Attribute
- Has a specific defined domain or range of allowable values
- Often implemented with check constraints in databases
- Examples:
- Gender (domain: ‘Male’, ‘Female’, ‘Non-binary’)
- Status (domain: ‘Active’, ‘Inactive’, ‘Pending’)
- Grade (domain: ‘A’, ‘B’, ‘C’, ‘D’, ‘F’)
Attribute Domains
The domain of an attribute defines the set of allowable values that the attribute can take. It includes:
- Data Type: The kind of data the attribute can store (e.g., integer, string, date)
- Range or List: The specific values or range of values that are valid
- Format: Any specific format requirements (e.g., phone number format)
- Default Value: The value assigned if none is specified
- Constraints: Rules that restrict the possible values
Examples of Domains:
- Age: Integer values between 0 and 150
- Email: String following a valid email format
- US Zip Code: 5-digit or 9-digit format
- Credit Card Type: One of ‘Visa’, ‘MasterCard’, ‘American Express’, ‘Discover’
Representation of Attributes in ER Diagrams
Attributes are represented differently in various ER notation styles:
Chen Notation
- Attributes shown as ovals/ellipses connected to entity rectangles
- Attribute name written inside the oval
- Key attributes are underlined
- Multi-valued attributes use double ovals
- Derived attributes use dashed ovals
- Composite attributes have their component attributes connected to them
Crow’s Foot Notation
- Attributes listed inside the entity rectangle
- Primary key attributes often listed at the top, sometimes marked with PK
- Multi-valued attributes often not explicitly shown (implemented as separate tables)
- Derived attributes sometimes marked with an asterisk (*) or shown in italics
IDEF1X Notation
- Key attributes listed above a horizontal line in the entity box
- Non-key attributes listed below the line
- Foreign keys often marked with FK
- Derived attributes typically not shown
Determining Appropriate Attributes
When designing a database, consider these guidelines for selecting appropriate attributes:
1. Relevance
- Only include attributes that are relevant to the purpose of the database
- Ask: “Will this information be needed for reporting or decision-making?“
2. Atomicity
- Break down attributes to their most basic components when appropriate
- Consider how the data will be used to determine level of detail
3. Redundancy
- Avoid storing the same information in multiple places
- Use derived attributes for data that can be calculated
4. Storage Efficiency
- Consider the storage implications of different attribute choices
- Balance between storage efficiency and query performance
5. Future Requirements
- Consider potential future needs when defining attributes
- Allow for flexibility where appropriate
Attribute Implementation in Relational Databases
When implementing attributes in a relational database:
1. Simple Attributes
- Become columns in the corresponding table
- Data type is determined by the attribute’s domain
2. Composite Attributes
- Can be implemented in multiple ways:
- As separate columns for each component (e.g., street, city, state)
- As a single column for the entire composite attribute
- As separate columns plus a calculated column for the full value
3. Multi-valued Attributes
- Typically implemented as a separate table
- The new table includes:
- The primary key of the main entity
- The multi-valued attribute
- A primary key that combines both columns
4. Derived Attributes
- Can be implemented as:
- Virtual columns or computed columns in databases that support them
- Views
- Application logic
- Stored procedures or functions
- Triggers that maintain calculated values
5. Key Attributes
- Implemented as primary key columns
- Often indexed for performance
- May be surrogate keys (system-generated) or natural keys (based on business data)
Common Attribute Design Issues
1. Overloaded Attributes
- Problem: Using one attribute to store multiple types of information
- Example: Using “Phone” to store both home and work numbers
- Solution: Create separate attributes for each type of information
2. Splitting Attributes Inappropriately
- Problem: Breaking down attributes too far
- Example: Splitting “Name” into too many components (prefix, first, middle, last, suffix)
- Solution: Balance atomicity with practicality based on how the data will be used
3. Encoding Information in Attributes
- Problem: Embedding multiple pieces of information in a single value
- Example: Employee ID “IT-1067-NY” (department, number, location)
- Solution: Separate the encoded information into distinct attributes
4. Inconsistent Data Types
- Problem: Using different data types for similar attributes across entities
- Example: “Price” as decimal in one entity and integer in another
- Solution: Standardize data types for conceptually similar attributes
5. Naming Inconsistencies
- Problem: Using different names for the same concept
- Example: “Customer_ID” in one entity and “Cust_Num” in another
- Solution: Adopt and enforce naming conventions
Best Practices for Attribute Design
-
Use Descriptive Names:
- Choose clear, self-explanatory names
- Follow consistent naming conventions
- Avoid abbreviations unless they are standard and well-understood
-
Define Domains Clearly:
- Specify data types, lengths, and constraints
- Document allowed values and formats
- Include validation rules
-
Consider Performance:
- Understand how attribute choices affect query performance
- Choose appropriate data types for efficiency
- Consider indexing needs for attributes used in searches
-
Plan for Nulls:
- Decide which attributes can have null values
- Consider using default values instead of nulls where appropriate
- Understand the implications of nulls for queries and constraints
-
Document Attributes:
- Maintain a data dictionary with attribute definitions
- Include business rules and validation criteria
- Document the source of derived attributes
-
Balance Flexibility and Control:
- Design attributes to accommodate future needs
- Use constraints to maintain data integrity
- Consider using enumerated types for limited value sets
Attributes form the fundamental building blocks of database design, defining exactly what information will be stored about each entity. Careful attention to attribute design ensures that a database can efficiently and accurately represent the real-world objects and concepts it is intended to model.