Entities and Attributes
Entities and attributes form the foundational building blocks of any data model—understanding how to identify, define, and represent entities and their characteristics is critical for designing robust databases that accurately capture business requirements—this practical topic explores entity recognition, attribute types, and design principles that ensure your data models are clear, complete, and maintainable.
What is an Entity?
Definition
An entity is a thing, person, place, event, or concept that is important to the business and about which we want to store data. Entities are the primary objects in a data model.
Characteristics of Entities
Examples of Entities
Business Domain: Indian E-commerce
CUSTOMER (Entity)
– Represents each customer ( Grahak )
– Multiple instances (lakh customers)
– Important to business (revenue, relationships)
PRODUCT (Entity)
– Represents inventory items ( Utpadan )
– Multiple instances (product catalog)
– Important to business (sales, revenue)
ORDER (Entity)
– Represents customer purchases ( Aadessh )
– Multiple instances (thousands of orders)
– Important to business (revenue, fulfillment)
WAREHOUSE (Entity)
– Represents storage locations across India
– Multiple instances (Delhi, Mumbai, Bangalore)
– Important to business (inventory management)
Business Domain: Banking
ACCOUNT (Entity)
– Savings, Current, Recurring Deposit accounts
– Multiple instances per person
– Critical to business
TRANSACTION (Entity)
– Deposits, Withdrawals, Transfers
– Millions of instances daily
– Core to operations
CUSTOMER (Entity)
– Persons and corporations
– Thousands of instances
– Business relationship
BRANCH (Entity)
– Physical bank branches across India
– Multiple instances (urban and rural)
– Service delivery point
ATM_LOCATION (Entity)
– ATM booth locations
– Multiple instances per city
– Service access point
Business Domain: Indian Insurance
POLICY (Entity)
– Health, Life, Motor, Home insurance
– Lakhs of instances
– Core product
CLAIM (Entity)
– Insurance claims filed
– Thousands per month
– Expense management
POLICYHOLDER (Entity)
– Persons and organizations
– Multiple instances
– Customer
AGENT (Entity)
– Insurance agents across India
– Thousands of agents
– Sales channel
What is an Attribute?
Definition
An attribute is a characteristic or property of an entity. It describes a specific detail about that entity.
Components of an Attribute
Attribute Types
1. Simple vs Composite Attributes
Simple Attribute:
CustomerAge : 28
Temperature: 72.5
IsActive : true
Composite Attribute:
Address (India):
– Street: "MG Road"
– City: "Bangalore"
– State: "Karnataka"
– PinCode : "560001"
– Country: "India"
CustomerName :
– FirstName: "Rajesh"
– MiddleName : "Kumar"
– LastName: "Sharma"
– Salutation: "Mr."
PhoneNumber :
– CountryCode : "+91"
– AreaCode : "11" (Delhi)
– LocalNumber : "41234567"
Design Decision: When to Decompose?
Decompose if:
Keep Combined if:
2. Single-Valued vs Multi-Valued Attributes
Single-Valued Attribute:
CustomerAadhaarNumber : "123456789012" ( one Aadhaar per person)
PermanentAddress : "123 MG Road, Bangalore " ( one permanent address)
PanNumber : "ABCDE1234F" ( one PAN per person)
Multi-Valued Attribute:
CustomerPhoneNumbers (Indian context):
– Mobile: "9876543210"
– Mobile: "8765432109"
– Landline: "01141234567" (Delhi number)
CustomerEmails :
– "rajesh@work.com"
– "rajesh@personal.com"
CustomerAddresses :
– ResidentialAddress (Ghar ka Pata)
– OfficeAddress (Daftari Pata)
– HeadquartersAddress (Aam Pata)
Handling Multi-Valued Attributes in Relational Design
Option 1: Create Separate Table (Recommended)
CUSTOMER
– CustomerID (1)
– CustomerName
CUSTOMER_PHONE (Many)
– CustomerID
– PhoneNumber
– PhoneType (Mobile, Landline)
Relationship: 1 Customer has Many Phones
Option 2: Store as Comma-Separated (Not Recommended)
CustomerPhones: "9876543210,8765432109,01141234567"
Problems:
– Hard to query individual phones
– Difficult to update
– Violates database normalization
3. Stored vs Derived Attributes
Stored Attribute:
DateOfBirth : "1990-05-15" ( stored in database)
EmploymentStartDate : "2020-01-15" ( stored)
VehicleRegistrationDate : "2020-03-20 " ( stored)
Derived Attribute:
Age = Calculate from DateOfBirth
= TODAY( ) – DateOfBirth
= Age in years
YearsOfService = Calculate from StartDate
= TODAY( ) – EmploymentStartDate
= Years employed
VehicleAge = Calculate from RegistrationDate
= TODAY( ) – VehicleRegistrationDate
= In years
Design Principle: Avoid Storing Derived Data
EMPLOYEE Table:
BAD Design:
– DateOfBirth : "1985-06-15"
– Age: 38 ( DERIVED – DO NOT STORE)
– EmploymentStartDate : "2015-01-10"
– YearsOfService : 8 ( DERIVED – DO NOT STORE)
GOOD Design:
– DateOfBirth : "1985-06-15" ( STORED)
– EmploymentStartDate : "2015-01-10 " ( STORED)
– [Age calculated when needed]
– [ YearsOfService calculated when needed]
Benefits:
– No redundancy
– Always accurate (age updates automatically yearly)
– Less storage
– Easier maintenance
4. Key Attributes vs Descriptive Attributes
Key Attributes (Identify entities):
CustomerID : "CUST-001"
ProductCode : "PROD-2024-001"
OrderNumber : "ORD-123456"
Descriptive Attributes (Describe entities):
CustomerName : "John Smith"
ProductDescription : "Wireless Keyboard"
OrderDate : "2024-01-15"
Attribute Examples by Domain
Customer Entity
CUSTOMER
Key Attributes:
– CustomerID (unique identifier)
Descriptive Attributes:
– FirstName (simple)
– LastName (simple)
– FullName (composite: FirstName + LastName)
– DateOfBirth (simple)
– Address (composite: Street, City, State, ZIP)
– PhoneNumbers (multi-valued)
– EmailAddresses (multi-valued)
– CustomerType (e.g., "Retail", "Wholesale")
– CreditLimit (numeric)
– IsActive ( boolean )
Derived Attributes:
– Age (calculated from DateOfBirth )
– YearsAsCustomer (calculated from RegisterDate )
– TotalPurchases (calculated from Orders)
Product Entity
PRODUCT
Key Attributes:
– ProductID (unique identifier)
– ProductCode (unique business key)
Descriptive Attributes:
– ProductName (simple)
– Description (simple)
– Category (simple)
– Manufacturer (simple)
– UnitPrice (numeric)
– Quantity (simple)
– Colors (multi-valued)
– Sizes (multi-valued)
– ManufactureDate (date)
– ExpiryDate (date)
– IsDiscontinued ( boolean )
Derived Attributes:
– DaysUntilExpiry (calculated)
– AgeInDays (calculated)
– StockValue (calculated: Quantity × UnitPrice )
Order Entity
ORDER
Key Attributes:
– OrderID (unique identifier)
– OrderNumber (unique business key)
Descriptive Attributes:
– OrderDate (when order placed)
– DeliveryDate (when delivered)
– Status (e.g., "Pending", "Shipped", "Delivered")
– ShippingAddress (composite)
– BillingAddress (composite)
– PaymentMethod (simple)
– SpecialInstructions (simple)
– DeliveryInstructions (multi-valued)
– OrderNotes (simple)
Derived Attributes:
– TotalAmount (calculated from line items)
– TaxAmount (calculated)
– ShippingCost (calculated or retrieved)
– DaysSinceOrder (calculated)
Attribute Design Guidelines
1. Naming Conventions
Choose Clear, Descriptive Names:
GOOD:
– CustomerFirstName (clear, specific)
– ProductUnitPrice (clear, measurable)
– OrderCreatedDate (clear, specific)
BAD:
– CustFN (abbreviated, unclear)
– Price (ambiguous – unit, total, or gross?)
– Date (which date?)
– Data1, Data2 (meaningless)
Use Consistent Prefixes:
All dates have "Date" suffix:
– BirthDate
– OrderDate
– ShipDate
– DeliveryDate
All flags/ booleans have "Is" or "Has" prefix:
– IsActive
– IsDeleted
– HasShipped
– IsVerified
2. Appropriate Data Types
CUSTOMER
POOR CHOICES:
– BirthDate as TEXT: "1990-05-15"
– Age as TEXT: "28"
– CreditLimit as TEXT: "5000"
– IsActive as TEXT: "Yes"
GOOD CHOICES:
– BirthDate as DATE: stores date
– Age calculated from DATE
– CreditLimit as DECIMAL( 10,2)
– IsActive as BOOLEAN: true/false
3. Domain Constraints
Define Valid Values:
CUSTOMER
– CustomerType : Only "Individual", "Business", "Government"
– Status: Only "Active", "Inactive", "Suspended", "Closed"
– CreditRating : Only "A", "B", "C", "D"
– Age: Only 0-150
– CreditLimit : Only >= 0
4. Null Handling
Distinguish Between Types of Missing Data:
CustomerPhone = NULL
Interpretation Could Be:
1. Not provided (person has no phone)
2. Not applicable (some entities don't need phone)
3. Unknown (person has phone but not recorded)
4. Not yet collected (still gathering data)
Solution: Use separate " PhoneType " to clarify:
– PhoneType = "Not Provided"
– PhoneType = "Unknown"
– PhoneType = "Mobile"
Default Values:
GOOD Use of Defaults:
– IsActive : default TRUE (entities active by default)
– CreatedDate : default CURRENT_DATE (automatic timestamp)
– Status: default "Pending" (clear starting state)
POOR Use of Defaults:
– CustomerName : don't default (required field)
– OrderAmount : don't default (must be explicit)
– Purpose: don't default (must be intentional)
Entity-Attribute Relationship
Entity Occurrence
CUSTOMER Entity has Multiple Occurrences:
Occurrence 1:
– CustomerID : 001
– CustomerName : "John Smith"
– Address: "123 Main St"
– IsActive : true
Occurrence 2:
– CustomerID : 002
– CustomerName : "Jane Doe"
– Address: "456 Oak Ave"
– IsActive : true
Occurrence 3:
– CustomerID : 003
– CustomerName : "Bob Johnson"
– Address: "789 Pine Rd"
– IsActive : false
Instance vs Entity
ENTITY (Template):
CUSTOMER
– CustomerID
– CustomerName
– EmailAddress
– PhoneNumber
INSTANCE (Specific Example):
CustomerID : 12345
CustomerName : "Alice Brown"
EmailAddress : "alice@example.com"
PhoneNumber : "555-1234"
Common Modeling Mistakes
Mistake 1: Confusing Entity and Attribute
WRONG:
CUSTOMER (Entity)
– CustomerID
– CustomerName
– Address (might be Entity too!)
CORRECT:
CUSTOMER ( Entity) ADDRESS (Entity)
– CustomerID – AddressID
– CustomerName – Street
– DeliveryAddressID ( FK) – City
– BillingAddressID ( FK) – State
Relationship: Customer HAS multiple Addresses
Mistake 2: Storing All Derived Data
WRONG:
EMPLOYEE
– DateOfBirth
– Age ( NEVER STORE – CALCULATED)
– YearsOfService ( NEVER STORE – CALCULATED)
– SalaryWithBonus ( NEVER STORE – CALCULATED)
CORRECT:
EMPLOYEE
– DateOfBirth ( STORED – source for Age)
– HireDate ( STORED – source for YearsOfService )
– BaseSalary ( STORED – source for total)
[Calculate Age, YearsOfService, SalaryWithBonus in queries]
Mistake 3: Poor Naming Conventions
CONFUSING:
– Amt (what amount?)
– Dt (which date?)
– Qty (quantity of what?)
– Val (value in what currency?)
– Data (what data?)
– Desc (description of what?)
CLEAR:
– OrderAmount (specific, clear)
– OrderDate (clear, specific)
– InventoryQuantity (clear)
– ProductValue (clear)
– CustomerData (still vague, improve to specific field)
– ProductDescription (clear)
Mistake 4: Inconsistent Attribute Handling
INCONSISTENT:
– Some dates stored as TEXT: "2024-01-15"
– Some dates stored as DATE type: 2024-01-15
– Some booleans as TEXT: "Y"/"N"
– Some booleans as BOOLEAN: true/false
– Some IDs as TEXT: "CUST001"
– Some IDs as INTEGER: 1
CONSISTENT:
– All dates as DATE type
– All booleans as BOOLEAN type
– All IDs follow one pattern
– All amounts as DECIMAL( 10,2)
– All percentages as DECIMAL( 5,2)
Best Practices Summary
💡 Key Insights
📚 Study Notes