Entities and Attributes

 

Entities and attributes form the foundational building blocks of any data model—understanding how to identify, define, and represent entities and their characteristics is critical for designing robust databases that accurately capture business requirements—this practical topic explores entity recognition, attribute types, and design principles that ensure your data models are clear, complete, and maintainable.

 

What is an Entity?

Definition

An entity is a thing, person, place, event, or concept that is important to the business and about which we want to store data. Entities are the primary objects in a data model.

Characteristics of Entities

  • Identifiable: Can be uniquely distinguished from other entities
  • Concrete or Abstract: Can be physical (Person, Product) or conceptual (Order, Transaction)
  • Independent Existence: Exists on its own
  • Multiple Instances: Multiple occurrences in the system
  • Important to Business: Represents something the business cares about
  •  

    Examples of Entities

    Business Domain: Indian E-commerce

    CUSTOMER (Entity)

    – Represents each customer ( Grahak )

    – Multiple instances (lakh customers)

    – Important to business (revenue, relationships)

     

    PRODUCT (Entity)

    – Represents inventory items ( Utpadan )

    – Multiple instances (product catalog)

    – Important to business (sales, revenue)

     

    ORDER (Entity)

    – Represents customer purchases ( Aadessh )

    – Multiple instances (thousands of orders)

    – Important to business (revenue, fulfillment)

     

    WAREHOUSE (Entity)

    – Represents storage locations across India

    – Multiple instances (Delhi, Mumbai, Bangalore)

    – Important to business (inventory management)

    Business Domain: Banking

    ACCOUNT (Entity)

    – Savings, Current, Recurring Deposit accounts

    – Multiple instances per person

    – Critical to business

     

    TRANSACTION (Entity)

    – Deposits, Withdrawals, Transfers

    – Millions of instances daily

    – Core to operations

     

    CUSTOMER (Entity)

    – Persons and corporations

    – Thousands of instances

    – Business relationship

     

    BRANCH (Entity)

    – Physical bank branches across India

    – Multiple instances (urban and rural)

    – Service delivery point

     

    ATM_LOCATION (Entity)

    – ATM booth locations

    – Multiple instances per city

    – Service access point

    Business Domain: Indian Insurance

    POLICY (Entity)

    – Health, Life, Motor, Home insurance

    – Lakhs of instances

    – Core product

     

    CLAIM (Entity)

    – Insurance claims filed

    – Thousands per month

    – Expense management

     

    POLICYHOLDER (Entity)

    – Persons and organizations

    – Multiple instances

    – Customer

     

    AGENT (Entity)

    – Insurance agents across India

    – Thousands of agents

    – Sales channel

     

    What is an Attribute?

    Definition

    An attribute is a characteristic or property of an entity. It describes a specific detail about that entity.

    Components of an Attribute

  • Name: What we call it (e.g., " CustomerName ")
  • Data Type: What kind of data it holds (text, number, date)
  • Domain: Valid values (e.g., age 0-150)
  • Constraint: Rules (e.g., NOT NULL, UNIQUE)
  • Default Value: What to use if not provided
  •  

    Attribute Types

    1. Simple vs Composite Attributes

    Simple Attribute:

  • Single, indivisible value
  • Cannot be broken down further
  • CustomerAge : 28

    Temperature: 72.5

    IsActive : true

     

    Composite Attribute:

  • Composed of multiple sub-attributes
  • Can be decomposed into components
  • Use when components accessed separately
  • Address (India):

    – Street: "MG Road"

    – City: "Bangalore"

    – State: "Karnataka"

    PinCode : "560001"

    – Country: "India"

    CustomerName :

    – FirstName: "Rajesh"

    MiddleName : "Kumar"

    – LastName: "Sharma"

    – Salutation: "Mr."

     

    PhoneNumber :

    CountryCode : "+91"

    AreaCode : "11" (Delhi)

    LocalNumber : "41234567"

    Design Decision: When to Decompose?

    Decompose if:

  • Components accessed separately (City querying)
  • Components have different update cycles
  • Business requirements ask for each part
  • Reporting needs individual components
  • Keep Combined if:

  • Always used together
  • Never referenced separately
  • Simplicity preferred
  •  

    2. Single-Valued vs Multi-Valued Attributes

    Single-Valued Attribute:

  • Each entity has exactly one value
  • Most common type
  • CustomerAadhaarNumber : "123456789012" ( one Aadhaar per person)

    PermanentAddress : "123 MG Road, Bangalore " ( one permanent address)

    PanNumber : "ABCDE1234F" ( one PAN per person)

    Multi-Valued Attribute:

  • Each entity can have multiple values
  • Design challenge: How to store in relational database?
  • CustomerPhoneNumbers (Indian context):

    – Mobile: "9876543210"

    – Mobile: "8765432109"

    – Landline: "01141234567" (Delhi number)

     

    CustomerEmails :

    – "rajesh@work.com"

    – "rajesh@personal.com"

     

    CustomerAddresses :

    ResidentialAddress (Ghar ka Pata)

    OfficeAddress (Daftari Pata)

    HeadquartersAddress (Aam Pata)

     

    Handling Multi-Valued Attributes in Relational Design

    Option 1: Create Separate Table (Recommended)

    CUSTOMER

    CustomerID (1)

    CustomerName

     

    CUSTOMER_PHONE (Many)

    CustomerID

    PhoneNumber

    PhoneType (Mobile, Landline)

     

    Relationship: 1 Customer has Many Phones

    Option 2: Store as Comma-Separated (Not Recommended)

    CustomerPhones: "9876543210,8765432109,01141234567"

    Problems:

    – Hard to query individual phones

    – Difficult to update

    – Violates database normalization

     

    3. Stored vs Derived Attributes

    Stored Attribute:

  • Actual data stored in database
  • Source of truth
  • Relatively stable values
  • DateOfBirth : "1990-05-15" ( stored in database)

    EmploymentStartDate : "2020-01-15" ( stored)

    VehicleRegistrationDate : "2020-03-20 " ( stored)

    Derived Attribute:

  • Calculated from other attributes
  • Never stored in database
  • Always up-to-date (no maintenance needed)
  • Saves storage space
  • Age = Calculate from DateOfBirth

    = TODAY( ) – DateOfBirth

    = Age in years

    YearsOfService = Calculate from StartDate

    = TODAY( ) – EmploymentStartDate

    = Years employed

     

    VehicleAge = Calculate from RegistrationDate

    = TODAY( ) – VehicleRegistrationDate

    = In years

    Design Principle: Avoid Storing Derived Data

    EMPLOYEE Table:

    BAD Design:

    DateOfBirth : "1985-06-15"

    – Age: 38 ( DERIVED – DO NOT STORE)

    EmploymentStartDate : "2015-01-10"

    YearsOfService : 8 ( DERIVED – DO NOT STORE)

     

    GOOD Design:

    DateOfBirth : "1985-06-15" ( STORED)

    EmploymentStartDate : "2015-01-10 " ( STORED)

    [Age calculated when needed]

    – [ YearsOfService calculated when needed]

     

    Benefits:

    – No redundancy

    – Always accurate (age updates automatically yearly)

    – Less storage

    – Easier maintenance

    4. Key Attributes vs Descriptive Attributes

    Key Attributes (Identify entities):

  • Uniquely identify an entity
  • Essential for relationships
  • Usually indexed for fast lookup
  • Stable values (rarely change)
  • CustomerID : "CUST-001"

    ProductCode : "PROD-2024-001"

    OrderNumber : "ORD-123456"

    Descriptive Attributes (Describe entities):

  • Provide information about entity
  • Not unique
  • Change more frequently
  • Support reporting and queries
  • CustomerName : "John Smith"

    ProductDescription : "Wireless Keyboard"

    OrderDate : "2024-01-15"

     

    Attribute Examples by Domain

    Customer Entity

    CUSTOMER

     

    Key Attributes:

    CustomerID (unique identifier)

     

    Descriptive Attributes:

    – FirstName (simple)

    – LastName (simple)

    FullName (composite: FirstName + LastName)

    DateOfBirth (simple)

    – Address (composite: Street, City, State, ZIP)

    PhoneNumbers (multi-valued)

    EmailAddresses (multi-valued)

    CustomerType (e.g., "Retail", "Wholesale")

    CreditLimit (numeric)

    IsActive ( boolean )

     

    Derived Attributes:

    – Age (calculated from DateOfBirth )

    YearsAsCustomer (calculated from RegisterDate )

    TotalPurchases (calculated from Orders)

    Product Entity

    PRODUCT

     

    Key Attributes:

    ProductID (unique identifier)

    ProductCode (unique business key)

     

    Descriptive Attributes:

    – ProductName (simple)

    – Description (simple)

    – Category (simple)

    – Manufacturer (simple)

    UnitPrice (numeric)

    – Quantity (simple)

    – Colors (multi-valued)

    – Sizes (multi-valued)

    ManufactureDate (date)

    ExpiryDate (date)

    IsDiscontinued ( boolean )

     

    Derived Attributes:

    DaysUntilExpiry (calculated)

    AgeInDays (calculated)

    StockValue (calculated: Quantity × UnitPrice )

    Order Entity

    ORDER

     

    Key Attributes:

    OrderID (unique identifier)

    OrderNumber (unique business key)

     

    Descriptive Attributes:

    OrderDate (when order placed)

    DeliveryDate (when delivered)

    – Status (e.g., "Pending", "Shipped", "Delivered")

    ShippingAddress (composite)

    BillingAddress (composite)

    PaymentMethod (simple)

    SpecialInstructions (simple)

    DeliveryInstructions (multi-valued)

    OrderNotes (simple)

     

    Derived Attributes:

    TotalAmount (calculated from line items)

    TaxAmount (calculated)

    ShippingCost (calculated or retrieved)

    DaysSinceOrder (calculated)

     

    Attribute Design Guidelines

    1. Naming Conventions

    Choose Clear, Descriptive Names:

    GOOD:

    CustomerFirstName (clear, specific)

    ProductUnitPrice (clear, measurable)

    OrderCreatedDate (clear, specific)

     

    BAD:

    CustFN (abbreviated, unclear)

    – Price (ambiguous – unit, total, or gross?)

    – Date (which date?)

    – Data1, Data2 (meaningless)

    Use Consistent Prefixes:

    All dates have "Date" suffix:

    BirthDate

    OrderDate

    ShipDate

    DeliveryDate

     

    All flags/ booleans have "Is" or "Has" prefix:

    IsActive

    IsDeleted

    HasShipped

    IsVerified

    2. Appropriate Data Types

    CUSTOMER

     

    POOR CHOICES:

    BirthDate as TEXT: "1990-05-15"

    – Age as TEXT: "28"

    CreditLimit as TEXT: "5000"

    IsActive as TEXT: "Yes"

     

    GOOD CHOICES:

    BirthDate as DATE: stores date

    – Age calculated from DATE

    CreditLimit as DECIMAL( 10,2)

    IsActive as BOOLEAN: true/false

    3. Domain Constraints

    Define Valid Values:

    CUSTOMER

    CustomerType : Only "Individual", "Business", "Government"

    – Status: Only "Active", "Inactive", "Suspended", "Closed"

    CreditRating : Only "A", "B", "C", "D"

    – Age: Only 0-150

    CreditLimit : Only >= 0

    4. Null Handling

    Distinguish Between Types of Missing Data:

    CustomerPhone = NULL

     

    Interpretation Could Be:

    1. Not provided (person has no phone)

    2. Not applicable (some entities don't need phone)

    3. Unknown (person has phone but not recorded)

    4. Not yet collected (still gathering data)

     

    Solution: Use separate " PhoneType " to clarify:

    PhoneType = "Not Provided"

    PhoneType = "Unknown"

    PhoneType = "Mobile"

    Default Values:

    GOOD Use of Defaults:

    IsActive : default TRUE (entities active by default)

    CreatedDate : default CURRENT_DATE (automatic timestamp)

    – Status: default "Pending" (clear starting state)

     

    POOR Use of Defaults:

    CustomerName : don't default (required field)

    OrderAmount : don't default (must be explicit)

    – Purpose: don't default (must be intentional)

     

    Entity-Attribute Relationship

    Entity Occurrence

    CUSTOMER Entity has Multiple Occurrences:

     

    Occurrence 1:

    CustomerID : 001

    CustomerName : "John Smith"

    – Address: "123 Main St"

    IsActive : true

     

    Occurrence 2:

    CustomerID : 002

    CustomerName : "Jane Doe"

    – Address: "456 Oak Ave"

    IsActive : true

     

    Occurrence 3:

    CustomerID : 003

    CustomerName : "Bob Johnson"

    – Address: "789 Pine Rd"

    IsActive : false

    Instance vs Entity

    ENTITY (Template):

    CUSTOMER

    CustomerID

    CustomerName

    EmailAddress

    PhoneNumber

     

    INSTANCE (Specific Example):

    CustomerID : 12345

    CustomerName : "Alice Brown"

    EmailAddress : "alice@example.com"

    PhoneNumber : "555-1234"

     

    Common Modeling Mistakes

    Mistake 1: Confusing Entity and Attribute

    WRONG:

    CUSTOMER (Entity)

    CustomerID

    CustomerName

    – Address (might be Entity too!)

     

    CORRECT:

    CUSTOMER ( Entity) ADDRESS (Entity)

    CustomerID AddressID

    CustomerName – Street

    DeliveryAddressID ( FK) – City

    BillingAddressID ( FK) – State

     

    Relationship: Customer HAS multiple Addresses

    Mistake 2: Storing All Derived Data

    WRONG:

    EMPLOYEE

    DateOfBirth

    – Age ( NEVER STORE – CALCULATED)

    YearsOfService ( NEVER STORE – CALCULATED)

    SalaryWithBonus ( NEVER STORE – CALCULATED)

     

    CORRECT:

    EMPLOYEE

    DateOfBirth ( STORED – source for Age)

    HireDate ( STORED – source for YearsOfService )

    BaseSalary ( STORED – source for total)

    [Calculate Age, YearsOfService, SalaryWithBonus in queries]

    Mistake 3: Poor Naming Conventions

    CONFUSING:

    – Amt (what amount?)

    – Dt (which date?)

    – Qty (quantity of what?)

    – Val (value in what currency?)

    – Data (what data?)

    – Desc (description of what?)

     

    CLEAR:

    OrderAmount (specific, clear)

    OrderDate (clear, specific)

    InventoryQuantity (clear)

    ProductValue (clear)

    CustomerData (still vague, improve to specific field)

    ProductDescription (clear)

    Mistake 4: Inconsistent Attribute Handling

    INCONSISTENT:

    – Some dates stored as TEXT: "2024-01-15"

    – Some dates stored as DATE type: 2024-01-15

    – Some booleans as TEXT: "Y"/"N"

    – Some booleans as BOOLEAN: true/false

    – Some IDs as TEXT: "CUST001"

    – Some IDs as INTEGER: 1

     

    CONSISTENT:

    – All dates as DATE type

    – All booleans as BOOLEAN type

    – All IDs follow one pattern

    – All amounts as DECIMAL( 10,2)

    – All percentages as DECIMAL( 5,2)

     

    Best Practices Summary

  • Identify Entities First: What things does business care about?
  • Distinguish Attributes: Don't confuse entities with attributes
  • Avoid Derived Data: Don't store calculated values
  • Use Consistent Naming: Clear, descriptive, standardized
  • Choose Appropriate Types: Match data type to data
  • Define Constraints: Specify valid values
  • Decompose Composites: Only if components used separately
  • Handle Multi-Valued Data: Use separate tables/entities
  • Handle NULLs Deliberately: Understand meaning of NULL
  • Document Everything: Explain business meaning
  • 💡 Key Insights

  • Entities are nouns: Customer, Product, Order (things)
  • Attributes are adjectives: Red, Large, Active (characteristics)
  • Entities independent: Can exist without other entities
  • Attributes dependent: Belong to entities
  • Derived = Calculated: Never store what you can calculate
  • Composite ≠ Stored separately: Decompose only if needed
  • 📚 Study Notes

  • Entity: Independent thing important to business
  • Attribute: Property/characteristic of entity
  • Simple: Single, indivisible value
  • Composite: Multiple related sub-attributes
  • Single-valued: One value per entity
  • Multi-valued: Multiple values per entity
  • Stored: Actual data in database
  • Derived: Calculated from other attributes
  • Key: Identifies entity
  • Descriptive: Describes entity
  • Domain: Valid values for attribute
  • Constraint: Rule for attribute
  •  

     

    Leave a Reply