Wednesday, February 13, 2019

Data Modelling Terminology

Data Modelling is the diagrammatic representation showing how the entities are related to each other. It is the initial step towards database design. We first create the conceptual model, then logical model and finally move to the physical model.
Generally, the data models are created in data analysis & design phase of software development life cycle.

There are three types of data models – conceptual, logical and physical. The level of complexity and detail increases from conceptual to logical to a physical data model.
  • Conceptual model will be just portraying entity names and entity relationships. Figure 1 shown in the later part of this article depicts a conceptual model.
  • Logical model will be showing up entity names, entity relationships, attributes, primary keys and foreign keys in each entity. Figure 2 shown inside question#4 in this article depicts a logical model.
  • Physical data model will be showing primary keys, foreign keys, table names, column names and column data types. This view actually elaborates how the model will be actually implemented in the database.
Design sachems in Data Modelling:
  • Star Schema
  • Snowflake Schema
Star Schema The simplest of the schemes is star schema where we have a fact table in the center which references multiple dimension tables around it. All the dimension tables are connected to the fact table. The primary key in all dimension tables acts as a foreign key in the fact table.
The star schema is quite simple, flexible and it is in de-normalized form


Snowflake Schema  In a snowflake schema, the level of normalization increases. The fact table here remains the same as in star schema. However, the dimension tables are normalized.  Due to several layers of dimension tables, it looks like a snowflake and thus it is named as snowflake schema.





Dimension and Attribute Dimensions represent qualitative data. For example– plan, product, class are all dimensions. A dimension table contains descriptive or textual attributes. For example, product category & product name are the attributes of product dimension.

Fact and Fact table Facts represent quantitative data. For example – net amount due is a fact. A fact table contains numerical data and foreign keys from related dimensional tables.

Different Types of Dimensions There are typically five types of dimensions.

1) Conformed dimensions: A Dimension that is utilized as a part of different areas is called as conformed dimension. It might be utilized with different fact tables in a single database or over numerous data marts/warehouses. For example, if subscriber dimension is connected to two fact tables – billing and claim then the subscriber dimension would be treated as conformed dimension.
2) Junk Dimension: It is a dimension table comprising of attributes that don’t have a place in the fact table or in any of the current dimension tables. Generally, these are the properties like flags or indicators. For example, it can be member eligibility flag set as ‘Y’ or ‘N’ or any other indicator set as true/false, any specific comments, etc. if we keep all such indicator attributes in the fact table then its size gets increased. So, we combine all such attributes and put in a single dimension table called as junk dimension having unique junk IDs with a possible combination of all the indicator values.
3) Role Playing Dimension: These are the dimensions which are utilized for multiple purposes in the same database. For example, a date dimension can be used for “Date of Claim”, “Billing date” or “Plan Term date”. So, such a dimension will be called as Role playing dimension. The primary key of Date dimension will be associated with multiple foreign keys in the fact table.
4) Slowly Changing Dimension (SCD): These are most important amongst all the dimensions. These are the dimensions where attribute values vary with time. Below are the varies types of SCDs
  • Type-0: These are the dimensions where attribute value remains steady with time. For example, Subscriber’s DOB is a type-0 SCD because it will always remain the same irrespective of the time.
  • Type-1: These are the dimensions where previous value of the attribute is replaced by the current value. No history is maintained in Type-1 dimension. For example, Subscriber’s address (where the business requires to keep the only current address of subscriber) can be a Type-1 dimension.
  • Type-2: These are the dimensions where unlimited history is preserved. For example, Subscriber’s address (where the business requires to keep a record of all the previous addresses of the subscriber). In this case, multiple rows for a subscriber will be inserted in the table with his/her different addresses.
  • There will be some column(s) that will identify the current address. For example, ‘start date’ and ‘End date’. The row where ‘End date’ value will be blank would contain subscriber’s current address and all other rows will be having previous addresses of the subscriber.
  • Type-3: These are the type of dimensions where limited history is preserved. And we use an additional column to maintain the history. For example, Subscriber’s address (where the business requires to keep a record of current & just one previous address). In this case, we can dissolve the ‘address’ column into two different columns – ‘current address’ and ‘previous address’.
  • So, instead of having multiple rows, we will be having just one row showing current as well as the previous address of the subscriber.
  • Type-4: In this type of dimension, the historical data is preserved in a separate table. The main dimension table holds only the current data.
  • For example, the main dimension table will have only one row per subscriber holding its current address. All other previous addresses of the subscriber will be kept in the separate history table. This type of dimension is hardly ever used.
5) Degenerated Dimension: A degenerated dimension is a dimension which is not a fact but presents in the fact table as a primary key. It does not have its own dimension table. We can also call it as a single attribute dimension table.

Data mart Data marts are for the most part intended for a solitary branch of business. They are designed for the individual departments. For example, I used to work for a health insurance provider company which had different departments in it like Finance, Reporting, Sales and so forth.
We had a data warehouse that was holding the information pertaining to all these departments and then we have few data marts built on top of this data warehouse. These DataMart were specific to each department. In simple words, you can say that a DataMart is a subset of a data warehouse.

Measures
  • Non- additive measures
  • Semi-additive measures
  • Additive measures
Non-additive measures are the ones on top of which no aggregation function can be applied. For example, a ratio or a percentage column; a flag or an indicator column present in fact table holding values like Y/N, etc. is a non-additive measure.
Semi-additive measures are the ones on top of which some (but not all) aggregation functions can be applied. Example – fee rate or account balance.
Additive measures are the ones on top of which all aggregation functions cab be applied. Example- units purchased.

SHOW ENGINE INNODB STATUS

  The SHOW ENGINE INNODB STATUS command in MySQL provides detailed information about the internal state of the InnoDB storage engine. This ...