Data Model
The Directus data studio enables no-code configuration and management for any SQL database, with no arbitrary restrictions on how you build your data model. You get control over table, column and relationship configuration, as well as how users view and interact with data inside the data studio.
Learn More
Remember, you will have full access to manage your database using SQL. Directus will mirror any changes. You can also configure your data model programmatically via the API. To learn more, see our API documentation on Collections, Fields, and Relations.
Relational Data Models
In order to understand how Directus handles data models, you will need an understanding of what relational data models are. This section provides a brief summary of the core concepts. It may be useful as a review, or for business users working on your team that want a simple explanation of how data models work. If you have a firm knowledge of relational data model concepts, such as databases, data tables, columns, data types, primary and foreign keys, rows, relationships, and schemas then feel free to jump to Data Models in Directus.
Databases
Directus is an SQL database wrapper. A database is a set of data stored in a computer, in a structured way, making it organized, accessible, and scalable. The specific way you structure your data within a database is called your data model.
Database vs Excel
To make a comparison most business users can relate with, storing data in a database is somewhat similar to storing data in Excel spreadsheets. You know how you can build a table on one sheet in Excel, build another table on another sheet, then link the rows of each table together? That is pretty much how a relational data model works. But there are some key points where Excel and relational databases differ.
Many times, we store data as a table in Excel, but that's not always the case, as the program serves tons of other purposes. Excel lets you make your data stylized (bold, italicized, colored, custom fonts, etc.), set dynamic functions in cells, add graphics like charts and graphs, and input any kind of data into any cell you'd like with no enforced structure. Your Excel spreadsheet is a blank canvas, designed to store up to tens of thousands of rows of information.
There is no stylization within databases. They strictly store raw data values in a structured way. Any time you want to style data, build a function, put data into a graph, etc., you must create that functionality in your app or website. Databases store raw, un-stylized, structured data and are designed to handle millions, and in some cases billions and trillions, of rows of information.
Data Tables
SQL databases store data across data tables. Data tables typically store information about one distinct type of record, object, or observation, such as a financial transaction, blog post, geo-position, user, IoT event, or anything. Data tables are further broken down into columns and rows.
Columns
Columns are categories that store one kind of information. Each column has a unique, descriptive name and stores one unit of information in each of its cell values. Columns keep the data organized, consistent, and easily accessible. The columns you choose to add to a data table will completely depend on the information you need to store.
Cell Values
Each value in a column is stored in its own cell. In general, you want to create columns that save atomic values. That means create the column to store the smallest indivisible units There is no restriction for the kinds of information to include in a column, but there are good and bad practices. For example:
- A Bad Column: A
city_state_zipcode
column. - Good Columns: Separate
city
,state
andzipcode
columns.
Data Types
To further maintain structure and consistency, when you create a column, you must also define its data type. For example, an age
column might be assigned INTEGER
and a blog_content
column may be assigned a STRING
or TEXT
data type. There are countless incongruent, unexpected, and potentially dangerous behaviors that could emerge when a program tries to process data with the wrong data type.
To give an example, if you type the character 2
, it may be stored as an INTEGER
or as STRING
. If you stored 2
as an INTEGER
, when you try to add 2 + 2
, the computer will typically calculate 4
. In some languages, if you stored the character "2"
as a STRING
, when you try to add "2" + "2"
, the computer will concatenate them into 22
, while in others, trying to do this may crash the program!
Therefore when you work with data, it is important to know what its data type is because the wrong data type can cause unexpected and even dangerous behaviors in your program.
Rows
Each row stores data associated to a unique record, event, object, entity, observation, etc. Data tables can contain millions, even billions and trillions of rows of data.
Primary Keys
In order to uniquely identify and track each row, every data table must have a primary key column. A primary key is a unique ID that identifies a specific row. Any pattern or system could be used to generate primary keys, so long as it guarantees each key is unique. Perhaps the most common is incrementing integers, where the primary key on each new row increments as follows 1
, 2
, 3
, 4
, etc... The primary key column guarantees you can always find a row and differentiate it from other rows.
Foreign Keys
Since primary keys uniquely identify each and every row in a data table, they are the perfect tool to create relationships. If you want to relationally link rows between two data tables, you create a column to store foreign keys, which are simply primary keys from a foreign table. This is called a foreign key column, to signify it stores the keys from a foreign table.
Parent vs. Related Tables
When we talk about two related tables, we refer to them as the parent table and the related table. These two terms are based solely on perspective, similar to the terms this and that or the terms here and there, signifying the perspective from which you are looking at the relationship.
For example, within the data model, a many-to-one relationship is the same as a one-to-many relationship, the term used just depends on which collection you consider the parent.
Types of Relationships
There are several ways you can relationally link tables:
- One to One — Each row in the parent data table can link to one row (max) in the related table.
- Many to One — Many rows in the parent data table can link to one row in the related table.
- One to Many — Each row in a data table can link to many rows in another data table.
Note: in a data model, Many-to-One and One-to-Many relationships are identical. The naming difference is a matter of perspective. - Many to Many — Many rows in the parent table can link to many rows in the related table. M2M relationships require a third table, called junction table. An M2M is nothing more than an O2M and an M2O stored on the junction table.
- Many to Any — Many Rows in a data table can link to many rows across any other table in the database. Similar to M2M relationships, M2As require a junction data table as well as an additional column on the junction table to store the related tables' names.
To learn more about how these relationships work conceptually, as well as how they are handled within Directus, see our guide on relationships.
Database Schemas
In our examples so far, we have seen and described actual data tables. As you design your relational data model, you will need to create a schema to keep track of its complexity.
A schema is a blueprint for your data model, which defines its data tables, columns in each table, details about each column and relationships between tables. It does not include the actual data points stored. Here is a simple schema of two relationally linked tables:
table_one
- column1 (primary key)
- column2 (data type, optionally explain what the column stores)
- column3 (...)
table_one
- column1 (primary key)
- column2 (data type, optionally explain what the column stores)
- column3 (...)
table_two
- column1 (primary key)
- column2 (...)
- column3 (...)
- table_one_id (foreign key, relationally links rows via table_one.column1)
table_two
- column1 (primary key)
- column2 (...)
- column3 (...)
- table_one_id (foreign key, relationally links rows via table_one.column1)
In the schema above, we defined two tables with overtly generic names table_one
, table_two
and column1
, column2
, etc. The names you choose for data tables and columns are up to you. Ideally, you should pick unique, memorable names that identify the data contents stored within.
In this documentation, we bend the rules of traditional database schemas in two ways. First, we sometimes add a full-length explanation of what a column is. Notice in table_one.column2
there is a note after the column name (data type, optionally....
. It is common practice to include a column's data type in a table schema, but full descriptions are not. Second, in our examples, if columns exist in the table but its data type or other details are irrelevant to the current learning point, we omit their details so you can focus on the important columns.
Please note too, that with more complex schemas, containing dozens (or maybe hundreds!) of relationally linked data tables, you usually include datatype information as well as a visualization of how each and every table interlinks.
Avoid Data Duplication
Relational databases allow us to build data models that avoid data duplication, or in other words, when you have the same data stored in multiple locations in your database. Data Duplication is inefficient and dangerous.
To give an example, let's consider a blog
table. In a blog, you may want to display the author's details, so you add an author_name
column.
blog
- id
- title
- content
- author_name (string, stores author's full name)
blog
- id
- title
- content
- author_name (string, stores author's full name)
The table above stores the author name directly inside of the blog
data table. However, let's imagine that along with our blog posts, we want to display more information about authors, such as their email address, social media, etc. We could put this author information into the blog
data table.
blog
- id
- title
- content
- author_name (string, stores author's full name)
- author_email (string, stores author's email)
- author_img (string, stores link to author's profile picture)
blog
- id
- title
- content
- author_name (string, stores author's full name)
- author_email (string, stores author's email)
- author_img (string, stores link to author's profile picture)
You might be starting to notice this data table no longer represents one single object, but two: blog posts and authors. This is almost always a sign the data should be split across different tables and relationally linked.
Now let's also imagine that authors are one type of user. All user details are stored in a users
data table and its data is displayed on each user profile page, for chat messaging and other types of transactions, this is a common situation in many projects. In this case, the author name and other details would also need to exist in the users
table.
This creates duplicate data. There are two big problems with this:
First, it becomes difficult or impossible to maintain accurate information. If the author decides to change their social media information under users
, someone would have to go through and update author details on every single row containing their blog posts. With just 10 or even 100 blog posts, this would be annoying but perhaps not a massive problem. However, as volume of data grows to millions or billions of rows, updating duplicate data becomes a serious problem.
Furthermore, an error on an author's name and personal information may not be a truly dangerous situation, but inaccurate information would be catastrophic in data tables containing banking transactions or medical records!
Second, it wastes storage space and slows down performance. With a data model containing a few hundred blog posts, duplicate data may not take up enough space to cause huge drops in performance. But again, if you have the same information repeated again and again over millions or billions of rows, storage is wasted on a massive scale.
Why We Use Relational Data Models
As shown in the previous section, you want to make sure that every single data point is unique. This is where the relational part of relational data models comes into play. To avoid data duplication, it is always best practice to normalize your data model, which is the technical term used to describe designing a data model so that there is no duplicate information stored at all. Instead of storing all information needed in a given situation in one table, like we saw when mixing up blog and author information above, database normalization is the process of splitting up this information across tables and relationally linking it all together so that information is never repeated.
There is a lot to learn to master database normalization and a thorough education in the practice goes beyond the scope of this document. There are plenty of resources to learn about it online. However, to provide one simple example by improving the example blog
data table provided in the previous Avoid Data Duplication section:
blog
- id
- title
- content
- author_name
- author_email
- author_img
blog
- id
- title
- content
- author_name
- author_email
- author_img
As described in the section on Rows, we want each row in a data table to represent one unique record, event, object, entity, observation, etc. To do this, we can remove the author_name
column from the blog
table and replace it with an author_id
foreign key column, which stores foreign keys from the users
table.
blog
- id
- title
- content
- author_id (stores foreign key from users.id)
blog
- id
- title
- content
- author_id (stores foreign key from users.id)
users
- id
- name
- email
- role
- email
- twitter
users
- id
- name
- email
- role
- email
- twitter
Notice the difference. Previously, we placed the author's name directly into a column on the blog
data table (creating duplicate data). Here, we replaced author_name
, author_email
and author_img
, with the author_id
column, which contains foreign keys from users
. From here, we can use the foreign key to relationally link data between blog
and users
.
Working With Relational Data Models
Once you've designed your data model conceptually, you typically build and interact with it using SQL, or Structured Querying Language. This language is used to create, read & query, update, and delete anything and everything in the database.
Once the initial data tables are designed and built, a common next step is to build a backend using something like Node.js or Flask. In your backend, you must code custom API endpoints and logic to create, read, query, update, and delete data for your specific data model. However, when the backend accesses data, it is still raw, with no stylization. Raw data is easy to work with for computers, but often quite difficult to work with for humans.
To those who are unfamiliar, the SQL language, raw data, and traditional relational database jargon can feel unintuitive and overly-technical.
It may not be practical to teach everyone on the team how to work with and think in terms of raw data. In some cases, business users may find it difficult or nearly impossible to work with raw data. People are accustomed to see information displayed colorfully, stylized, embedded on a map, etc. For example, most people in most situations would find it easy to work with pinpoints on a map, yet find it nearly impossible to identify a position on a world map from raw latitude and longitude points stored as JSON.
{
"location": {
"lat": 36.08801,
"lng": 120.379771
}
}
{
"location": {
"lat": 36.08801,
"lng": 120.379771
}
}
Therefore, developers need to build front-ends with polished UIs and custom display logic to make working with the data human-friendly. However, even for developers with strong SQL database skills, building out APIs and GUIs to build and manage a data model is time consuming.
Data Models in Directus
All relational data model concepts listed above apply in Directus. You get complete, un-opinionated, relational data model design and configuration. The difference is that Directus handles all SQL, builds the API, and provides a Data Studio which lets business users work with data in a human-friendly way.
The Data Studio also offers features and functionalities to display and interact with your data intuitively. Once your data model is configured, the data is accessible across the other modules.
You have the power to do the following things, without a line of code or SQL:
- View, configure, and manage your relational data model and asset storage.
- Configure how data is displayed within the Data Studio.
- Configure how data is interacted with by users in the Data Studio.
- Translate any and all text in the Data Studio into any language.
Directus replaces traditional relational database jargon with more user-friendly terms and concepts. Please keep in mind that while traditional relational database jargon strictly encompasses database concepts, some of the new Directus terms encompass these relational database concepts plus display and interaction logic. The following sections will introduce Directus terms and map them to classic relational database concepts.
Collections
A collection is a set of items. This can be a 1-1 match-up with a data table in SQL, a group of other collections, or a readonly view.
You access all collections, including built-in system collections required to power your project, under Settings > Data Model. From there, click a collection to open its configurations page. To learn more, see our guide on collections.
Fields
Fields are database columns, but with a twist.
Remember, SQL database columns store pure, raw data. From there, developers build out custom logic and UIs to determine how this data is displayed and interacted with. In Directus, fields encompass column configurations, as well as custom configuration over how the data is displayed and interacted with in the Data Studio. Directus also has alias fields, which are virtual and do not match directly to a column. To learn more, see our guide on fields.
Items
Items are data table rows, but with a twist.
Remember from our discussion above about traditional databases, the ideal relational database is normalized. Unfortunately, normalized data is not always the easiest for people to imagine or envision because related data is spread across multiple data tables. Therefore, when you access an item, you may get more than just the current collection's row level-data, in some cases an item may provide access to the data in related rows.
You access items from other app modules, such as Content, User Directory, and File Library.
Data Type Superset
Directus abstracts type differences between SQL vendors with a Data Type Superset.
Keys and IDs
Primary keys are called IDs in Directus fairly frequently. When you create a collection, you must add an id
field. Directus supports the following types of IDs:
- Auto-Incremented Integer — IDs increment
1
,2
,3
up to2^31-1
or2,147,483,647
. - Auto-Incremented Big Integer — IDs increment
1
,2
,3
up to2^63-1
or9,223,372,036,854,775,807
. (only available in MySQL and PostgreSQL) - Generated UUID — Universally Unique Identifier. Creates a completely unique ID. IDs generated with this system (not just in your database, but anywhere this system is used) are so statistically unlikely to be repeated that for all practical purposes they are unique.
- Manually Entered String — You manually type out a unique string as the ID for each item.
Relationships
Directus supports all standard types of relationships, as well as a few more of its own compound types. To learn more, see our guide on relationships.