Non-clustered index in SQL Server
In previous article we had discussed internals of clustered indexing. Here we are going to discussed non clustered indexing in details and we will use the same example table as shown below.
CREATE TABLE STUDENT
(
StudentID int IDENTITY(1,1) NOT NULL,
FirstName char(50) NOT NULL,
LastName char(50) NOT NULL,
SSN char(11) NOT NULL
)
Above table may be a clustered table or may not be a clustered table. It’s not mandatory for a table to be clustered one to create a non-clustered index on it. We want to create a non-clustered index “IX_Student_SSN” on SSN column of Student table as shown below
CREATE UNIQUE NONCLUSTERED INDEX IX_Stuent_SSN ON Student (SSN);
A non-clustered index may be unique or non-unique. Above “create index” query shows it is a unique non-clustered index. Why we have chosen the non-clustered index to be unique? Very simple answer to this question is that we expect the Social Security Number (SSN) to be unique and we create non clustered index on this column. Had it been a non-unique column (SSN) the non-clustered index would have been non-unique non-clustered index as shown below.
CREATE NONCLUSTERED INDEX IX_Stuent_SSN ON Student (SSN);
Let us consider that the Student table has 100,000 rows. We are going to find number of leaf level, intermediate level and root level pages of the non-clustered index. In case of non-clustered index leaf level pages do not contain the actual data, they are not data pages. All pages in a non-clustered index are index pages irrespective of the level in a balanced tree. The actual data is always present in the leaf level pages of balanced tree if the table is a clustered one and if not a clustered table, data present in pages that are in heap. In other words non-clustered index is used to locate the actual data that is present in leaf level of clustered index or heap.
Leaf level page entry for unique or non-unique non-clustering index as shown in Figure 1
= Non Clustered index columns(s) + Clustering Key OR fixed RID (if Heap table) + Row over head
Thus the row size in leaf level pages (considering the table is clustered one having clustered key studentID)
= 11 bytes + 4 bytes + (1 byte row overhead + 2 byte slow array)
= 11 bytes + 4 bytes + 3 bytes
= 18 bytes
Number of rows in leaf level page, (recall the page size is 8 KB and 8096 byte is usable as described in previous article)
= 8096 / 18
= 449.77 = 449
Number of leaf level pages required
= 100,000 / 449 = 222.71 = 223
According to the calculation we need 223 leaf level pages in our non-clustered index. Let’s see the internal structure of the leaf level pages as shown in Figure 1. If the table is not clustered one, simply we need to replace StudentID with RID (row identifier) in leaf level pages.
Figure 1: Leaf Level pages of non-clustered index
We need to understand one thing very well is that we need to have 100,000 rows in leaf level pages of non-clustered index if the table has 100,000 rows and as shown above in Figure 1, SSN and StudentID is actually duplicated in leaf level pages of non-clustered index. Duplicated in the sense they also present in leaf level pages of clustered index or pages in Heap (if table is non-clustered one). If we have 10 non clustered index on Student table, each of them will have 100,000 row in leaf level pages and the index column(s) + clustering key OR Heap RID will be duplicated 10 times. So we need to be very careful when creating non-clustered index on a table and should never create too many indexes on a table as this will be huge overhead in terms of index maintenance and memory. Every insert and delete on a table must touch all non-clustered index to keep them updated.
Now let’s see the structure of non-leaf level (intermediate or root level) pages. We will consider both non-unique non-clustered index and unique clustered index.
Non leaf level entry for unique non-clustered index
= Non clustered index column(s) + Pointer to next level pages + row overhead
= Non clustered index column(s) + (fileID + pageID) + row overhead
= 11 bytes + (2 bytes + 4 bytes) + 1 byte
= 18 bytes + 2 byte slot array
= 20 bytes
Any page having more than one row will have slot array of 2 bytes for each row, so we have added 2 bytes per row in above calculation.
Considering 8096 bytes of usable space per page, the number of rows per page for non-leaf level
= 8096 / 20 = 404.8 = 404
We have 223 number of leaf level pages and all of them can be accommodated in just 1 page ( 223/ 404 = 0.55 = 1 ) in non-leaf level and that would be the root page as shown in Figure 2 below
Figure 2: Complete unique non clustered index b-tree structure
Now let’s discuss what will be structure of non-leaf level for non-unique non-clustered index.
Non leaf level entry for non-unique non-clustered index
= Non clustered index column(s) + Pointer to next level pages + row overhead + lookup ID
= Non clustered index column(s) + (fileID + pageID) + row overhead + (clustering key OR RID of Heap)
We are not going to calculate size of non-leaf level pages of non-unique non-clustered index but the important part is introduction of lookup id. Why we need look up id? To understand the concept let’s first try to understand how unique non-clustered index is used for searching data or updating the index. Suppose we want to know the name of the student having SSN 225-69-8784. To find SSN 225-69-8784 we will follow following steps sequentially
-
Look into root page and come to know that SSN 225-69-8784 is in page 3002 of file 1.
-
Look into page 3002 of file 1 in leaf level and come to know studentID of the student having SSN 225-69-8784 is 33 if the table is clustered one If the table is not clustered then instead of studentID we will get an RID( row identifier ).
-
Look into balanced tree of clustered index and find name of student having studentID 33. If the table is not clustered then scan the heap pages to find name of the student having corresponding RID (row identifier).
What happens if we want to delete a student having SSN 148-86-5875? The actual data will be deleted in data pages in clustered index or heap, at the same time our non-clustered index will be updated following below steps.
-
Look into root page and come to know that SSN 148-86-5875 is in page 3001 of file 1.
-
Look into page 3001 of file 1 in leaf level, find SSN 148-86-5875. Delete the record.
From above two scenarios, we can see that it very important to reach to a particular record in leaf level pages using root and intermediate level pages during any index operation. Thus the root page and intermediate pages is for navigational purpose and the leaf level pages have some kind of pointer (clustering key or RID) to actual data. And it’s quite easy to navigate to a particular record in leaf level pages for unique non-clustered index since in every level (be it a root level or intermediate level), SSN number is unique. So what happens if the non-clustered index is non-unique? In leaf level pages we can identify any particular record since the combination of value of non-clustering column and clustering key (or RID) is always unique. The uniqueness is maintained in leaf level pages but not in root and intermediate level pages and to force this uniqueness SQL server introduce the appearance of clustered key or RID if heap table (lookup ID) is in the root and intermediate level
Let us take an example to demonstrate non-unique non-clustered index. Suppose we introduce a gender column in the Student table and create a non-unique non-clustered index “IX_Student_gender” on column gender. The balance tree for this as shown below (roughly) in Figure 3. This might be a very bad non-unique non-clustered index in real life situation but serves our purpose.
Suppose we deleted a female student (having studentID 451) from the table and our “IX_Student_gender” has to be updated for this deletion. SQL Server is going to perform following steps to update our index after deleting actual record from table.
-
Get the studentID (or RID) of that female student.
-
Searching root pages for (Female, 451) SQL Server comes to know that the student information is present in page 3002 of file 1 and the. Figure 3.
-
Delete the record from Page 3002 of file 1.
Figure 3 : non unique non clustered index balanced tree
So far we have discussed all the non-clustered index on a single column. Now let us discuss key structure in leaf and non-leaf level pages in case of composite clustering key and non-clustered index on composite column.
Let’s say our new table has columns A1, A2, A3, B1, B2, and B3. We have one unique clustered index and two non-clustered indexes on the table defined as
-
IX_Unique_Clustered_Index(A1,A2)
-
IX_Unique_NonClustered_Index(A1,B1,B2)
-
IX_NonUnique_NonClustered_Index(B1,B2,B3)
For index “IX_Unique_NonClustered_Index(A1,B1,B2)”
Leaf level page entry = Non Clustered index columns(s) + Clustering Key OR fixed RID(if Heap table)
+ Row over head
= (A1, B1, B2) + (A1, A2) + Row Over Head
= (A1, A2, B1, B2) + Row overhead
A1 appears twice in leaf level pages but SQL Server will keep only one.
Non leaf level entry = Non clustered index column(s) + Pointer to next level pages + row overhead
= (A1, B1, B2) + Pointer to next level pages + row overhead
For index “IX_NonUnique_NonClustered_Index(B1,B2,B3)”
Leaf level page entry = Non-clustered index columns(s) + Clustering Key OR fixed RID (if Heap table)
+ Row over head
= (B1, B2, B3) + (A1, A2) + Row Over Head
= (A1, A2, B1, B2, B3) + Row overhead
Non leaf level entry = Non-clustered index column(s) + Pointer to next level pages + row overhead + lookup ID
= (B1, B2, B3) + Pointer to next level pages + row overhead + (A1, A2)
= (A1, A2, B1, B2, B3) + Pointer to next level pages + row overhead
Summary
Non-clustered indexes are great to improve any search query performance if they are crafted judiciously. It reduces the disk IO significantly thus improving the performance. While creating a non-clustered index we should always keep in mind that we are going to duplicated same number of rows that the table have, at leaf level pages of non-clustered index but with only few columns [ non-clustered index column(s) + Clustering Key OR fixed RID + row overhead] data. So never have too many non-clustered indexes in a table. Remember every insert and delete on a table must touch all non-clustered index to keep them updated. It’s always good to have no non-clustered index on a table instead of a bad one. Creating a good non clustered indexes is continuous process and it cannot be done overnight.
- Summation IT
- Wednesday 24 January 2018 |
- Database Development |
Categories
Recent Posts
Common Vulnerabilities in Web Applications:
Security in a website is the most important factor needs to be taken care of if considered. Are your website and user data safe and secure?Android Marshmallow runtime permissions
Security in a website is the most important factor needs to be taken care of if considered. Are your website and user data safe and secure?SignalR – Why, What and How?
An increasing number of software out there namely websites and web applications today offer or need to offer real-time dataIndexing in SQL Server
An increasing number of software out there namely websites and web applications today offer or need to offer real-time data
