Primary Key best practices: INT identity vs. UniqueIdentifier (GUID)
I came across a forum discussion on this lately. Besides that, I see more and more web engines that seem to use a UniqueIdentifier primary key (I am guessing, since I can see a GUID in the query string, which looks lame in my eyes), as in dasBlog.
So, what should I choose for my primary key? Which data type? Personally, I like INT (IDENTITY). It is a small column (4 bytes, with values up to about 2.1 billion), is useful for data warehousing (unlike natural keys), guarantees uniqueness, and is completely automated by the system. A UniqueIdentifier, by contrast, takes 16 bytes.
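To make the size comparison concrete, here is a minimal sketch in Python (not T-SQL). It assumes that a 32-bit signed integer stands in for INT and that Python's `uuid` module stands in for UniqueIdentifier; the constant names are mine.

```python
import struct
import uuid

# A 32-bit signed integer, used here as a stand-in for a SQL Server INT column.
INT_BYTES = struct.calcsize("<i")
# Largest value a signed 32-bit integer can hold.
INT_MAX = 2**31 - 1

# A random GUID/UUID, used here as a stand-in for a UniqueIdentifier value.
guid = uuid.uuid4()
GUID_BYTES = len(guid.bytes)      # raw binary size
GUID_HEX_DIGITS = len(guid.hex)   # hex digits without hyphens
GUID_TEXT_CHARS = len(str(guid))  # usual text form, with hyphens

print("INT bytes:", INT_BYTES, "max value:", INT_MAX)
print("GUID bytes:", GUID_BYTES, "hex digits:", GUID_HEX_DIGITS, "text chars:", GUID_TEXT_CHARS)
```

The text form is what ends up in a query string, which is part of why GUID URLs look so long.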
Znichter has more:
"The GUID is a wide column (16 bytes to be specific) and contains a unique combination of 33 uppercase and numeric characters. This column because it is the primary key is going to be stored in, of course, the clustered index (unless specified to be a non-clustered index), and will be the page pointer for each leaf page in a non-clustered index. Also, if a GUID is used instead of an integer identity column then the 33 characters need to be matched for each row that is returned using that column in the where clause. If a high volume of inserts are done on these tables then GUID's being large will contribute to page splits, as will the fact that NEWID() generates a random value, which could place a new record on any of the data pages will cause performance problems."
And my 2 cents: don't use a GUID as PK (or at all) if you don't need to, because:
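- table row size limit in MSSQL (8,060 bytes per row): every 16-byte GUID column eats into it
- storage + memory: imagine you have tables with 10,000,000 rows and growing, where each key costs 16 bytes instead of 4
- flexibility: T-SQL comparison operators like >, <, =, etc. are meaningful for INT, while for a GUID only equality is of practical use
- a GUID is not optimized for ORDER BY/GROUP BY queries, or for range queries in general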
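A back-of-the-envelope calculation for the storage point, as a sketch in Python. It counts key bytes only and assumes, as the quote above describes, that the clustered key is repeated in every non-clustered index. The figure of three non-clustered indexes is a made-up example, and row overhead, page overhead and fill factor are ignored.

```python
ROWS = 10_000_000          # the table size from the list above
INT_BYTES = 4              # INT key width
GUID_BYTES = 16            # UniqueIdentifier key width
NONCLUSTERED_INDEXES = 3   # hypothetical number of non-clustered indexes


def key_storage_bytes(rows, key_bytes, nonclustered_indexes=0):
    """Bytes spent on key values alone: one copy per row in the clustered
    index plus one copy per row in each non-clustered index."""
    return rows * key_bytes * (1 + nonclustered_indexes)


int_total = key_storage_bytes(ROWS, INT_BYTES, NONCLUSTERED_INDEXES)
guid_total = key_storage_bytes(ROWS, GUID_BYTES, NONCLUSTERED_INDEXES)

print("INT keys: ", int_total, "bytes")
print("GUID keys:", guid_total, "bytes")
print("extra for GUID:", guid_total - int_total, "bytes")
```

Under these assumptions the GUID key costs four times as much as the INT key, and the gap grows with every additional index and every additional table that references the key.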
Update: here znichter posts part II on the GUID issue.
I tested GUID selects months ago and can confirm that queries perform just the same as with INTs if you work carefully. For the GUID disadvantages, see above.
Friday, September 2, 2005 1:57 AM