SQL Server: Comparing Performance of Various Insert Methods
Introduction
Executive summary
The benchmark
The best method
UNION ALL method
The worst method
Insert records in transaction
Indexes and triggers
Sample code
Conclusion
Introduction
I was writing yet another "insert these records into the database" application when I noticed that insert performance was really poor. This was not the first time I had encountered the problem, but it was the first time I decided to deal with it seriously.
In this article I share the results of my investigation, so that my efforts are not wasted.
Executive Summary
Naively inserting records one by one via INSERT INTO statements is the worst-performing method. SqlBulkCopy is by far the superior method: it offers a performance gain of over 100 times. The SqlBulkCopy class appeared only in .NET 2.0. If you are still on .NET 1.1, you can try a simple trick with UNION ALL, described in Pinal Dave's blog, which performs almost as well as SqlBulkCopy.
The Benchmark
I was inserting 1,000 records into a simple 3-column table without indexes. The server was SQL Server 2005 running on Windows Vista Business SP1 32-bit, on a machine with two 1.6 GHz Intel Xeon E5130 processors and 4GB of RAM. The client was a Windows XP SP3 machine with a 3.0 GHz Pentium 4 and 3GB of RAM, connected to the server via a wired 100 Mbit Ethernet connection.
I ran the program both on the remote client described above, and locally on the database server machine. The target table is described by the following script:
CREATE TABLE [dbo].[Quotes](
    [Symbol]    [char](8)        NOT NULL,
    [QuoteDate] [datetime]       NOT NULL,
    [Price]     [decimal](10, 2) NOT NULL
)
The results are summarized below. All time values are in milliseconds. Each value is an average of 10 runs.
| Method | Remote Time | Local Time |
|---|---|---|
| Simple parameterized insert | 8581 | 8586 |
| Simple non-parameterized insert | 8437 | 8619 |
| Insert in transaction | 439 | 164 |
| UNION ALL | 52 | 50 |
| SqlBulkCopy | 64 | 45 |
The code for each method is given in the subsequent sections.
The Best Method
Based on the results above, I recommend using SqlBulkCopy for anything bigger than a couple of inserts. It has the best performance in the local case and the second-best in the remote case. The UNION ALL method is surprisingly efficient, but it is much less structured: it involves building long SQL strings by hand, so you have to deal with quotes in strings, potential SQL injection problems, and the like. I would use it only if there were a very good reason to chase after those last few milliseconds. Here's the code for SqlBulkCopy:
void InsertRecordsSqlBulkCopy(SqlConnection conn, int nRecords)
{
    using (SqlBulkCopy copy = new SqlBulkCopy(conn))
    {
        copy.DestinationTableName = "Quotes";

        DataTable table = new DataTable("Quotes");
        table.Columns.Add("Symbol", typeof(string));
        table.Columns.Add("QuoteDate", typeof(DateTime));
        table.Columns.Add("Price", typeof(decimal));

        for (int i = 0; i < nRecords; ++i)
        {
            table.Rows.Add("MSFT", new DateTime(2000, 1, 1).AddDays(i), 12.3m);
        }

        copy.WriteToServer(table);
    }
}
UNION ALL Method
This method is described in Pinal Dave's blog. The idea is very simple, yet brilliant. Instead of inserting records one by one, we insert them in batches like this:
INSERT INTO Quotes (Symbol, QuoteDate, Price)
SELECT 'AAA', '12/15/2008', 2.44 UNION ALL
SELECT 'BBB', '12/15/2008', 7.42 UNION ALL
SELECT 'CCC', '12/16/2008', 9.11
This method performs roughly on par with SqlBulkCopy (slightly faster in the remote case, slightly slower locally), but it involves manually generating long text strings. In a production environment you will need to take care of string values containing quotes and of formatting for dates, and, most importantly, add protection against SQL injection. In my book, most of the time it's not worth the trouble: just go with SqlBulkCopy. Here's the code for the UNION ALL method:
void InsertRecordsUsingUnionAll(SqlConnection conn, int nRecords)
{
    Console.WriteLine("Inserting records using UNION ALL...");
    const int BatchSize = 500;

    for (int i = 0; i < nRecords; )
    {
        StringBuilder s = new StringBuilder();
        s.Append("INSERT INTO Quotes (Symbol, QuoteDate, Price) ");
        for (int j = 0; j < BatchSize && i < nRecords; ++j, ++i)
        {
            if (j != 0) s.Append(" UNION ALL ");
            DateTime date = new DateTime(2000, 1, 1).AddDays(i);
            // The date literal must be quoted, or SQL Server would evaluate
            // something like 12/15/2008 as integer division.
            s.AppendFormat("SELECT 'MSFT', '{0:yyyy-MM-dd}', 12.3", date);
        }
        using (SqlCommand cmd = new SqlCommand(s.ToString(), conn))
        {
            cmd.ExecuteNonQuery();
        }
    }
}
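To illustrate the quoting concern mentioned above, here is a minimal sketch of an escaping helper (SqlLiteral is a hypothetical name, not part of the article's sample project): single quotes inside string values must be doubled before the value is embedded in generated SQL.

```csharp
using System;

static class SqlLiteral
{
    // Render a .NET string as a T-SQL string literal by doubling
    // embedded single quotes, e.g. O'Brien becomes 'O''Brien'.
    public static string Quote(string value)
    {
        return "'" + value.Replace("'", "''") + "'";
    }
}

class Demo
{
    static void Main()
    {
        Console.WriteLine(SqlLiteral.Quote("O'Brien"));
        Console.WriteLine(SqlLiteral.Quote("plain"));
    }
}
```

Note that escaping alone does not address every injection vector; parameterized commands or SqlBulkCopy remain the safer choice.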
The Worst Method
The worst method is, of course, the simplest one: just insert the records one by one. It performs over 100 times worse than the optimized methods! I really wish the SQL Server folks had some kind of optimization there: this method is so simple that it is very tempting to use. Whether the INSERT command is parameterized makes little difference for performance, but we do want to use parameters for better security and correctness:
void InsertRecordsOneByOne(SqlConnection conn, int nRecords)
{
    Console.WriteLine("Inserting records one by one...");
    string sql = "INSERT INTO Quotes (Symbol, QuoteDate, Price) " +
                 "VALUES (@symbol, @date, @price)";
    using (SqlCommand cmd = new SqlCommand(sql, conn))
    {
        for (int i = 0; i < nRecords; ++i)
        {
            cmd.Parameters.Clear();
            cmd.Parameters.AddWithValue("@symbol", "MSFT");
            cmd.Parameters.AddWithValue("@date", new DateTime(2000, 1, 1).AddDays(i));
            cmd.Parameters.AddWithValue("@price", 12.3m);
            cmd.ExecuteNonQuery();
        }
    }
}
Insert Records in Transaction
For commands not explicitly enlisted in a transaction, SQL Server opens and commits an implicit transaction per statement. This creates significant overhead. In fact, wrapping the insertion loop in a single transaction improves performance by about 20 times! But it is still not nearly as fast as SqlBulkCopy or UNION ALL. Here's the code:
void InsertRecordsInTransaction(SqlConnection conn, int nRecords)
{
    Console.WriteLine("Inserting records in transaction...");
    using (SqlTransaction trans =
        conn.BeginTransaction(IsolationLevel.ReadUncommitted))
    {
        string sql = "INSERT INTO Quotes (Symbol, QuoteDate, Price) " +
                     "VALUES (@symbol, @date, @price)";
        using (SqlCommand cmd = new SqlCommand(sql, conn, trans))
        {
            for (int i = 0; i < nRecords; ++i)
            {
                cmd.Parameters.Clear();
                cmd.Parameters.AddWithValue("@symbol", "MSFT");
                cmd.Parameters.AddWithValue("@date", new DateTime(2000, 1, 1).AddDays(i));
                cmd.Parameters.AddWithValue("@price", 12.3m);
                cmd.ExecuteNonQuery();
            }
        }
        // Without an explicit commit, the transaction rolls back on dispose.
        trans.Commit();
    }
}
About Indexes and Triggers
Every time you insert a record into a table, its indexes are updated and its triggers are executed. This creates additional overhead; whether it is significant depends on the number of indexes and the complexity of the triggers. In my benchmark the table had no indexes or triggers. I also removed all records from the table prior to the inserts, to eliminate any size-related effects.
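As an illustration of the kind of objects that would add per-insert work (these are hypothetical and were not part of the benchmark; the index, trigger, and audit-table names are invented), consider:

```sql
-- Hypothetical: an index that must be maintained on every insert.
CREATE INDEX IX_Quotes_Symbol_Date ON dbo.Quotes (Symbol, QuoteDate);

-- Hypothetical: an audit trigger that fires once per INSERT statement
-- and copies the inserted rows to a separate audit table.
CREATE TRIGGER trg_Quotes_Audit ON dbo.Quotes AFTER INSERT AS
    INSERT INTO dbo.QuotesAudit (Symbol, QuoteDate, Price)
    SELECT Symbol, QuoteDate, Price FROM inserted;
```

If you were to rerun the benchmark with objects like these in place, you should expect all five methods to slow down, though not necessarily by the same factor.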
Sample Code
All the methods above are packaged in the InsertPerformance project (9KB Zip archive). It is a console application that accepts one command-line argument: a number from zero to four, which designates the insertion method to use. When run without parameters it uses the "insert one by one" method.
IMPORTANT! The connection string is hard-coded on line 15 of Program.cs. You will need to change it to point to your SQL Server before you can use the program.
Conclusion
As scientists like to say, every problem has an elegant, trivial, self-evident, and absolutely incorrect solution. The record-insertion problem is no exception. The "natural" method is the worst: at roughly 8 ms per record it is going to be a drag even if your records number just in the hundreds. We can gain over 100 times better performance by using the SqlBulkCopy class, which is specifically designed for mass insertions. Know thy framework library, and thou shalt be rewarded.