Tuesday, June 1, 2010

Why Excel COM API is a Bad Choice

In some projects you might be asked to generate reports as Excel Worksheets. This is especially true when your customer is in the financial domain. Financial guys are keen of Excel; it is Omni-present in every aspect of their job.
Disregarding if their addiction to Excel is justified or not, there is something you as a software engineer should know. To generate an Excel report your first reflex is to use the COM API.
However this might not always be the right choice.

I am going to share my experience with Microsoft.Office.Interop.Excel which is the official .Net assembly that uses Excel COM API underneath.
Two years ago one of my customers asked me to include in my project the generation of reports as Excel file. Naturally my reflex went directly into using Excel APIs. I spent over a week creating all the details needed to add into the report. At the end I launched a test on real data and it took almost 10 minutes to complete! This was way unacceptable performance.
It was really one of the most stressful moments during that project.

To solve the problem I had two choices:




  1. Search on the internet for readymade packages that generate Excel files directly and without passing through the COM APIs
  2. Or use VBA script to generate the reports.

The first option was too risky for me because that involves asking the customer to buy an unknown library, which needs to be tested and approved and it is programming model learnt. If it turned out not suitable we need to claim our money back and start searching for another one. I kept this option as the last painful resort.

The second option could be tested in just one day. I replaced the APIs calls by a code that generates VBA then loaded that VBA into Excel and executed it. Everything worked fine and the generation time dropped to 3 minutes, which was far acceptable than 10.
Everything went fine and the project ended successfully.

Still, it is not the ultimate solution. So if one day you are confronted to a similar situation, first thing you do is to include in your plan to search for and test libraries that generate Excel files. Discard and resist any attempt to use COM APIs.
The following code compares the generation and savings of 100k strings into an Excel file. The first one using Excel COM API and the second one using ExcelLibrary found on Google code.

Excel COM API
Excel._Application app = new Excel.ApplicationClass();

Excel.Workbook wb = app.Workbooks.Add(Type.Missing);
Excel.Worksheet wrk = (Excel.Worksheet)wb.Sheets.Add(Type.Missing, Type.Missing, 1, Excel.XlSheetType.xlWorksheet);
for (int r = 1; r <= 1000; r++) {

for (int c = 1; c <= 100; c++) {
wrk.Cells[r, c] = String.Format("row #{0}, col#{1}", r, c);
}
}
wb.SaveAs("c:\\testExcelApi.xls", Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Microsoft.Office.Interop.Excel.XlSaveAsAccessMode.xlExclusive, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing); wb.Close(Type.Missing, Type.Missing, Type.Missing);
app.Quit();
Marshal.ReleaseComObject(wrk);
Marshal.ReleaseComObject(wb);
Marshal.ReleaseComObject(app); app = null;
wb = null;
wrk = null;


ExcelLibrary
string file = "C:\\testExcelLib.xls";
Workbook workbook = new Workbook();
Worksheet wrk = new Worksheet("First Sheet");
for (int r = 1; r <= 1000; r++) {

for (int c = 1; c <= 100; c++) {
wrk.Cells[r, c] = new Cell(String.Format("row #{0}, col#{1}", r, c));
}
}
workbook.Worksheets.Add(wrk); workbook.Save(file);


You will be shocked to know that the first code takes 3 minutes to execute, while the second only 2 seconds. This is simply 80 times faster.
The explanation is pretty obvious; the poor performance of Excel COM API comes from the fact each API call travels through several layers of codes and frameworks which makes it time consuming. Conversely ExcelLibrary writes directly into the file without any intermediary layers.


As conclusion, it is better to keep your options wide open and check/test available libraries before venturing into the unknown.

PS. Some of the available open source libraries are ExcelLibrary, NPOI, ExML…But of course you might find others.

No comments:

Post a Comment