29 October 2015

Performance issue while reading/writing large excel files using OpenXML SDK

Recently we faced performance issue while reading/writing large excel files using Open XML SDK. The code used to work fine when excel file has small number of rows. But the execution time used to increase exponentially as the number of rows increases. When we analyzed the code and researched over internet, we found that the problem lies with the way we were using Open XML SDK.

There are two approaches to use Open XML SDK -
  1. Open XML SDK DOM Approach
  2. Open XML SDK SAX-Like Approach

Open XML SDK DOM Approach

This approach works by reading every cell value of every row present in excel. This is generally the standard way on using Ope nXML SDK. It works well when the number of rows in excel file are less. However as the number of rows increases, the memory utilized by these methods increases leading to performance deterioration. 

Example of DOM Approach

using (SpreadsheetDocument excelDocument = SpreadsheetDocument.Open(excelFilePathAndName, true))
{
    WorkbookPart excelWorkBook = excelDocument.WorkbookPart;
    WorksheetPart excelWorksheetPart = excelWorkBook.WorksheetParts.First();
    SheetData exceSheetData = excelWorksheetPart.Worksheet.Elements<SheetData>().First();
    foreach (Row excelRow in exceSheetData.Elements<Row>())
    {
        foreach (Cell excelCell in excelRow.Elements<Cell>())
        {
            string excelCellValue = excelCell.CellValue.Text;
            Console.WriteLine(excelCellValue);
        }
    }

}

Open XML SDK SAX-Like Approach

This approach works by utilizing methods OpenXMLReader and OpenXMLWriter of Open XML SDK. These method uses the SAX approach of reading an excel file. They read the excel as an XML tree but do not load the entire tree in memory leading to better performance. 

Example of SAX-Like Approach -  

string excelFilePathAndName = "<Path of excel file with name>";

using (SpreadsheetDocument excelDocument = SpreadsheetDocument.Open(excelFilePathAndName, true))
{
    WorkbookPart excelWorkBook = excelDocument.WorkbookPart;
    WorksheetPart excelWorksheetPart = excelWorkBook.WorksheetParts.First();
    OpenXmlReader excelReaderreader = OpenXmlReader.Create(excelWorksheetPart);
    while (excelReaderreader.Read())
    {
        if (excelReaderreader.ElementType == typeof(CellValue))
        {
            string excelCellValue = excelReaderreader.GetText();
            Console.WriteLine(excelCellValue);
        }
    }

}

To conclude, for better performance while reading large excel files, use SAX-Like approach by utilizing OpenXMLReader and OpenXMLWriter classes.

No comments: