In Real world enterprise applications, we often perform long running batch operations for huge number of records. Typically, we load the data upfront from database one time and load it into memory via datatable or in some other forms (main table and other supported tables) and then loop through individual records in Parallel For Each for better performance. However, in some cases, if the data load is huge, we may end up having memory issues loading all the records upfront so we load each individual records inside the loop in sql which could potentially affects the performance of the job.
In order to handle this scenario,we have to address avoiding memory exceptions and also not to read the data for each record in sql. The SQL paging concept comes in handy to address this issues by fetching rows in slice with some limit (eg: 20K rows at a time) and perform the parrellel operations in the loop.
In SQL Server 2012, Microsoft introduced OFFSET and FETCH keyword.aspx) to apply the paging in SQL query results. We will loop for every 20k records and perform the parrellel operations instead of individual records.
SELECT First Name + ' ' + Last Name FROM Employees
This will skip the first 10 rows and return the next 5 rows.
- ORDER BY is mandatory to use OFFSET and FETCH clause.
- OFFSET clause is mandatory with FETCH. You can never use, ORDER BY … FETCH.
- TOP cannot be combined with OFFSET and FETCH in the same query expression.
- The OFFSET/FETCH rowcount expression can be any arithmetic, constant, or parameter expression that will return an integer value. The rowcount expression does not support scalar sub-queries.
In the real world application, i will be using it like
int startIndex = 0
int offset = 20000
//Execute the SQL query to load the data by passing the startIndex and Offset.
//SELECT * FROM PERSON order by PERSON_ID OFFSET @STARTINDEX ROWS
//FETCH NEXT @OFFSET ROWS ONLY
if(no rows) break; // Break the loop since no rows to process
System.Threading.Tasks.ParallelOptions po = new System.Threading.Tasks.ParallelOptions();
po.MaxDegreeOfParallelism = MAX_THREAD_LIMIT
System.Threading.Tasks.Parallel.ForEach(queryData, po, row =>
startIndex = startIndex + offset + 1;