How To Write a Coherent and Efficient SQL Test Script as a Data QA

November 20, 2018 Manisha Sajwan

When we start developing our Quality Assurance (QA) script, do we really consider whether the script will return optimized results? How do we ensure the quality of our QA script? 

A Data Migration QA tester should know the mapping doc, the transformation logic, and the business flow. The mapping doc works as the acceptance criteria for the Data Tester. We create our consolidated test scripts from this mapping doc because it contains all the information needed, such as the filter criteria on the source data (i.e., the data we need to migrate from a client’s legacy system to Salesforce). 

The mapping doc also contains information on the source database, target database, source table, target table, and all the fields from source tables that need to be mapped to target fields. In this case, source stands for a legacy system and target stands for the Salesforce organization. 

QA testers work on large databases with billions of records and hundreds of tables, and have to write scripts that return results quickly. Achieving this, and addressing performance and execution problems, requires a governance model focused on the creation, maintenance, and formatting of QA scripts. 

The consolidated test script
 
We will examine what the consolidated test script looks like, and how it is created. We start drafting our script by using the mapping doc provided by the client.

The consolidated test script is where we select the data and the fields from the source tables. It’s also where we apply the filter criteria (provided in the client’s mapping document) to help limit source data. After that, we compare the parameterized source data with the stage data (which is our target) record-by-record.

A sample mapping doc gives a detailed description of the source and target systems, including:

  • Load order in which we need to create scripts
  • Source tables and their field names with filter criteria
  • Target tables and their target fields
  • Transformation logic to fulfill our business flow and to change source data into target data 
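As a purely illustrative sketch (every table, field, and filter name below is hypothetical), a single row of a mapping doc might look like this:

    Load Order: 2
    Source Table / Field: CONTACTS_LEGACY.EMAIL_ADDR   (filter: ACTIVE_FLAG = 'Y')
    Target Object / Field: Contact.Email
    Transformation: trim and lowercase the address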

The comparison between source and target data is performed with various SQL, DDL, and DML commands and constructs: WITH clauses, WHERE clauses, conditional operators, arithmetic operators, aggregate functions, CASE statements, conversions, joins, indexes, and concatenation, among others. All of these are covered in this document.
 
To understand how we create our initial Data QA script to test the data, take a look at the following example of a simple test script.  
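The sketch below is only a minimal illustration; the table names (CONTACTS_LEGACY, Contact_Stage), fields, and filter criteria are hypothetical stand-ins for whatever the client’s mapping doc specifies. It applies the filter criteria to the source data, then compares source and target record-by-record:

    -- Source data, limited by the filter criteria from the mapping doc
    WITH source_data AS (
        SELECT  c.LEGACY_ID,
                c.FIRST_NAME,
                c.LAST_NAME,
                c.EMAIL_ADDR
        FROM    dbo.CONTACTS_LEGACY AS c
        WHERE   c.ACTIVE_FLAG = 'Y'
    )
    -- Record-by-record comparison against the stage (target) data
    SELECT  s.LEGACY_ID,
            s.EMAIL_ADDR AS Source_Email,
            t.Email      AS Target_Email
    FROM    source_data AS s
    LEFT JOIN dbo.Contact_Stage AS t
           ON t.Legacy_Id__c = s.LEGACY_ID
    WHERE   t.Legacy_Id__c IS NULL       -- record missing in target
       OR   t.Email <> s.EMAIL_ADDR;     -- value mismatch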

Best Practices for test scripts
 
Keep the following best practices in mind after creating a raw script: 

  1. Only retrieve the dataset you need

The mindset that ‘the more data you retrieve, the better’ is not a great practice to follow when you have a raw script. Why? Because query performance suffers when a large set of unwanted data is pulled. 

The SELECT Statement 

Be as precise as possible in your SELECT statement. Instead of writing SELECT *, retrieve only the columns you need, to avoid unnecessary network traffic. For related queries, do not run two separate SELECT statements; instead, rewrite the script using joins or views. 

In the following example, we choose only two fields in our SELECT statement, because those are the only two fields we need.
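A minimal sketch, assuming a hypothetical CONTACTS_LEGACY table where only the ID and email address are needed:

    -- Retrieve only the two columns we actually need, instead of SELECT *
    SELECT  LEGACY_ID,
            EMAIL_ADDR
    FROM    dbo.CONTACTS_LEGACY;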

The DISTINCT Clause

We use the SELECT DISTINCT statement when we want to retrieve only distinct values. However, you should avoid it where you can, because it adds a de-duplication step that can increase a query’s execution time.
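For example (hypothetical tables), the duplicates often come from the join itself, and an EXISTS test can return the same result without the extra de-duplication step that DISTINCT forces:

    -- DISTINCT forces a de-duplication step after the join
    SELECT DISTINCT a.ACCOUNT_ID
    FROM   dbo.ACCOUNTS_LEGACY AS a
    JOIN   dbo.CONTACTS_LEGACY AS c ON c.ACCOUNT_ID = a.ACCOUNT_ID;

    -- The same result without DISTINCT
    SELECT a.ACCOUNT_ID
    FROM   dbo.ACCOUNTS_LEGACY AS a
    WHERE  EXISTS (SELECT 1
                   FROM   dbo.CONTACTS_LEGACY AS c
                   WHERE  c.ACCOUNT_ID = a.ACCOUNT_ID);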

  2. Use Fewer Data Type Conversions 

Why use a large data type when you can achieve the same result with a smaller one? 
We should always use the smallest data type possible. When you use an oversized data type, you increase the query execution time by adding data type conversions to your query, which causes quality and performance issues. 

Let’s look at a few conversion functions and the performance differences between them. First, create a sample table and fill it with 20,000 records using a T-SQL script like the following:
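A sketch of that setup (the table and column names are hypothetical); it creates a small test table and loads 20,000 rows of sample numeric data:

    CREATE TABLE dbo.ConversionTest
    (
        Id     INT IDENTITY(1, 1) PRIMARY KEY,
        Amount DECIMAL(18, 4) NOT NULL
    );

    -- Fill the table with 20,000 pseudo-random amounts
    INSERT INTO dbo.ConversionTest (Amount)
    SELECT TOP (20000) ABS(CHECKSUM(NEWID())) % 1000000 / 100.0
    FROM       sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b;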

Now we will use three types of conversion:
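The original comparison isn’t reproduced here, so as an assumption the three variants below are CAST, CONVERT, and FORMAT, each turning the numeric Amount column of the sample table into a character string; SET STATISTICS TIME reports the elapsed time of each:

    SET STATISTICS TIME ON;

    SELECT CAST(Amount AS VARCHAR(25))   FROM dbo.ConversionTest;  -- 1. CAST
    SELECT CONVERT(VARCHAR(25), Amount)  FROM dbo.ConversionTest;  -- 2. CONVERT
    SELECT FORMAT(Amount, 'N2')          FROM dbo.ConversionTest;  -- 3. FORMAT

    SET STATISTICS TIME OFF;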

Executing this code reports the elapsed time for each of the three conversions listed above.

Based on this, we’ll use CAST, since its elapsed time is comparatively low. The CAST function in SQL converts data from one data type to another; for example, we can use it to convert numeric data into character string data.

Heavy queries
Try to keep your script simple. The greater the number of clauses, the greater the delay in execution.

OR Operator
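Suppose, for example (hypothetical table and columns), we filter on several values with OR:

    SELECT LEGACY_ID, FIRST_NAME, LAST_NAME
    FROM   dbo.CONTACTS_LEGACY
    WHERE  STATE_CODE = 'CA'
       OR  STATE_CODE = 'TX'
       OR  STATE_CODE = 'WA';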


 
We can rewrite the query as follows, replacing the OR operator:
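Continuing the illustrative example above:

    SELECT LEGACY_ID, FIRST_NAME, LAST_NAME
    FROM   dbo.CONTACTS_LEGACY
    WHERE  STATE_CODE IN ('CA', 'TX', 'WA');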


 
In the example above, we removed the OR operator and used the IN clause instead; do this whenever you can. When we use an OR operator in our script, the index is often not used, and an index is the data structure that accelerates a query’s execution time.

NOT Operator

The NOT operator behaves like the OR operator where performance is concerned: the index is likely not to be used, just as with the OR operator, which slows down your script execution.

Example: 
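A sketch with a hypothetical ACCOUNTS_LEGACY table, where NOT negates a comparison:

    SELECT ACCOUNT_ID, ANNUAL_REVENUE
    FROM   dbo.ACCOUNTS_LEGACY
    WHERE  NOT ANNUAL_REVENUE > 500000;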

 
We can rewrite the above query as:
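The positive form of the same condition, which is more index-friendly:

    SELECT ACCOUNT_ID, ANNUAL_REVENUE
    FROM   dbo.ACCOUNTS_LEGACY
    WHERE  ANNUAL_REVENUE <= 500000;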

JOINS
Have you ever thought about which type of join to use before applying it to your query? If you are joining multiple tables, it’s always important to consider the order of the tables in your JOIN conditions. The table with the largest dataset should come last in the join order, as in the example below.
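For instance (hypothetical tables, with CONTACTS_LEGACY assumed to be the largest of the three):

    SELECT a.ACCOUNT_ID, u.USER_NAME, c.EMAIL_ADDR
    FROM       dbo.USERS_LEGACY    AS u                                  -- smallest table first
    JOIN       dbo.ACCOUNTS_LEGACY AS a ON a.OWNER_ID   = u.USER_ID
    JOIN       dbo.CONTACTS_LEGACY AS c ON c.ACCOUNT_ID = a.ACCOUNT_ID;  -- largest table last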

Conditions Inside the ON Clause of Joins
Suppose you need to use multiple conditions in your JOINS as below:
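A sketch (hypothetical names) where the right side of a LEFT JOIN is a SELECT with multiple conditions:

    SELECT  s.LEGACY_ID, s.EMAIL_ADDR, t.Email
    FROM    dbo.CONTACTS_LEGACY AS s
    LEFT JOIN (
                SELECT Legacy_Id__c, Email
                FROM   dbo.Contact_Stage
                WHERE  IsDeleted = 0
                  AND  RecordType = 'Customer'
              ) AS t
           ON t.Legacy_Id__c = s.LEGACY_ID;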

Here, we use a SELECT statement inside the LEFT JOIN. This will lead to a longer execution time. What can we do to optimize the above query? We can simply create a temporary table for the query written inside the join, which will optimize the query performance. 
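One way to do that, continuing the sketch above, is to load the inner SELECT into a temporary table first and join to it:

    -- Materialize the filtered stage data once
    SELECT Legacy_Id__c, Email
    INTO   #ContactStage
    FROM   dbo.Contact_Stage
    WHERE  IsDeleted = 0
      AND  RecordType = 'Customer';

    -- Join against the temporary table instead of the inline SELECT
    SELECT  s.LEGACY_ID, s.EMAIL_ADDR, t.Email
    FROM    dbo.CONTACTS_LEGACY AS s
    LEFT JOIN #ContactStage AS t
           ON t.Legacy_Id__c = s.LEGACY_ID;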

Use indexes in your script
 
A book’s index makes searching fast. The logic is the same for SQL indexes, which enhance the speed and performance of your queries. We can create indexes on one or more columns of a particular table. 

Generally, we create indexes when we’re working with a large set of data. Adding an index to a column in your table makes queries that filter on that column return faster. Indexes like the unique key index and the primary key index also help prevent redundant data in your table. 
Be careful with indexes on tables that receive heavy DML activity: indexes slow down INSERT, UPDATE, and DELETE commands, although they speed up an UPDATE whose WHERE clause filters on an indexed field. The following is an example of how we create indexes on columns: 
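For example (hypothetical tables and columns), indexing the key and lookup columns used in our comparisons:

    CREATE NONCLUSTERED INDEX IX_ContactsLegacy_LegacyId
        ON dbo.CONTACTS_LEGACY (LEGACY_ID);

    CREATE NONCLUSTERED INDEX IX_ContactStage_LegacyId
        ON dbo.Contact_Stage (Legacy_Id__c)
        INCLUDE (Email);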

We hope you enjoyed learning some of the best practices for writing a test script. 

Need help with your testing roadmap and discovery? 

Want to learn more about how Appirio can support your CRM needs? Our quality assurance teams are experts at writing high-performing scripts in Salesforce and are ready to help you. 

For more amazing tips and tricks on how to implement custom cloud solutions, check out the Appirio Hub.
