Building a Python AWS Lambda function to run AWS Redshift SQL scripts

When we are working with AWS Redshift usually we have to run a lot of updates and many of those are usually very repetitive, in addition to that Redshift doesn’t support store procedures to allow you to run and store SQL scripts. However there’s a way to solve this problem using AWS Lambda and combining it with other AWS technologies, we would be able to schedule scripts, trigger them or run one after the other. This tutorial is about a way to execute SQL script on AWS Redshift making use of AWS Lambda functions.

1. Get psycopg2 for linux inside the project

One of first problems you can encounter on AWS Lambda is the lack of the psycopg2 library, the PostgreSQL library, this library is able to run SQL queries on Redshift using Python (Redshift is based on the version 8.0.2 of the PostgreSQL database) and since the AWS Lambda function runs in a Linux environment, we need the psycopg2 library compiled for Linux (there you have it).

The psycopg2 structure for the Lambda Job

Once we have the psycopg2 library, we put it on the root of our project.

2. Add a settings.py file to the project

The setting.py file

On the settings.py file of your project you would configure the database connection (host, database name, username and password) and the path of your script, put this file on the root of your project as well.

3. Add the redshift_utils.py file

The redshift_utils.py file

The redshift_utils.py file is a python module with a couple of important classes on it, the first one is ScriptReader, this class contains a function to gather the script content and return it as a string value, and RedshiftDataManager contains two important functions, run_query(script, connection) and run_update(script, connection). The run_query(script, connection) function would run a script file as a query and would return the list of results back as a tuple, the run_update(script, connection) function would do the same thing but instead of a query it would run an update (alter, delete and update statements) returning True if successful and False if not, for the connection parameter you send the DB_CONNECTION constant you specified on the settings.py file.

4. Add the lambda_function.py obligatory file

The lambda_function.py file

To use the run_update(script, connection) function you need to import the ScriptReader and RedshiftDataManager classes from the redshift_utils.py file, the SCRIPT_PATH and the DB_CONNECTION constants from the settings.py file.

5. Include a script.sql file

The script.sql file

The script.sql file contains the SQL code you want to run on AWS Redshift, you can add the {} placeholders to the script.sql file to parametrize it if you need to, also you can add more SQL scripts and manage several updates in the same lambda function using the run_update(script, connection) and the run_query(script, connection) with different script paths.

After adding all the files to the project, the file structure should look a little bit like this:

The final file structure

After the project is finished, you can compress your files and upload the compressed package to AWS Lambda through the “Code entry type” > “Upload a .zip file” option.

AWS Lambda, uploading the function code

After that you only have to hit the “Save” button and the function will be loaded, perform some additional tests on the AWS Lambda function to make sure is working as expected.

Combining AWS Lambda with AWS CloudWatch you could create scheduled updates for your database in the same way stored procedures work, you could also use AWS Step Functions to concatenate scripts one after the other to create a complex ETL processes.

Senior Software Engineer at Huge Inc. https://github.com/64lines

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store