Simplifying Pull Request Reviews with OpenAI and GitHub Actions

Michaël Scherding
Jul 10, 2023


Simplify pull request reviews with the power of OpenAI’s ChatGPT API and GitHub Actions: automate code analysis, receive valuable recommendations, and speed up the review process.

Introduction

As a tech lead, one of the challenges you often face is reviewing numerous pull requests from engineers seeking your expertise. This process can be time-consuming and mentally demanding, requiring you to meticulously examine each line of code. To simplify it and alleviate some of the burden, I have developed a code reviewer based on OpenAI and the power of GitHub Actions. In this article, we will explore how this solution streamlines pull request reviews, allowing you to focus on the most critical aspects of code evaluation.

Section 1: Introducing the Code Reviewer with OpenAI

To address the challenges of pull request reviews, I have developed a code reviewer powered by OpenAI’s Chat Completions API, specifically the gpt-3.5-turbo model. This intelligent assistant acts as your code review buddy, providing an initial level of code analysis and summarization. gpt-3.5-turbo offers fast response times, enabling quick turnarounds for pull request reviews.

Section 2: Walking Through the code_review.py Script

Let’s explore the functions within the code_review.py script.
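The snippets below omit the module-level setup. A minimal preamble for code_review.py might look like the following; the TOKEN_LIMIT value is an assumption on my part and should be tuned so each chunk stays comfortably within the model’s context window (the openai library also reads OPENAI_API_KEY from the environment on its own):

import json
import os
import textwrap

import git
import openai
from github import Github

# Rough character budget per chunk sent to the API (assumed value);
# adjust it so each chunk stays well within the model's context window.
TOKEN_LIMIT = 4000

# The openai library picks up OPENAI_API_KEY from the environment by default,
# but setting it explicitly makes the dependency visible.
openai.api_key = os.getenv('OPENAI_API_KEY')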

get_file_content(file_path):

def get_file_content(file_path):
    """
    This function reads the content of a file.

    Args:
        file_path (str): The path to the file.

    Returns:
        str: The content of the file.
    """
    with open(file_path, 'r') as file:
        return file.read()

This function reads the content of a file specified by file_path and returns its content as a string. It is used to fetch the content of each changed file in the pull request.

get_changed_files(pr):

def get_changed_files(pr):
    """
    This function fetches the files that were changed in a pull request.

    Args:
        pr (PullRequest): The pull request object.

    Returns:
        dict: A dictionary containing the file paths as keys and their content as values.
    """
    # Clone the repository and checkout the PR branch
    repo = git.Repo.clone_from(pr.base.repo.clone_url, to_path='./repo', branch=pr.head.ref)

    # Get the difference between the PR branch and the base branch
    base_ref = f"origin/{pr.base.ref}"
    head_ref = f"origin/{pr.head.ref}"
    diffs = repo.git.diff(base_ref, head_ref, name_only=True).split('\n')

    # Initialize an empty dictionary to store file contents
    files = {}
    for file_path in diffs:
        try:
            # Fetch each file's content and store it in the files dictionary
            files[file_path] = get_file_content('./repo/' + file_path)
        except Exception as e:
            print(f"Failed to read {file_path}: {e}")

    return files

Given a pull request object pr, this function fetches the files that were changed in the pull request. It clones the repository, checks out the pull request branch, and determines the differences between the pull request branch and the base branch. The function returns a dictionary with file paths as keys and their content as values.
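For local experimentation outside of GitHub Actions, the function can be exercised with a small driver like the one below, assuming code_review.py’s functions are importable or defined in the same session. The repository name and pull request number are placeholders, not values from the article:

import os
from github import Github

# Hypothetical repository and PR number, used only for local testing.
gh = Github(os.getenv('GITHUB_TOKEN'))
pr = gh.get_repo('my-org/my-repo').get_pull(42)

files = get_changed_files(pr)
for path, content in files.items():
    print(f"{path}: {len(content)} characters")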

send_to_openai(files):

def send_to_openai(files):
    """
    This function sends the changed files to OpenAI for review.

    Args:
        files (dict): A dictionary containing the file paths as keys and their content as values.

    Returns:
        str: The review returned by OpenAI.
    """
    # Concatenate all the files into a single string
    code = '\n'.join(files.values())

    # Split the code into chunks; textwrap.wrap splits by character count,
    # which serves here as a rough approximation of the token limit
    chunks = textwrap.wrap(code, TOKEN_LIMIT)

    reviews = []
    for chunk in chunks:
        # Send a message to OpenAI with each chunk of the code for review
        message = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {
                    "role": "user",
                    "content": "You are assigned as a code reviewer. Your responsibility is to review the provided code and offer recommendations for enhancement. Identify any problematic code snippets, highlight potential issues, and evaluate the overall quality of the code you review:\n" + chunk
                }
            ],
        )

        # Add the assistant's reply to the list of reviews
        reviews.append(message['choices'][0]['message']['content'])

    # Join all the reviews into a single string
    review = "\n".join(reviews)

    return review

This function sends the changed files to OpenAI for review using the gpt-3.5-turbo model. It concatenates all the file contents into a single string and splits it into chunks intended to stay within the token limit. Each chunk is sent as a message to the Chat Completions API, and the assistant’s responses are collected and joined into a single review string, which the function returns.
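Note that textwrap.wrap splits by character count, not by tokens, so the chunks are only an approximation of the model’s limit. If you want to count actual tokens, one option is the tiktoken library; the sketch below is an optional refinement I am suggesting, not part of the published script, and it requires installing tiktoken separately:

import tiktoken

# Tokenizer matching gpt-3.5-turbo; encoding_for_model resolves it by name.
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

def count_tokens(text):
    """Return the number of tokens the model would see for this text."""
    return len(encoding.encode(text))

# Hypothetical snippet, just to illustrate the difference between
# character length and token count.
sample = "def add(a, b):\n    return a + b\n"
print(len(sample), "characters,", count_tokens(sample), "tokens")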

post_comment(pr, comment):

def post_comment(pr, comment):
    """
    This function posts a comment on the pull request with the review.

    Args:
        pr (PullRequest): The pull request object.
        comment (str): The comment to post.
    """
    # Post OpenAI's response as a comment on the PR
    pr.create_issue_comment(comment)

Given a pull request object pr and a comment string, this function posts the string as a comment on the pull request. It uses PyGithub’s create_issue_comment method, which creates an issue comment via the GitHub API.

main():

def main():
    """
    The main function orchestrates the operations of:
    1. Fetching changed files from a PR
    2. Sending those files to OpenAI for review
    3. Posting the review as a comment on the PR
    """
    # Get the pull request event JSON
    with open(os.getenv('GITHUB_EVENT_PATH')) as json_file:
        event = json.load(json_file)

    # Instantiate the Github object using the Github token
    # and get the pull request object
    pr = Github(os.getenv('GITHUB_TOKEN')).get_repo(event['repository']['full_name']).get_pull(event['number'])

    # Get the changed files in the pull request
    files = get_changed_files(pr)

    # Send the files to OpenAI for review
    review = send_to_openai(files)

    # Post the review as a comment on the pull request
    post_comment(pr, review)


if __name__ == "__main__":
    main()  # Execute the main function

The main() function orchestrates the whole process: fetching the changed files from a pull request, sending them to OpenAI for review, and posting the review as a comment. It reads the pull request event details, builds the pull request object, calls get_changed_files to fetch the changed files, sends them to OpenAI via send_to_openai, and finally posts the result with post_comment.

Section 3: Leveraging GitHub Actions for Automation

GitHub Actions provides a powerful framework for automating various aspects of the software development lifecycle, including code review. By setting up a workflow using a .yml file, you can effortlessly trigger the code reviewer whenever a new pull request is submitted. This automation saves you time and ensures consistent code review practices across your team.

Here’s a snippet of the code_review.yml file, which sets up the GitHub Actions workflow:

name: Code Review with OpenAI

on:
  pull_request:

jobs:
  review:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.9

      - name: Install dependencies
        run: |
          pip install openai PyGithub GitPython

      - name: Run review
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.REPO_ACCESS_TOKEN }}
        run: python .github/actions/code_review.py

Section 4: Examples

To demonstrate the effectiveness of the code reviewer, let’s examine an example where we applied the code_review.py logic to itself. By doing so, we can gain insights into the recommendations provided by ChatGPT and evaluate its effectiveness in identifying potential enhancements. Below, you will find the review generated by the code reviewer for the code_review.py script:

Overall, the code seems to be well-structured and follows proper coding conventions. However, I have identified a few areas where enhancements can be made:

Error Handling: The get_changed_files function catches exceptions when reading file contents but only prints the error message. It would be better to log the exceptions or handle them more gracefully. Additionally, it would be useful to specify the type of exception that occurred.

Logging or Error Reporting: Instead of directly printing the error message in get_changed_files, consider using a logging framework or reporting the error in a more consistent manner. This would make it easier to track and investigate any errors that occur during file reading.

Exception Handling: The catch-all exception handler in get_changed_files can hide potential issues and make debugging difficult. It is recommended to catch specific exceptions rather than using a general Exception class.

Variable Names: The variable names used in the code are mostly clear and descriptive. However, the name pr used in get_changed_files and post_comment could be more explicit. Consider using a more descriptive name, such as pull_request.

Code Repetition: The code to create an instance of the Github class is duplicated in both main and get_changed_files functions. It would be better to create the Github object once and pass it as an argument to the respective functions.

String Concatenation: The use of string concatenation in get_file_content to build the path can lead to issues, especially on different operating systems. It is recommended to use the os.path module to join path components.

Code Comments: The existing comments in the code are generally helpful. It would be beneficial to include more comments explaining the purpose and functionality of each function for better code comprehension.

API Key Management: The API key for OpenAI is directly loaded from the environment, which can pose security risks in certain scenarios. Consider using a secure key management solution or storing the key securely.

Testing: There are no tests provided for the code. It is recommended to write tests to cover the functionality of each function, especially for critical parts like fetching changed files and sending them for review.

Error Handling for API Call: The send_to_openai function makes API calls to OpenAI, but it does not handle any errors that may occur during the call. It is advisable to implement proper error handling and fallback mechanisms.

Code Modularity: The main function currently performs multiple operations. Consider breaking it down into smaller functions with clear responsibilities to enhance code modularity and maintainability.

Input Validations: The code assumes that the required environment variables (OPENAI_API_KEY, GITHUB_EVENT_PATH, GITHUB_TOKEN) will be available. It would be beneficial to validate the presence and correctness of these variables before proceeding with the rest of the code.

Code Formatting: While the majority of the code adheres to PEP 8 conventions, there are a few areas where the formatting can be improved for better readability and consistency.

Error Handling in main: Currently, if there is an exception while executing the code in main, it will most likely terminate without any indication of the error. It is recommended to implement error handling and provide appropriate feedback in such cases.

The review completed in under 47 seconds and was posted directly as a comment on the pull request.
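Several of these suggestions are straightforward to act on. As one illustration, here is a hedged sketch of how get_changed_files could build paths with os.path and catch more specific exceptions, as the review recommends; the logging setup is my own assumption, not part of the published script:

import logging
import os

import git

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def get_changed_files(pr):
    """Fetch changed files, using os.path for paths and specific exception handling."""
    repo = git.Repo.clone_from(pr.base.repo.clone_url, to_path='./repo', branch=pr.head.ref)

    diffs = repo.git.diff(f"origin/{pr.base.ref}", f"origin/{pr.head.ref}", name_only=True).split('\n')

    files = {}
    for file_path in diffs:
        if not file_path:
            continue  # skip empty lines in the diff output
        full_path = os.path.join('./repo', file_path)
        try:
            with open(full_path, 'r') as file:
                files[file_path] = file.read()
        except (OSError, UnicodeDecodeError) as e:
            # Log the failure instead of silently printing it
            logger.warning("Failed to read %s: %s", file_path, e)

    return files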

Section 5: Ensuring Code Sensitivity and Privacy

While leveraging the power of the ChatGPT API and external services like OpenAI can be incredibly beneficial, it’s crucial to exercise caution when dealing with sensitive code. The ChatGPT model processes data externally, which means the code you submit for review is shared with the OpenAI infrastructure. It’s essential to ensure that the code being reviewed does not contain any sensitive information that you are not comfortable sharing with an external company.

Conclusion

In conclusion, the code reviewer based on OpenAI’s Chat Completions API, powered by the gpt-3.5-turbo model, offers a valuable first level of code review. By leveraging this intelligent assistant, you can streamline the pull request review process, saving time and effort. The code reviewer provides initial analysis and recommendations, helping you identify potential issues and improve code quality.

However, it’s important to note that the code reviewer should be considered a complementary tool rather than a substitute for thorough manual code reviews. While it offers valuable insights, it’s always recommended to perform additional code reviews, either manually or using other automated tools, to ensure comprehensive evaluation and adherence to best practices. It is important to remember that human review and judgment are irreplaceable when it comes to critical code evaluation.

Furthermore, for better results, consider optimizing the implementation of certain functions, such as get_file_content and get_changed_files; refactoring them can improve efficiency and streamline the review process. Crafting a more specific prompt can improve the accuracy of the feedback, and performance work on the script (or even a faster implementation language) can reduce waiting time. Improving both the code and the prompt maximizes the effectiveness of the review process, saving time and ensuring high-quality code.
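As an illustration of the prompt point, a more specific instruction can constrain both the focus and the format of the feedback. The wording below is only a suggestion, not the prompt used in the repository:

import openai

REVIEW_PROMPT = (
    "You are a senior Python reviewer. For the code below, list at most five "
    "findings, ordered by severity. For each finding, name the function or line "
    "it concerns, explain why it is a problem, and propose a concrete fix. "
    "Focus on correctness, error handling, and security; ignore style nits.\n\n"
)

# 'chunk' stands in for one of the code chunks produced in send_to_openai.
chunk = "def add(a, b):\n    return a + b\n"

message = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": REVIEW_PROMPT + chunk}],
)
print(message['choices'][0]['message']['content'])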

By leveraging the code reviewer, along with additional review processes and incorporating feedback from the engineering team, you can enhance code quality, foster collaboration, and accelerate the development lifecycle.

Please find the GitHub repo here.

Have fun 🤟
