
perf(grades): optimize database queries for large-scale grade recalcu…#38787

Open
andrey-canon wants to merge 1 commit into openedx:master from eduNEXT:and/optimize-grade-recalculation-queries


@andrey-canon

Contributor

Description

This PR addresses significant platform performance degradation caused by inefficient database queries during large-scale course grade recalculations.

The previous implementation relied on SQL OFFSET pagination, which forces the database to read through and discard every row that precedes the offset: hundreds of thousands of rows for high-offset tasks. It also suffered from an N+1 query problem, fetching the user for every enrollment in a batch with a separate query.
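To make the N+1 pattern concrete, here is a minimal, self-contained sketch that uses Python's built-in `sqlite3` module rather than Django and MySQL. The `auth_user`/`enrollment` tables and the helper functions are hypothetical stand-ins, not the edx-platform schema, and statements are counted with `sqlite3.Connection.set_trace_callback`. The lazy version mirrors reading `enrollment.user` without `select_related`; the joined version mirrors what `.select_related('user')` asks the ORM to do.

```python
import sqlite3
from typing import Callable, List, Tuple


def make_db(n_users: int) -> sqlite3.Connection:
    # Hypothetical toy schema: one enrollment per user in a single course.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE enrollment (id INTEGER PRIMARY KEY, "
        "user_id INTEGER NOT NULL REFERENCES auth_user(id), course_id TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO auth_user (id, username) VALUES (?, ?)",
        [(i, f"user{i}") for i in range(1, n_users + 1)],
    )
    conn.executemany(
        "INSERT INTO enrollment (user_id, course_id) VALUES (?, 'course-A')",
        [(i,) for i in range(1, n_users + 1)],
    )
    conn.commit()
    return conn


def count_queries(conn: sqlite3.Connection, fn: Callable[[sqlite3.Connection], list]) -> Tuple[list, int]:
    # Record every SQL statement SQLite executes while fn runs.
    statements: List[str] = []
    conn.set_trace_callback(statements.append)
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(statements)


def usernames_lazy(conn: sqlite3.Connection, limit: int) -> List[str]:
    # N+1: one query for the page, then one query per enrollment for its user.
    page = conn.execute("SELECT user_id FROM enrollment ORDER BY id LIMIT ?", (limit,)).fetchall()
    return [
        conn.execute("SELECT username FROM auth_user WHERE id = ?", (user_id,)).fetchone()[0]
        for (user_id,) in page
    ]


def usernames_joined(conn: sqlite3.Connection, limit: int) -> List[str]:
    # Eager loading: a single JOIN returns enrollments and their users together.
    rows = conn.execute(
        "SELECT u.username FROM enrollment AS e JOIN auth_user AS u ON u.id = e.user_id "
        "ORDER BY e.id LIMIT ?",
        (limit,),
    ).fetchall()
    return [username for (username,) in rows]


conn = make_db(500)
lazy_names, lazy_count = count_queries(conn, lambda c: usernames_lazy(c, 100))
joined_names, joined_count = count_queries(conn, lambda c: usernames_joined(c, 100))
print(lazy_count, joined_count)
```

Both functions return the same usernames; only the number of round trips differs, which is the effect the Eager Loading change targets.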

Changes

  • Keyset Pagination: Refactored _course_task_args and compute_grades_for_course to use start_id (ID-based seeking) instead of offset. Each batch now begins with an index seek to start_id rather than a scan past all preceding rows, so lookup cost no longer grows with the batch's position in the course.
  • Database Optimization: Replaced order_by('created') with order_by('id') so batches are ordered by the primary key and can use its clustered index.
  • Eager Loading: Added .select_related('user') to the enrollment QuerySet so user data is fetched in the same query via a JOIN, eliminating one extra query per enrollment (100 extra queries for a batch of 100).
  • Memory Efficiency: Used .values_list('id', flat=True) in the task generator so only enrollment IDs, not full model instances, are loaded when handling courses with 400k+ enrollments.
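The keyset idea in the first bullet can also be sketched outside Django. This is an illustration under stated assumptions: it uses an in-memory SQLite table, and `batch_start_ids` is a hypothetical stand-in for the task-argument generator, not the code in `_course_task_args`. It streams only IDs in primary-key order, records the first ID of every batch, and then checks that seeking with `id >= start_id ... LIMIT` returns exactly the same batches as the legacy `LIMIT ... OFFSET` query.

```python
import sqlite3
from typing import Iterable, Iterator, List


def make_enrollments(n_rows: int) -> sqlite3.Connection:
    # Hypothetical toy table; two courses are interleaved so one course's ids have gaps.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE enrollment (id INTEGER PRIMARY KEY, course_id TEXT NOT NULL)")
    conn.execute("CREATE INDEX enrollment_course ON enrollment (course_id, id)")
    conn.executemany(
        "INSERT INTO enrollment (course_id) VALUES (?)",
        [("course-A" if i % 3 else "course-B",) for i in range(n_rows)],
    )
    conn.commit()
    return conn


def course_ids(conn: sqlite3.Connection, course_id: str) -> Iterator[int]:
    # Stream only the id column in primary-key order (analogous to values_list('id', flat=True)).
    for (row_id,) in conn.execute(
        "SELECT id FROM enrollment WHERE course_id = ? ORDER BY id", (course_id,)
    ):
        yield row_id


def batch_start_ids(ids: Iterable[int], batch_size: int) -> Iterator[int]:
    # Hypothetical task-args generator: yield the first id of each batch.
    for position, row_id in enumerate(ids):
        if position % batch_size == 0:
            yield row_id


def seek_batch(conn: sqlite3.Connection, course_id: str, start_id: int, batch_size: int) -> List[int]:
    # Keyset pagination: the index seeks straight to start_id instead of skipping rows.
    rows = conn.execute(
        "SELECT id FROM enrollment WHERE course_id = ? AND id >= ? ORDER BY id LIMIT ?",
        (course_id, start_id, batch_size),
    ).fetchall()
    return [row_id for (row_id,) in rows]


def offset_batch(conn: sqlite3.Connection, course_id: str, offset: int, batch_size: int) -> List[int]:
    # Legacy-style OFFSET pagination, kept only for comparison.
    rows = conn.execute(
        "SELECT id FROM enrollment WHERE course_id = ? ORDER BY id LIMIT ? OFFSET ?",
        (course_id, batch_size, offset),
    ).fetchall()
    return [row_id for (row_id,) in rows]


conn = make_enrollments(1000)
starts = list(batch_start_ids(course_ids(conn, "course-A"), 100))
seek = [seek_batch(conn, "course-A", s, 100) for s in starts]
legacy = [offset_batch(conn, "course-A", i * 100, 100) for i in range(len(starts))]
print(len(starts), seek == legacy)
```

Both queries return the same rows; the difference is cost. With OFFSET the engine still steps past every skipped row, while the seek variant starts reading at the index position of start_id.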

How to Test

Run the following script in the Django shell (python manage.py lms shell) on a high-enrollment course:

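```python
import time

from django.db import connection
from django.test.utils import CaptureQueriesContext
from opaque_keys.edx.keys import CourseKey

from common.djangoapps.student.models import CourseEnrollment

course_key = CourseKey.from_string("your/course/id")  # replace with a high-enrollment course
batch = 100
offset_test = 440000  # must be lower than the course's enrollment count

# Benchmark legacy logic: OFFSET pagination plus one user query per enrollment (N+1).
# CaptureQueriesContext records queries even when DEBUG=False, unlike connection.queries.
with CaptureQueriesContext(connection) as ctx:
    st = time.perf_counter()
    enrollments_legacy = CourseEnrollment.objects.filter(course_id=course_key).order_by('created')[offset_test:offset_test + batch]
    ids_legacy = [e.user.id for e in enrollments_legacy]  # each e.user access runs its own query
    elapsed = time.perf_counter() - st
print(f"Legacy Time: {elapsed:.4f}s | Queries: {len(ctx.captured_queries)}")

# Benchmark optimized logic. Resolve the seek key outside the timed block;
# in the refactored task, start_id comes from the generated task arguments.
start_id = CourseEnrollment.objects.filter(course_id=course_key).order_by('id').values_list('id', flat=True)[offset_test]
with CaptureQueriesContext(connection) as ctx:
    st = time.perf_counter()
    enrollments_new = (
        CourseEnrollment.objects.filter(course_id=course_key, id__gte=start_id)
        .select_related('user')
        .order_by('id')[:batch]
    )
    ids_new = [e.user.id for e in enrollments_new]  # user rows already loaded by the JOIN
    elapsed = time.perf_counter() - st
print(f"Optimized Time: {elapsed:.4f}s | Queries: {len(ctx.captured_queries)}")
```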

Performance Benchmarks

Measured at offset 440,000 with a batch of 100:

| Metric | Original (Offset + N+1) | Optimized (Seek + Join) | Improvement |
|---|---|---|---|
| Execution time | ~0.9614 s | ~0.0388 s | ~25x faster |
| DB queries | 101 | 1 | 100 fewer queries |
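A quick check of the table's arithmetic, using only the figures reported above (no new measurements):

```python
# Figures copied from the benchmark table (single-run timings).
legacy_seconds = 0.9614
optimized_seconds = 0.0388
legacy_queries = 101     # 1 page query + 100 per-user queries
optimized_queries = 1    # 1 page query with a JOIN

speedup = legacy_seconds / optimized_seconds
saved = legacy_queries - optimized_queries
print(f"Speedup: {speedup:.1f}x")               # Speedup: 24.8x
print(f"Queries saved per batch: {saved}")      # Queries saved per batch: 100
```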

openedx-webhooks added the open-source-contribution label (PR author is not from Axim or 2U) on Jun 19, 2026
@openedx-webhooks


Thanks for the pull request, @andrey-canon!

This repository is currently maintained by @openedx/wg-maintenance-openedx-platform.

Once you've gone through the following steps, feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

…lations

Replaced inefficient SQL OFFSET pagination with ID-based keyset pagination to ensure consistent lookup performance, and updated ordering to leverage the primary key index. Resolved an N+1 query issue by eagerly loading user data via `.select_related('user')` and reduced the memory footprint by using `.values_list()`.

These changes reduce execution time by ~25x and eliminate 100 redundant queries per batch during high-enrollment course processing.
andrey-canon force-pushed the and/optimize-grade-recalculation-queries branch from b72ca72 to 0c2403b on June 19, 2026 19:05


Projects

Status: Needs Triage
