Back to blog

Handling Large Datasets & High-Traffic Queries: Optimizing Pagination, Sorting & Filtering

Feb 22, 2025
4 min read

Handling Large Datasets & High-Traffic Queries: Optimizing Pagination, Sorting & Filtering

Performance optimisation becomes a critical challenge when dealing with millions of records in a database. As datasets grow, queries that…


Handling Large Datasets & High-Traffic Queries: Optimizing Pagination, Sorting & Filtering

Performance optimisation becomes a critical challenge when dealing with millions of records in a database. As datasets grow, queries that once ran smoothly start slowing down, and when multiple users access the system simultaneously, things get even worse.

The Challenge: Why Do Queries Slow Down?

Even with proper indexing, large-scale applications often face the following issues:

  • Pagination Performance Drops: Offset-based pagination (OFFSET n LIMIT m) forces the database to scan and discard rows before returning results, making deeper pages increasingly slower.
  • Sorting and Filtering Lag: Queries that require sorting by timestamps, prices, or statuses can still become slow, even when indexed.
  • Database Overload Due to High-Traffic Queries: If users frequently access high-activity data, like tracking transactions for a busy wallet, caching doesn’t always help because the data is constantly changing.

To ensure fast and scalable queries, we implemented the following optimizations.

Optimizing Pagination for Large Datasets

Problem:

Offset-based pagination (OFFSET n LIMIT m) requires the database to scan and discard n rows before fetching results. The deeper the page, the worse the performance. This is especially problematic for applications handling real-time blockchain transactions, financial records, or high-traffic APIs.

Solution: Cursor-Based Pagination

Instead of using OFFSET, we switched to cursor-based pagination, which fetches records directly from the last seen record. By using a unique, indexed column (e.g., transaction_id or timestamp), we eliminate unnecessary row scans, making pagination significantly faster.

Why it works: ✅ Reduces deep pagination lag ✅ Prevents scanning & discarding large data chunks ✅ Keeps API responses consistently fast

Enhancing Sorting & Filtering Performance

Problem:

Sorting and filtering transactions based on fields like timestamp, status, or amount can become slow, even when indexes are applied. Traditional indexes work well for some queries but struggle with efficiently filtering large datasets.

Solution: Partial Indexes & Materialized Views

  • Partial Indexes: Instead of indexing an entire table, we created partial indexes for frequently used filters, such as transactions with status = 'SUCCESS'.
  • Materialized Views: For common queries (e.g., “latest transactions”), precomputing results using materialized views significantly improved query performance.

Why it works: ✅ Reduces database load for frequent filtering ✅ Improves query execution times by fetching precomputed data ✅ Keeps results fresh with periodic updates

Handling High-Traffic Queries on Frequently Changing Data

Problem:

When users check frequently updated data (like an active wallet address), caching alone isn’t a viable solution. Since the data changes constantly, relying on Redis or Memcached can lead to outdated information.

Solution: Hybrid Caching + Event-Driven Updates

To handle this challenge efficiently, we combined multiple strategies:

  1. Hybrid Caching Strategy: Instead of long-term caching, we used short-lived caches (5–10 seconds) along with background updates to reduce direct database hits.
  2. Read Replicas: Queries were offloaded to read replicas, ensuring that high-volume reads didn’t overwhelm the primary database.
  3. Event-Driven Updates (CDC — Change Data Capture): Instead of polling the database repeatedly, we implemented CDC using Kafka/PostgreSQL logical replication to push real-time updates whenever new transactions were recorded.
  4. Optimized Query Execution: Instead of fetching all transactions for an active wallet, we optimized queries to fetch only new records since the last request, reducing dataset size and improving response time.

Why it works: ✅ Reduces database hits while keeping data fresh ✅ Handles high-traffic wallets with real-time updates ✅ Ensures scalability without compromising performance

Final Outcome & Key Takeaways

With these optimizations, we achieved:

Faster API responses (up to 80% improvement in query execution time) ✅ Smooth pagination, sorting & filtering, even with millions of records ✅ Scalable architecture that handles real-time data efficiently ✅ Balanced database load, ensuring high availability under heavy user traffic

What’s Your Experience?

Have you faced similar challenges in handling large datasets with real-time queries? What worked for you? Let’s discuss your insights in the comments below! 🚀

#DatabaseScaling #Pagination #PerformanceOptimization #HighTrafficApps #BackendEngineering

By Jatin Jain Saraf on February 22, 2025.