Merging Paginated Lists Without Duplicates

When implementing infinite scroll or cursor-based pagination, naive array concatenation ([...existing, ...fetched]) frequently injects duplicate entities into the client cache. The result is UI flickering, stale list state, and unpredictable DOM reconciliation. Mapping these observable symptoms to the underlying cache normalization failures lets engineers implement deterministic merge strategies aligned with foundational Data Normalization & Query Key Design principles. Idempotent merge patterns keep cache state stable, stop the list from growing with every retried page, and keep rendering predictable across network retries.

Diagnostic Focus:

  • Map UI flickering to duplicate reference injection in normalized stores
  • Trace DevTools cache snapshots to identify stale cursor boundaries
  • Deduplicate with O(1) per-item lookups using Map-based entity registries
  • Isolate pagination metadata (hasNextPage, endCursor) from entity arrays

Symptom-to-Root-Cause Mapping in Pagination Merges

Before applying structural fixes, isolate the exact injection point using browser profiling and cache inspectors. Duplicate injection rarely stems from the server; it originates from client-side merge logic that treats overlapping page boundaries as distinct entities.
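To make the failure mode concrete, the following minimal sketch shows how a one-item overlap between two pages becomes a duplicate under naive concatenation. The page contents and the id field are illustrative assumptions, not data from a real API.

```javascript
// Hypothetical pages: the server window shifted by one item between
// requests, so "c" appears at the end of page 1 and the start of page 2.
const pageOne = [{ id: "a" }, { id: "b" }, { id: "c" }];
const pageTwo = [{ id: "c" }, { id: "d" }, { id: "e" }];

// Naive merge: every fetched item is appended, overlap included.
const naiveMerged = [...pageOne, ...pageTwo];

const naiveIds = naiveMerged.map((item) => item.id);
const uniqueIdCount = new Set(naiveIds).size;

console.log(naiveIds.join(","));              // a,b,c,c,d,e
console.log(naiveIds.length - uniqueIdCount); // 1
```

The single repeated id is enough to produce two list rows for the same entity.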

Reproduction & Diagnostic Workflow

  1. Trigger Overlap: Scroll rapidly to force concurrent fetchNextPage calls or simulate a network retry.
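  2. React DevTools Profiling: Enable Highlight Updates. Observe whether list items flash repeatedly during scroll pauses. Repeated flashes indicate that React is re-rendering rows it cannot match to stable keys instead of reusing existing DOM nodes. If entity IDs are used as key props, duplicates also surface as a "two children with the same key" warning in the console.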
  3. Network Waterfall Analysis: Open the Network tab. Filter by XHR/Fetch. Look for overlapping cursor or offset parameters resolving within the same 200–500ms window.
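  4. Cache Mutation Logs: In TanStack Query DevTools or Apollo Client Inspector, watch the queryKey hash. If a standard (non-infinite) query keeps an identical key across pages while the payload changes, the cache treats distinct pages as the same query and each page overwrites the last. With useInfiniteQuery a single key for all pages is expected, so inspect data.pages for overlapping entities instead.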
  5. Entity ID Validation: Run a quick console assertion against the rendered list:
 const renderedIds = Array.from(document.querySelectorAll('[data-entity-id]')).map((el) => el.dataset.entityId);
 console.assert(new Set(renderedIds).size === renderedIds.length, "Duplicate DOM nodes detected");
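The DOM assertion in step 5 only reports that duplicates exist. To see which IDs are repeated and how often, the same check can be run against the cached array itself. The sketch below uses a hypothetical helper named findDuplicateIds and assumes every entity exposes an id field.

```javascript
// Hypothetical helper: returns a Map of id -> occurrence count for every
// id that appears more than once in `items`.
function findDuplicateIds(items) {
  const counts = new Map();
  for (const item of items) {
    counts.set(item.id, (counts.get(item.id) ?? 0) + 1);
  }
  const duplicates = new Map();
  for (const [id, count] of counts) {
    if (count > 1) duplicates.set(id, count);
  }
  return duplicates;
}

// Example: a cached list that contains one repeated entity.
const cachedItems = [{ id: 1 }, { id: 2 }, { id: 2 }, { id: 3 }];
const duplicateReport = findDuplicateIds(cachedItems);

// One entry is reported: id 2, seen twice.
console.log([...duplicateReport.entries()]);
```

An empty Map means the cached list is already unique, which points the investigation at rendering rather than at merge logic.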

Trade-offs: High-frequency polling increases network overhead versus manual cursor validation. Strict ID validation may mask legitimate server-side duplicates, requiring explicit composite key strategies.

Idempotent Merge Algorithms & Cache Normalization

Deterministic merging requires decoupling entity storage from pagination metadata. Modern state managers (TanStack Query, Apollo Client, RTK Query) provide merge or updater hooks, but the underlying algorithm must guarantee O(1) lookups to prevent quadratic degradation on large lists.

Production-Safe Merge Implementation

// Vanilla JS / TanStack Query / RTK Query compatible
function mergePaginatedData(existing, fetched) {
  // 1. Seed registry with existing entities (O(N) build, O(1) lookup).
  //    The first page has no cached value yet, so fall back to an empty list.
  const entityMap = new Map((existing?.data ?? []).map((e) => [e.id, e]));

  // 2. Append only novel entities, preserving insertion order
  fetched.data.forEach((item) => {
    if (!entityMap.has(item.id)) {
      entityMap.set(item.id, item);
    }
  });

  // 3. Decouple metadata from entity graph to prevent state corruption
  return {
    ...fetched,
    data: Array.from(entityMap.values()),
    hasNextPage: fetched.hasNextPage,
    endCursor: fetched.endCursor,
  };
}

Cache Behavior Analysis:

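  • Idempotency: The Map keyed on entity.id prevents duplicate normalized references, so merging the same page twice yields the same list. Entities already in the cache keep their object identity, which lets memoized row components skip re-rendering; only the outer array is recreated on each merge.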
  • Metadata Isolation: Pagination boundaries (hasNextPage, endCursor) are explicitly forwarded from the latest payload. Merging these with stale cache values causes cursor regression and infinite scroll loops.
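  • Framework Alignment: In TanStack Query, useInfiniteQuery stores pages separately under data.pages, so deduplicate in select (which receives { pages, pageParams }) or when writing pages with queryClient.setQueryData; onSuccess is not a merge hook and was removed from useQuery in v5. In RTK Query, call the function from the endpoint's merge option. In Apollo Client, a field policy's merge must return the same shape it later receives as existing, so unwrap the result: merge: (existing = [], incoming) => mergePaginatedData({ data: existing }, { data: incoming }).data. Cached list items in Apollo are usually references, so read the ID with the readField helper rather than item.id.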
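The framework hooks differ in the shape of data they hand to a merge, so the same Map/Set idea has to be adapted per library. The sketch below shows two adaptations as plain functions: one for the { pages, pageParams } shape that TanStack Query's useInfiniteQuery keeps in its cache, and one for an Apollo Client field-policy merge that reads IDs through readField. The names dedupeInfiniteData and mergeById, and the per-page data field, are assumptions made for illustration.

```javascript
// Assumed page shape: { data: Entity[], hasNextPage, endCursor }.
// Assumed cache shape: { pages: Page[], pageParams: unknown[] }.
function dedupeInfiniteData(infiniteData) {
  const seen = new Set();
  const pages = infiniteData.pages.map((page) => ({
    ...page,
    // Keep the first occurrence of each id; drop repeats in later pages.
    data: page.data.filter((item) => {
      if (seen.has(item.id)) return false;
      seen.add(item.id);
      return true;
    }),
  }));
  return { ...infiniteData, pages };
}

// Hypothetical usage (not executed here):
//   useInfiniteQuery({ queryKey: ["items"], queryFn, select: dedupeInfiniteData, ... })

// Sketch of an Apollo field-policy merge for a plain list field. Cached
// list items are usually references, so ids are read through `readField`,
// which Apollo passes in the third argument.
function mergeById(existing = [], incoming, { readField }) {
  const seen = new Set(existing.map((ref) => readField("id", ref)));
  const merged = existing.slice();
  for (const ref of incoming) {
    const id = readField("id", ref);
    if (!seen.has(id)) {
      seen.add(id);
      merged.push(ref);
    }
  }
  // Returns a plain array, the same shape Apollo passes back as `existing`.
  return merged;
}
```

Deduplicating in select leaves the raw pages in the cache untouched, which keeps refetching of individual pages intact at the cost of redoing the filter whenever the cached data changes.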

Trade-offs: Map registries introduce marginal memory overhead for extreme-scale lists (>50k entities), but outperform array .find()/.filter() iterations. Strict ordering guarantees may delay UI rendering if not paired with virtualized list components.

Edge Cases: Cursor Drift, Retries, and Stale State

Pagination boundaries shift under concurrent writes, optimistic updates, or unstable network conditions. A robust merge strategy must detect drift and enforce request deduplication.

DevTools Cache Inspection Workflow

  1. Open Chrome DevTools → Memory → Take Heap Snapshot.
  2. Filter by Entity or Cache to isolate retained object graphs.
  3. Execute performance.getEntriesByType('resource') to map network latency to cache mutation timestamps.
  4. Compare query key hashes in Apollo/RTK Query Inspector. Overlapping keys indicate identical cache slots receiving divergent payloads.
  5. Verify boundary integrity: cache.extract().ROOT_QUERY.allItems.edges.length should equal the number of unique entity IDs referenced by those edges. A larger raw length means duplicates were written to the cache.
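The boundary check in step 5 can be scripted so it runs in the console or in a test. The sketch below assumes a snapshot shaped like the output of Apollo's cache.extract(), with a connection field stored under the key allItems (the name used in the step above) and edges that point at normalized entities through node.__ref. The helper name checkEdgeIntegrity is hypothetical.

```javascript
// Compares the raw edge count with the number of distinct entity refs.
function checkEdgeIntegrity(snapshot, fieldName = "allItems") {
  const edges = snapshot.ROOT_QUERY?.[fieldName]?.edges ?? [];
  const uniqueRefs = new Set(edges.map((edge) => edge.node.__ref));
  return {
    edgeCount: edges.length,
    uniqueCount: uniqueRefs.size,
    ok: edges.length === uniqueRefs.size,
  };
}

// Illustrative snapshot; in practice this would come from cache.extract().
const sampleSnapshot = {
  ROOT_QUERY: {
    allItems: {
      edges: [
        { node: { __ref: "Item:1" } },
        { node: { __ref: "Item:2" } },
        { node: { __ref: "Item:2" } }, // duplicate written by a bad merge
      ],
    },
  },
};

const integrity = checkEdgeIntegrity(sampleSnapshot);
console.log(integrity); // { edgeCount: 3, uniqueCount: 2, ok: false }
```

If the field is stored with arguments in its key, the fieldName argument needs to match the full storage key shown in the cache inspector.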

Handling Production Edge Cases

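  • Cursor Drift: Validate server timestamps or version vectors. If cursors are orderable (for example, numeric sequence IDs) and fetched.cursor < cache.endCursor, discard the payload; opaque cursors cannot be compared this way and need a server-supplied ordering field. Implementing advanced Pagination Normalization Patterns ensures boundaries remain monotonic.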
  • Concurrent Fetch Overlap: Enable query-level request cancellation or implement a fetch gate tracking the highest resolved cursor. Drop responses where response.cursor <= currentCacheBoundary.
  • Optimistic Update Duplication: When inserting locally before server acknowledgment, tag entities with __optimistic: true. Strip these tags upon successful merge, or use a WeakMap to track transient references without polluting the normalized graph.
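  • Fallback Strategy: If merge integrity is compromised (e.g., once all pages are loaded, data.length diverges from total by >5%), discard the merged list and trigger refetch() or invalidate the query. refetch() always hits the network regardless of staleTime, so no staleTime: 0 override is needed. Full refetch guarantees consistency at the cost of latency.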
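The fetch gate described under Concurrent Fetch Overlap can be sketched as a small closure that remembers the highest cursor it has accepted. This assumes cursors are numeric and increase monotonically (for example, sequence numbers), which is an assumption rather than a property of cursors in general. The names createCursorGate and accept are hypothetical.

```javascript
// Tracks the highest accepted cursor and rejects anything at or behind it.
function createCursorGate(initialBoundary = -Infinity) {
  let boundary = initialBoundary;
  return {
    // Returns true and advances the boundary only when the response moves
    // the list forward; late or repeated responses return false.
    accept(response) {
      if (response.endCursor <= boundary) return false;
      boundary = response.endCursor;
      return true;
    },
    get boundary() {
      return boundary;
    },
  };
}

// Simulated arrival order: a normal page, a late response for an earlier
// page, a retry of the same page, then the next page.
const gate = createCursorGate();
const arrivals = [
  { endCursor: 20 },
  { endCursor: 10 },
  { endCursor: 20 },
  { endCursor: 30 },
];
const decisions = arrivals.map((response) => gate.accept(response));
console.log(decisions); // [ true, false, false, true ]
```

A gate like this suits retries and duplicate responses. If distinct pages can legitimately resolve out of order, dropping the late one loses data, so buffering or ID-based merging is the safer choice there.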

Trade-offs: Aggressive deduplication may drop newly inserted entities if server-side IDs are non-deterministic. Full refetch guarantees consistency but increases bandwidth and latency.
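Reconciling optimistic entries is where non-deterministic server IDs cause the most trouble, because the placeholder and the confirmed entity do not share an id. One way around this, sketched below, is to match on a client-generated identifier that the server echoes back. The clientId field and the reconcileOptimistic helper are assumptions for illustration; the __optimistic tag follows the convention described above.

```javascript
// Removes optimistic placeholders that the server has acknowledged, then
// appends confirmed entities that are not yet in the list.
function reconcileOptimistic(cached, confirmed) {
  const confirmedClientIds = new Set(
    confirmed.map((item) => item.clientId).filter((id) => id !== undefined)
  );
  // Keep everything except placeholders whose clientId came back confirmed.
  const kept = cached.filter(
    (item) => !(item.__optimistic && confirmedClientIds.has(item.clientId))
  );
  const knownIds = new Set(kept.map((item) => item.id));
  for (const item of confirmed) {
    if (!knownIds.has(item.id)) {
      knownIds.add(item.id);
      kept.push(item);
    }
  }
  return kept;
}

const cachedList = [
  { id: "1", title: "Existing" },
  { id: "temp-1", clientId: "c1", title: "Draft", __optimistic: true },
];
const serverPage = [{ id: "2", clientId: "c1", title: "Draft" }];

const reconciled = reconcileOptimistic(cachedList, serverPage);
console.log(reconciled.map((item) => item.id).join(",")); // 1,2
```

This sketch appends the confirmed entity at the end rather than in the placeholder's position. Preserving position would require replacing in place instead of filtering.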

Common Pitfalls & Resolutions

| Observable Symptom | Root Cause | Diagnostic Fix |
| --- | --- | --- |
| UI flickers and list items jump during infinite scroll | Array concatenation without ID validation creates duplicate references. React's reconciliation treats them as distinct nodes. | Implement Set/Map-based deduplication before cache write. Preserve structural sharing for existing items. |
| Pagination cursor resets to page 1 after merge | Merge function overwrites hasNextPage/endCursor with stale initial query values. | Explicitly extract and forward the latest cursor state from the fetched payload. Never merge cursor metadata. |
| Concurrent fetches inject overlapping page data | Lack of request deduplication allows identical pagination requests to resolve simultaneously. | Enable query cancellation or track highest resolved cursor. Drop responses with cursor <= current boundary. |
| Memory spikes during long scroll sessions | Entity arrays are never garbage collected; detached DOM nodes retain references. | Use WeakMap for transient UI state. Implement virtualized rendering (react-window/@tanstack/react-virtual). |

Frequently Asked Questions

How do I handle server-side duplicates that legitimately share an ID?

If the server intentionally returns duplicate IDs across pages (e.g., shared resources in multiple categories), implement a composite key strategy such as `${id}-${pageCursor}`, or defer to server-side pagination guarantees. Client-side merging by id alone assumes global ID uniqueness.
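A composite-key merge might look like the following sketch. It assumes each fetched page carries the cursor it was requested with, and it stores each entry alongside its computed key so the key can double as a stable list key. The names mergeWithCompositeKey, key, and cursor are hypothetical.

```javascript
// existingEntries: Array<{ key, item }>; page: { cursor, data: Entity[] }
function mergeWithCompositeKey(existingEntries, page) {
  const registry = new Map(existingEntries.map((entry) => [entry.key, entry]));
  for (const item of page.data) {
    // Same id on a different page gets a different key; a retry of the
    // same page produces the same key and is ignored.
    const key = `${item.id}-${page.cursor}`;
    if (!registry.has(key)) registry.set(key, { key, item });
  }
  return Array.from(registry.values());
}

const categoryPageOne = { cursor: "c1", data: [{ id: 7 }, { id: 8 }] };
const categoryPageTwo = { cursor: "c2", data: [{ id: 7 }, { id: 9 }] };

let entries = mergeWithCompositeKey([], categoryPageOne);
entries = mergeWithCompositeKey(entries, categoryPageTwo);
entries = mergeWithCompositeKey(entries, categoryPageTwo); // simulated retry

console.log(entries.map((entry) => entry.key).join(",")); // 7-c1,8-c1,7-c2,9-c2
```

Entity 7 appears twice on purpose, once per page, while the retried page adds nothing.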

Does Map-based deduplication impact performance on lists with 10k+ items?

Map lookups remain O(1) on average regardless of size. The primary cost is rebuilding the array with Array.from(), which is O(N) on every merge. For extreme-scale lists, keep the Map as the long-lived store and materialize an array only when the list changes, and use virtualized rendering to defer DOM node creation.

Can I use this approach with offset-based pagination?

Yes, but offset pagination lacks deterministic boundaries. You must validate item indices and implement strict ID uniqueness checks to prevent overlap during network retries. Cursor-based pagination is strongly preferred for infinite scroll.
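One way to apply the idea to offset pagination is to store each page under its offset and derive the flat list on demand, so a retried request replaces its page instead of appending to the list. The sketch below is an illustration under that assumption. The names createOffsetStore, writePage, and toList are hypothetical.

```javascript
// Keeps pages keyed by offset; the flat list is derived, never appended to.
function createOffsetStore() {
  const pagesByOffset = new Map();
  return {
    // A retry for the same offset overwrites the previous copy of the page.
    writePage(offset, items) {
      pagesByOffset.set(offset, items);
    },
    // Walks pages in offset order and keeps the first occurrence of each id,
    // which absorbs items that shifted across a page boundary.
    toList() {
      const seen = new Set();
      const list = [];
      const offsets = [...pagesByOffset.keys()].sort((a, b) => a - b);
      for (const offset of offsets) {
        for (const item of pagesByOffset.get(offset)) {
          if (!seen.has(item.id)) {
            seen.add(item.id);
            list.push(item);
          }
        }
      }
      return list;
    },
  };
}

const store = createOffsetStore();
store.writePage(0, [{ id: 1 }, { id: 2 }, { id: 3 }]);
store.writePage(3, [{ id: 3 }, { id: 4 }, { id: 5 }]); // window shifted by one
store.writePage(3, [{ id: 3 }, { id: 4 }, { id: 5 }]); // network retry

console.log(store.toList().map((item) => item.id).join(",")); // 1,2,3,4,5
```

This handles overlap and retries, but it cannot detect items that were skipped because rows were deleted between requests. That gap is one reason cursor-based pagination is preferred.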

How do I verify merge integrity in automated tests?

Mock overlapping page responses. Assert that the normalized store (for example, the entity entries in cache.extract() for Apollo Client) contains exactly expectedUniqueCount keys. Verify that the DOM node count matches the deduplicated array length, and confirm that hasNextPage reflects the latest server response.
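A framework-free version of such a test is sketched below. It uses a stand-in merge function with the same contract as the merge shown earlier, mocked page payloads, and a tiny hypothetical expectEqual helper in place of a test runner's assertions. The function under test and the assertion library would be swapped for the real ones in a project.

```javascript
// Stand-in for the merge under test; replace with the real merge function.
function mergePages(existing, fetched) {
  const registry = new Map(existing.data.map((e) => [e.id, e]));
  for (const item of fetched.data) {
    if (!registry.has(item.id)) registry.set(item.id, item);
  }
  return {
    data: [...registry.values()],
    hasNextPage: fetched.hasNextPage,
    endCursor: fetched.endCursor,
  };
}

// Minimal assertion helper so the sketch runs without a test framework.
function expectEqual(actual, expected, label) {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

// Mocked responses whose boundaries overlap on id 3.
const firstPage = { data: [{ id: 1 }, { id: 2 }, { id: 3 }], hasNextPage: true, endCursor: "c3" };
const secondPage = { data: [{ id: 3 }, { id: 4 }], hasNextPage: false, endCursor: "c4" };

const emptyCache = { data: [], hasNextPage: true, endCursor: null };
const afterFirst = mergePages(emptyCache, firstPage);
const afterSecond = mergePages(afterFirst, secondPage);
const afterRetry = mergePages(afterSecond, secondPage); // simulated network retry

expectEqual(afterSecond.data.length, 4, "unique count after overlap");
expectEqual(afterRetry.data.length, 4, "unique count after retry");
expectEqual(afterRetry.hasNextPage, false, "hasNextPage follows latest response");
expectEqual(afterRetry.endCursor, "c4", "endCursor follows latest response");
```

The retry assertion is the one that proves idempotency: merging the same page a second time must leave the count unchanged.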