Merging Paginated Lists Without Duplicates

Naive array concatenation ([...existing, ...fetched]) injects duplicate entities into the client cache whenever page boundaries overlap, which happens routinely during rapid scrolling, network retries, and optimistic mutations. The visible result is UI flickering, list items jumping position, and duplicate-key warnings from React: rows keyed by id collide, so React cannot reliably match them to the DOM nodes it already rendered. This page shows how to eliminate that class of bug with Map-based entity registries, and is a concrete recipe within Pagination Normalization Patterns. If you are dealing with cursor token corruption specifically, see Normalizing Cursor-Based Pagination for the complementary guard against cursor drift.

The root cause is always the same: merge logic that treats overlapping page boundaries as distinct entities, bypassing the Data Normalization & Query Key Design principle of a single authoritative entity record per ID.

Diagnostic Checklist

Before writing any code, verify that your symptoms match this failure mode:

List items flash or jump during a scroll pause, even though no data actually changed on the server.
React DevTools “Highlight Updates” shows repeated re-renders on list items that were already visible.
Two concurrent fetchNextPage calls with the same or adjacent cursors both resolve and write to the cache.
Logging data.pages.flatMap(p => p.data).length shows a larger number than new Set(data.pages.flatMap(p => p.data).map(e => e.id)).size.
Apollo InMemoryCache or TanStack Query DevTools shows the same entity ID appearing in multiple page slots.
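To see which IDs are duplicated, rather than only that the two counts differ, a small helper can tally IDs across all loaded pages. This is a minimal sketch: findDuplicateIds is a hypothetical name, and it assumes the { data: T[] } page shape used in Step 1.

```typescript
// Hypothetical diagnostic helper: assumes each page exposes its items as `data`,
// matching the PaginatedPage shape from Step 1.
type Identified = { id: string | number };

export function findDuplicateIds<T extends Identified>(
  pages: ReadonlyArray<{ data: ReadonlyArray<T> }>,
): Map<string | number, number> {
  // Count how many times each ID appears across every page
  const counts = new Map<string | number, number>();
  for (const page of pages) {
    for (const item of page.data) {
      counts.set(item.id, (counts.get(item.id) ?? 0) + 1);
    }
  }
  // Keep only the IDs that were seen more than once
  const duplicates = new Map<string | number, number>();
  counts.forEach((count, id) => {
    if (count > 1) duplicates.set(id, count);
  });
  return duplicates;
}

// Usage: two pages that overlap on item "c"
const dupes = findDuplicateIds([
  { data: [{ id: 'a' }, { id: 'b' }, { id: 'c' }] },
  { data: [{ id: 'c' }, { id: 'd' }] },
]);
console.log(dupes); // Map(1) { 'c' => 2 }
```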

Page boundaries that overlap on item C produce a duplicate under naive concat; a Map registry absorbs the second write as a no-op.
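The overlap described in that caption can be reproduced in a few lines. This sketch uses hypothetical items A through E, with the two pages overlapping on C, and contrasts naive concatenation with a Map keyed on id, where the second write of C is skipped.

```typescript
interface Item { id: string; label: string }

// Hypothetical pages whose boundaries overlap on item "C"
const page1: Item[] = [{ id: 'A', label: 'A' }, { id: 'B', label: 'B' }, { id: 'C', label: 'C' }];
const page2: Item[] = [{ id: 'C', label: 'C' }, { id: 'D', label: 'D' }, { id: 'E', label: 'E' }];

// Naive concatenation keeps both copies of "C"
const naive = [...page1, ...page2];

// Map registry: the second write of "C" is a no-op because the ID is already present
const registry = new Map<string, Item>();
for (const item of [...page1, ...page2]) {
  if (!registry.has(item.id)) registry.set(item.id, item);
}
const deduped = Array.from(registry.values());

console.log(naive.map((i) => i.id).join(','));   // A,B,C,C,D,E
console.log(deduped.map((i) => i.id).join(',')); // A,B,C,D,E
```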

Step-by-Step Implementation

Step 1 — Build the Map-based merge utility

Start with a framework-agnostic utility you can wire into any fetcher. This is the core algorithm; everything else is glue.

// lib/merge-pages.ts
export interface PaginatedPage<T extends { id: string | number }> {
  data: T[];
  hasNextPage: boolean;
  endCursor: string | null;
}

/**
 * Idempotent merge: existing entities are never overwritten;
 * novel entities are appended in server-delivery order.
 * Pagination metadata is always taken from the latest page.
 */
export function mergePaginatedPages<T extends { id: string | number }>(
  existing: PaginatedPage<T>,
  incoming: PaginatedPage<T>,
): PaginatedPage<T> {
  // O(N) build of registry from existing entities — O(1) lookup thereafter
  const registry = new Map<string | number, T>(
    existing.data.map((entity) => [entity.id, entity]),
  );

  for (const item of incoming.data) {
    if (!registry.has(item.id)) {
      registry.set(item.id, item);
    }
    // Existing record wins: no overwrite, preserving structural sharing
  }

  return {
    data: Array.from(registry.values()),
    // Metadata always from the latest payload — never merge with stale cache values
    hasNextPage: incoming.hasNextPage,
    endCursor: incoming.endCursor,
  };
}

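Cache Behavior Analysis. Because existing entities are never overwritten, every previously seen item keeps its object reference in the merged array, and TanStack Query’s structuralSharing (on by default) keeps those references stable across renders. When row components are memoized (for example with React.memo) and keyed by id, React can skip re-rendering unchanged rows and only mount the genuinely new ones. Naive spread also copies references, but it leaves duplicate IDs in the array, and that is what breaks keyed reconciliation. The trade-off of the existing-wins policy is that a fresher copy of an already-seen entity is ignored; rely on invalidation or a field-level merge when server updates must take precedence.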
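A quick way to check the reference-preservation behavior is to merge two overlapping pages and compare object identities. This sketch inlines a copy of the Step 1 merge so it runs standalone; the posts and cursors are hypothetical sample data. The last two logs show the existing-wins policy and that pagination metadata comes from the incoming page.

```typescript
interface PaginatedPage<T extends { id: string | number }> {
  data: T[];
  hasNextPage: boolean;
  endCursor: string | null;
}

// Same algorithm as Step 1, inlined so this example is self-contained
function mergePaginatedPages<T extends { id: string | number }>(
  existing: PaginatedPage<T>,
  incoming: PaginatedPage<T>,
): PaginatedPage<T> {
  const registry = new Map<string | number, T>(
    existing.data.map((e): [string | number, T] => [e.id, e]),
  );
  for (const item of incoming.data) {
    if (!registry.has(item.id)) registry.set(item.id, item);
  }
  return { data: Array.from(registry.values()), hasNextPage: incoming.hasNextPage, endCursor: incoming.endCursor };
}

interface Post { id: string; title: string }

const first: PaginatedPage<Post> = {
  data: [{ id: 'p1', title: 'One' }, { id: 'p2', title: 'Two' }],
  hasNextPage: true,
  endCursor: 'c2',
};
// The next page overlaps on p2 and carries an edited title for it
const second: PaginatedPage<Post> = {
  data: [{ id: 'p2', title: 'Two (edited)' }, { id: 'p3', title: 'Three' }],
  hasNextPage: false,
  endCursor: 'c3',
};

const merged = mergePaginatedPages(first, second);

console.log(merged.data.map((p) => p.id));     // [ 'p1', 'p2', 'p3' ]
console.log(merged.data[1] === first.data[1]); // true: the existing p2 object is reused
console.log(merged.data[1].title);             // Two: the edited title is not applied
console.log(merged.endCursor);                 // c3: metadata comes from the latest page
```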

Step 2 — Wire into TanStack Query v5 `useInfiniteQuery`

TanStack Query v5 uses initialPageParam and getNextPageParam; the select transform flattens pages into a deduplicated list.

// hooks/use-feed.ts
import { useInfiniteQuery } from '@tanstack/react-query';
import { mergePaginatedPages, PaginatedPage } from '../lib/merge-pages';

interface FeedItem {
  id: string;
  title: string;
  publishedAt: string;
}

async function fetchFeedPage(cursor: string | null): Promise<PaginatedPage<FeedItem>> {
  const params = new URLSearchParams({ limit: '20' });
  if (cursor) params.set('after', cursor);
  const res = await fetch(`/api/feed?${params}`);
  if (!res.ok) throw new Error('Network error');
  return res.json();
}

// Declared at module level so its reference is stable: TanStack Query re-runs
// select only when the query data changes, not on every render
function selectMergedFeed(data: { pages: PaginatedPage<FeedItem>[] }): PaginatedPage<FeedItem> {
  // Fold all pages into a single deduplicated list using the Map registry
  return data.pages.reduce<PaginatedPage<FeedItem>>(
    (merged, page) => mergePaginatedPages(merged, page),
    { data: [], hasNextPage: false, endCursor: null },
  );
}

export function useFeed() {
  return useInfiniteQuery({
    queryKey: ['feed'],
    queryFn: ({ pageParam }) => fetchFeedPage(pageParam),
    initialPageParam: null as string | null,
    getNextPageParam: (lastPage) =>
      lastPage.hasNextPage ? lastPage.endCursor : undefined,
    select: selectMergedFeed,
    staleTime: 30_000,
    gcTime: 5 * 60_000,
    // structuralSharing defaults to true — keep it on so unchanged entities reuse references
  });
}

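Cache Behavior Analysis. TanStack Query re-runs select only when the query data changes or when the select function’s reference changes. An inline select declared inside the hook gets a new reference on every render and therefore runs on every render; define it at module level or wrap it in useCallback so it runs only when data.pages changes. Each fold step rebuilds the Map from the accumulated list, so a full pass costs roughly O(P × N) for P pages and N total items. That is acceptable while page counts stay small (for example, fewer than 30 pages of 20 items). For very deep scroll sessions you can keep the accumulated registry alive between calls, for example in a useRef.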
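One way to avoid rebuilding the registry on every call is to keep a single Map alive and fold in only the pages it has not seen yet. The sketch below is framework-agnostic and hypothetical (createIncrementalMerger is not a TanStack Query API); in a React hook you would hold the returned function in a useRef. It assumes append-only scrolling: it never removes entries, and it starts over whenever the first page object changes, as happens after a reset or a refetch that altered page 1.

```typescript
interface PaginatedPage<T extends { id: string | number }> {
  data: T[];
  hasNextPage: boolean;
  endCursor: string | null;
}

// Hypothetical helper: one registry lives across calls; already-folded pages are skipped
export function createIncrementalMerger<T extends { id: string | number }>() {
  let registry = new Map<string | number, T>();
  let seenPages = new WeakSet<PaginatedPage<T>>();
  let firstPage: PaginatedPage<T> | undefined;
  let cached: PaginatedPage<T> = { data: [], hasNextPage: false, endCursor: null };

  return function merge(pages: ReadonlyArray<PaginatedPage<T>>): PaginatedPage<T> {
    let changed = false;
    // A different first-page object means the list was reset or refetched: start over
    if (pages[0] !== firstPage) {
      registry = new Map<string | number, T>();
      seenPages = new WeakSet<PaginatedPage<T>>();
      firstPage = pages[0];
      changed = true;
    }
    for (const page of pages) {
      if (seenPages.has(page)) continue; // already folded into the registry
      seenPages.add(page);
      changed = true;
      for (const item of page.data) {
        if (!registry.has(item.id)) registry.set(item.id, item);
      }
    }
    if (changed) {
      // Metadata always comes from the latest page
      const last = pages[pages.length - 1];
      cached = {
        data: Array.from(registry.values()),
        hasNextPage: last?.hasNextPage ?? false,
        endCursor: last?.endCursor ?? null,
      };
    }
    // Returning the same object when nothing changed keeps references stable
    return cached;
  };
}

// Usage
const mergeFeed = createIncrementalMerger<{ id: string }>();
const p1 = { data: [{ id: 'a' }, { id: 'b' }], hasNextPage: true, endCursor: 'b' };
const p2 = { data: [{ id: 'b' }, { id: 'c' }], hasNextPage: false, endCursor: 'c' };

const afterOne = mergeFeed([p1]);
const afterTwo = mergeFeed([p1, p2]); // only p2 is processed
const again = mergeFeed([p1, p2]);    // nothing new: the cached object is returned
console.log(afterTwo.data.map((i) => i.id)); // [ 'a', 'b', 'c' ]
console.log(again === afterTwo);             // true
```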

Step 3 — Apollo Client v3 `typePolicies` integration

Apollo’s InMemoryCache can be configured with a per-field merge function that applies the same Map-based algorithm. This replaces the default behavior of returning only the latest page.

// apollo/cache.ts
import { InMemoryCache, Reference } from '@apollo/client';

export const cache = new InMemoryCache({
  typePolicies: {
    Query: {
      fields: {
        feedItems: {
          // Only `filter` is part of this field's cache key. The cursor argument is
          // deliberately left out, so every page for the same filter merges into one list
          keyArgs: ['filter'],
          merge(
            existing: { items: Reference[]; endCursor: string | null } = {
              items: [],
              endCursor: null,
            },
            incoming: { items: Reference[]; hasNextPage: boolean; endCursor: string | null },
            { readField },
          ) {
            // Build registry keyed on the normalised entity ID from the store
            const registry = new Map<string, Reference>();
            for (const ref of existing.items) {
              const id = readField<string>('id', ref);
              if (id) registry.set(id, ref);
            }
            for (const ref of incoming.items) {
              const id = readField<string>('id', ref);
              if (id && !registry.has(id)) {
                registry.set(id, ref);
              }
            }
            return {
              items: Array.from(registry.values()),
              hasNextPage: incoming.hasNextPage,
              endCursor: incoming.endCursor,
            };
          },
        },
      },
    },
  },
});

Cache Behavior Analysis. Apollo normalizes each entity object into a Reference keyed by __typename:id before calling merge, so two pages that return the same item already point to the same store entry. A naive concatenating merge (existing.items followed by incoming.items) would still place that reference in the field’s array twice. The Map registry drops the second reference, so readQuery and useQuery both see a clean list with no phantom entries.
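Because the merge callback depends only on readField, it can be pulled out into a plain function and unit-tested without a running cache. In this sketch, Ref mirrors the { __ref: string } shape of Apollo’s Reference, and the fake store plus the readId stub are hypothetical test scaffolding. Inside the type policy you could pass (ref) => readField<string>('id', ref) in place of the stub.

```typescript
// Mirrors the shape of Apollo's Reference ({ __ref: "Typename:id" })
interface Ref { readonly __ref: string }

interface FeedField {
  items: Ref[];
  hasNextPage: boolean;
  endCursor: string | null;
}

type ReadId = (ref: Ref) => string | undefined;

// Same Map-based algorithm as the typePolicies merge, extracted for testing
export function mergeFeedField(existing: FeedField | undefined, incoming: FeedField, readId: ReadId): FeedField {
  const registry = new Map<string, Ref>();
  for (const ref of existing?.items ?? []) {
    const id = readId(ref);
    if (id) registry.set(id, ref);
  }
  for (const ref of incoming.items) {
    const id = readId(ref);
    if (id && !registry.has(id)) registry.set(id, ref);
  }
  return { items: Array.from(registry.values()), hasNextPage: incoming.hasNextPage, endCursor: incoming.endCursor };
}

// Hypothetical fake store standing in for the normalized cache
const store: Record<string, { id: string }> = {
  'FeedItem:1': { id: '1' },
  'FeedItem:2': { id: '2' },
  'FeedItem:3': { id: '3' },
};
const readId: ReadId = (ref) => store[ref.__ref]?.id;

const page1: FeedField = { items: [{ __ref: 'FeedItem:1' }, { __ref: 'FeedItem:2' }], hasNextPage: true, endCursor: 'c2' };
const page2: FeedField = { items: [{ __ref: 'FeedItem:2' }, { __ref: 'FeedItem:3' }], hasNextPage: false, endCursor: 'c3' };

const result = mergeFeedField(mergeFeedField(undefined, page1, readId), page2, readId);
console.log(result.items.map((r) => r.__ref)); // [ 'FeedItem:1', 'FeedItem:2', 'FeedItem:3' ]
```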

Edge Cases and Gotchas

Cursor drift under concurrent writes

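When a server-side write (e.g. a new post) shifts the underlying dataset, the next page may overlap with entities an earlier page already returned. The merge registry absorbs the duplicate silently, but the list order can become inconsistent. If your API returns a total count, one coarse guard is to compare it with your registry size; this catches deletions that left stale entries behind, not every reordering:

// Assumes the API adds an optional total field to each page (it is not part of PaginatedPage)
// and that queryClient comes from useQueryClient()
if (incoming.total !== undefined && incoming.total < registry.size) {
  // The registry holds more entities than the server now reports: stale entries exist,
  // so trigger a full refetch
  queryClient.invalidateQueries({ queryKey: ['feed'] });
}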

Optimistic mutations leaving phantom entries

When you insert an item optimistically (before server acknowledgment), tag it so the merge registry can identify and replace it when the real response arrives:

// Optimistic insert
const phantom: FeedItem = { id: `__optimistic__${Date.now()}`, title: 'Saving…', publishedAt: '' };

// On success, invalidate so the real entity displaces the phantom
await queryClient.invalidateQueries({ queryKey: ['feed'] });

Never use temporary IDs that could collide with real server IDs. Prefix them with a sentinel such as __optimistic__ so the registry can never confuse a phantom with a valid entity.
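Invalidation works but costs a refetch. If the mutation response returns the saved entity, you can instead swap the phantom for the real record in place. This sketch is hypothetical: OPTIMISTIC_PREFIX, insertOptimistic, and reconcileOptimistic are illustrative names, and the functions operate on a flat item array. With TanStack Query you would apply them to the relevant page inside queryClient.setQueryData, passing something like String(Date.now()) as the temporary key.

```typescript
interface FeedItem { id: string; title: string; publishedAt: string }

const OPTIMISTIC_PREFIX = '__optimistic__';

// Prepend a phantom entity whose sentinel-prefixed ID cannot collide with a server ID
export function insertOptimistic(items: FeedItem[], title: string, tempKey: string): FeedItem[] {
  const phantom: FeedItem = { id: `${OPTIMISTIC_PREFIX}${tempKey}`, title, publishedAt: '' };
  return [phantom, ...items];
}

// Replace the phantom with the confirmed entity, keeping its position.
// If the server entity is already in the list (e.g. a refetch landed first),
// drop the phantom instead of creating a duplicate.
export function reconcileOptimistic(items: FeedItem[], tempId: string, saved: FeedItem): FeedItem[] {
  const alreadyPresent = items.some((item) => item.id === saved.id);
  if (alreadyPresent) return items.filter((item) => item.id !== tempId);
  return items.map((item) => (item.id === tempId ? saved : item));
}

// Usage
const base: FeedItem[] = [{ id: '41', title: 'Older post', publishedAt: '2024-01-01' }];
const withPhantom = insertOptimistic(base, 'Saving…', 'tmp1');
const confirmed = reconcileOptimistic(withPhantom, '__optimistic__tmp1', {
  id: '42',
  title: 'New post',
  publishedAt: '2024-01-02',
});
console.log(confirmed.map((i) => i.id)); // [ '42', '41' ]
```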

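`gcTime` expiry between visits

TanStack Query starts the gcTime countdown (default 5 minutes) only once a query has no mounted observers, so a feed that is still rendered is never collected mid-scroll, however slowly the user scrolls. The risk is navigating away: if the user leaves the feed and returns after gcTime has elapsed, the entire infinite query entry, every page included, has been garbage collected, and accumulation restarts from the first page with an empty registry. If users routinely leave and come back within a session, set gcTime comfortably above that window, accepting that inactive pages stay in memory longer: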

gcTime: 30 * 60_000, // 30 minutes for long-lived infinite scroll

Common Pitfalls and Resolutions

| Observable Issue | Root Cause | Resolution |
| --- | --- | --- |
| List items flash and jump during scroll pauses | `[...existing, ...fetched]` keeps both copies of every entity on an overlapping page boundary; rows keyed by `id` collide, and React cannot reliably match them to existing DOM nodes | Replace with the `Map` registry: duplicates are dropped and existing references are preserved, so only novel entities mount new rows |
| Pagination cursor resets to page 1 after a merge | The `merge` function keeps `endCursor` from `existing` (the stale initial query value) instead of the latest page’s cursor | Always take `hasNextPage` and `endCursor` from `incoming`, never spread them from `existing` |
| Concurrent fetches insert overlapping page data | Two `fetchNextPage` calls resolve with the same or adjacent cursors, and both writes reach the cache before deduplication | The `Map` registry makes the duplicate writes idempotent. To stop the extra request itself, call `fetchNextPage` only when `hasNextPage && !isFetchingNextPage`, or gate calls with a ref; `maxPages` in TanStack Query v5 caps how many pages are kept in the cache, not how many requests are in flight |
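If you gate calls yourself, a single-flight wrapper is enough: while one request is in flight, further calls receive the same promise instead of starting another fetch. createSingleFlight is a hypothetical helper, not part of TanStack Query; in a component you would keep the wrapper in a useRef and wrap fetchNextPage with it.

```typescript
// Hypothetical single-flight gate: concurrent callers share one in-flight promise
export function createSingleFlight<T>(task: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (inFlight) return inFlight; // a fetch is already running: reuse it
    inFlight = task().finally(() => {
      inFlight = null; // allow the next call once this one settles
    });
    return inFlight;
  };
}

// Usage with a stand-in fetcher that counts invocations
let calls = 0;
const loadNext = createSingleFlight(async () => {
  calls += 1;
  return `page-${calls}`;
});

const a = loadNext();
const b = loadNext(); // called before `a` settles, so it returns the same promise
console.log(a === b, calls); // true 1
```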

Frequently Asked Questions

How do I handle server-side duplicates that legitimately share an ID?

If the server intentionally returns the same entity ID across pages (e.g. a shared resource appearing in multiple categories), key the registry on a composite key such as `${entity.id}-${categorySlug}`. Client-side merging otherwise assumes global ID uniqueness within an entity type. Alternatively, normalise at the fetch boundary so that each entry gets a synthetic unique key before the registry sees it.
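A composite-key registry is a small change to the Step 1 loop: derive the Map key from several fields instead of id alone. The categorySlug field and the compositeKey and mergeByCompositeKey helpers below are hypothetical; use whichever fields actually distinguish the legitimate duplicates in your API.

```typescript
interface CategorizedItem { id: string; categorySlug: string; title: string }

// Hypothetical key: the same entity ID in two categories yields two distinct keys
const compositeKey = (item: CategorizedItem): string => `${item.id}-${item.categorySlug}`;

export function mergeByCompositeKey(existing: CategorizedItem[], incoming: CategorizedItem[]): CategorizedItem[] {
  const registry = new Map<string, CategorizedItem>(
    existing.map((item): [string, CategorizedItem] => [compositeKey(item), item]),
  );
  for (const item of incoming) {
    const key = compositeKey(item);
    if (!registry.has(key)) registry.set(key, item);
  }
  return Array.from(registry.values());
}

const merged = mergeByCompositeKey(
  [{ id: '7', categorySlug: 'news', title: 'Shared' }],
  [
    { id: '7', categorySlug: 'sports', title: 'Shared' }, // same ID, different category: kept
    { id: '7', categorySlug: 'news', title: 'Shared' },   // true duplicate: dropped
  ],
);
console.log(merged.length); // 2
```

Pick a separator that cannot occur inside either field; if IDs may contain hyphens, "a-b" + "c" and "a" + "b-c" would otherwise produce the same key.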

Does Map-based deduplication degrade with 10k+ items?

Map insertion and lookup are O(1) on average, and Array.from(registry.values()) is O(N) per merge. At very large list sizes the real performance concern is DOM node count, not the registry. Pair this merge with a virtualised list (@tanstack/react-virtual), and keep gcTime modest so that inactive feeds do not retain the full entity graph in memory after users navigate away.

Does this work with offset-based pagination?

Yes, with caveats. Offset pagination lacks stable boundaries: a write at position 5 shifts every subsequent item’s offset, so the “same” offset on a retry may return different entities. The Map registry still deduplicates by ID, but list ordering may become inconsistent. For infinite scroll specifically, cursor-based pagination is strongly preferred because a cursor anchors each page to a specific item rather than a numeric position, so writes elsewhere in the list do not shift page boundaries.
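The offset shift is easy to reproduce. In this hypothetical in-memory simulation, a post inserted at the head of the list between two offset requests pushes c into the second page. The Map registry drops the repeated c, but the new post is never fetched by the forward scroll. A cursor anchored after c would have returned d onward regardless of the insert.

```typescript
interface Post { id: string }

// Hypothetical offset endpoint over an in-memory dataset
function fetchOffsetPage(dataset: Post[], offset: number, limit: number): Post[] {
  return dataset.slice(offset, offset + limit);
}

const dataset: Post[] = ['a', 'b', 'c', 'd', 'e', 'f'].map((id) => ({ id }));
const limit = 3;

const firstPage = fetchOffsetPage(dataset, 0, limit); // a, b, c

// A concurrent write inserts a new post at the head before page 2 is requested
dataset.unshift({ id: 'new' });

const secondPage = fetchOffsetPage(dataset, limit, limit); // c shifted into page 2

// The Map registry absorbs the repeated "c"
const registry = new Map<string, Post>();
for (const post of [...firstPage, ...secondPage]) {
  if (!registry.has(post.id)) registry.set(post.id, post);
}

console.log(secondPage.map((p) => p.id)); // [ 'c', 'd', 'e' ]
console.log(Array.from(registry.keys()));  // [ 'a', 'b', 'c', 'd', 'e' ]
```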

Pagination Normalization Patterns — the parent reference covering offset vs cursor adapter design, query key hashing, and merge semantics across the full pagination surface.
Normalizing Cursor-Based Pagination — guards specifically against cursor token corruption and race-condition-induced cursor drift, complementing the merge strategy on this page.
Data Normalization & Query Key Design — foundational pillar covering entity mapping, relationship stitching, and query key design patterns that underpin every pagination merge decision.
Entity Mapping Strategies — how to design stable entity ID contracts and normalisation depth so that the Map registry on this page has a reliable key to key on.