Narrow vs Wide vs H3 Schema Performance Comparison

parquet
performance
benchmarking
h3

This page benchmarks the performance difference between narrow, wide, and wide + H3 parquet schema formats when accessing data “over the wire” via HTTP range requests in DuckDB-WASM.

1 Introduction

1.1 What are Narrow vs Wide Schemas?

The iSamples property graph data can be serialized in three different parquet formats:

Format Description File Size Row Count Sources
Narrow Stores relationships as separate edge rows (otype='_edge_') ~850 MB ~106M rows All 4 sources
Wide Stores relationships as p__* columns on entity rows ~278 MB ~20M rows All 4 sources
Wide + H3 Wide format + pre-computed H3 spatial indices (h3_res4/6/8) ~292 MB ~20M rows All 4 sources

All three formats represent the same underlying data (SESAR, OpenContext, GEOME, Smithsonian) with identical semantics. The wide format eliminates edge rows; the H3 variant adds pre-computed hexagonal spatial indices for accelerated geospatial queries.

A facet summaries file (2KB) provides pre-aggregated counts for instant facet lookups.

Data source: Cloudflare R2 (updated January 2026)

1.2 Why Performance Matters

When using DuckDB-WASM in the browser:

  • Data is fetched via HTTP range requests (206 Partial Content)
  • Only the columns and row groups needed for a query are downloaded
  • Smaller files with fewer rows = fewer bytes to transfer, faster queries

Expected speedup: Wide format should be 2-3x faster based on local benchmarks.

2 Methodology

NoteBenchmarking Approach
  • Cold run: First query (includes metadata fetch, JIT compilation) - reported separately
  • Warm runs: Runs 2-3 (metadata cached, JIT warmed up)
  • Warm median: Median of warm runs only (excludes cold run for fair comparison)
  • Sequential execution: Benchmarks run one after another, not concurrently
  • Network variability: Results will vary based on your network connection and hardware

Results are shown in real-time as benchmarks complete. Data loading only begins when you click the button.

3 Setup

3.1 Environment Info

3.2 Run Benchmarks

4 Data Validity Check

Before benchmarking, let’s confirm both schemas represent the same underlying data.

Code
validityCheck = {
    // Only run when button clicked AND databases are initialized
    if (runBenchmarks < 1 || !dbNarrow || !dbWide || !dbWideH3) return null;

    const loadingDiv = document.getElementById('loading_validity');
    const errorDiv = document.getElementById('error_display');
    if (loadingDiv) loadingDiv.hidden = false;

    try {
        // Count rows in narrow
        const narrowCount = await dbNarrow.query(`SELECT COUNT(*) as cnt FROM narrow`);
        const narrowTotal = narrowCount[0].cnt;

        // Count rows in wide
        const wideCount = await dbWide.query(`SELECT COUNT(*) as cnt FROM wide`);
        const wideTotal = wideCount[0].cnt;

        // Count rows in wide+H3
        const wideH3Count = await dbWideH3.query(`SELECT COUNT(*) as cnt FROM wide_h3`);
        const wideH3Total = wideH3Count[0].cnt;

        // Count entity types in narrow (excluding edges)
        const narrowEntities = await dbNarrow.query(`
            SELECT COUNT(*) as cnt FROM narrow
            WHERE otype != '_edge_'
        `);
        const narrowEntityCount = narrowEntities[0].cnt;

        // Count samples in all three
        const narrowSamples = await dbNarrow.query(`
            SELECT COUNT(*) as cnt FROM narrow
            WHERE otype = 'MaterialSampleRecord'
        `);

        const wideSamples = await dbWide.query(`
            SELECT COUNT(*) as cnt FROM wide
            WHERE otype = 'MaterialSampleRecord'
        `);

        const wideH3Samples = await dbWideH3.query(`
            SELECT COUNT(*) as cnt FROM wide_h3
            WHERE otype = 'MaterialSampleRecord'
        `);

        return {
            narrowTotal: narrowTotal,
            wideTotal: wideTotal,
            wideH3Total: wideH3Total,
            narrowEntities: narrowEntityCount,
            narrowSamples: narrowSamples[0].cnt,
            wideSamples: wideSamples[0].cnt,
            wideH3Samples: wideH3Samples[0].cnt,
            sampleMatch: narrowSamples[0].cnt === wideSamples[0].cnt
                && wideSamples[0].cnt === wideH3Samples[0].cnt
        };
    } catch (e) {
        if (errorDiv) {
            errorDiv.textContent = `Validity check failed: ${e.message}`;
            errorDiv.style.display = 'block';
        }
        return { error: e.message };
    } finally {
        if (loadingDiv) loadingDiv.hidden = true;
    }
}

5 Benchmark 1: Entity Count Query

This benchmark tests a simple COUNT(*) GROUP BY otype query, which requires scanning row metadata.

Code
benchmark1 = {
    // Wait for validity check to complete first (sequential execution)
    if (runBenchmarks < 1 || !validityCheck || validityCheck.error) return null;

    const loadingDiv = document.getElementById('loading_b1');
    const errorDiv = document.getElementById('error_display');
    if (loadingDiv) loadingDiv.hidden = false;

    const query = `SELECT otype, COUNT(*) as cnt FROM {table} GROUP BY otype ORDER BY cnt DESC`;
    const runs = 3;

    try {
        // Narrow benchmark
        const narrowTimes = [];
        for (let i = 0; i < runs; i++) {
            const start = performance.now();
            await dbNarrow.query(query.replace('{table}', 'narrow'));
            narrowTimes.push(performance.now() - start);
        }

        // Wide benchmark
        const wideTimes = [];
        for (let i = 0; i < runs; i++) {
            const start = performance.now();
            await dbWide.query(query.replace('{table}', 'wide'));
            wideTimes.push(performance.now() - start);
        }

        // Calculate medians (excluding cold run for warm median)
        const median = arr => {
            const sorted = [...arr].sort((a, b) => a - b);
            if (sorted.length === 2) return (sorted[0] + sorted[1]) / 2;
            return sorted[Math.floor(sorted.length / 2)];
        };
        const warmMedian = arr => {
            if (arr.length <= 1) return arr[0] || 0;
            const warm = arr.slice(1); // exclude first (cold) run
            return median(warm);
        };

        const narrowMedian = warmMedian(narrowTimes);
        const wideMedian = warmMedian(wideTimes);

        return {
            name: "Entity Count (GROUP BY otype)",
            narrowCold: narrowTimes[0],
            narrowMedian: narrowMedian,
            narrowAll: narrowTimes,
            wideCold: wideTimes[0],
            wideMedian: wideMedian,
            wideAll: wideTimes,
            speedup: narrowMedian / wideMedian
        };
    } catch (e) {
        if (errorDiv) {
            errorDiv.textContent = `Benchmark 1 failed: ${e.message}`;
            errorDiv.style.display = 'block';
        }
        return { error: e.message };
    } finally {
        if (loadingDiv) loadingDiv.hidden = true;
    }
}

6 Benchmark 2: Sample Count by Site

This benchmark counts samples per sampling site, requiring a join between samples and sites.

Query complexity: - Narrow: Requires joining through edge rows - Wide: Direct join via p__* columns

Code
benchmark2 = {
    // Wait for benchmark1 to complete first (sequential execution)
    if (runBenchmarks < 1 || !benchmark1 || benchmark1.error) return null;

    const loadingDiv = document.getElementById('loading_b2');
    const errorDiv = document.getElementById('error_display');
    if (loadingDiv) loadingDiv.hidden = false;

    // Narrow query: traverse edges to get from sample -> event -> site
    const narrowQuery = `
        WITH sample_events AS (
            SELECT
                e.o[1] as event_id,
                s.row_id as sample_id
            FROM narrow s
            JOIN narrow e ON s.row_id = e.s AND e.p = 'produced_by'
            WHERE s.otype = 'MaterialSampleRecord'
        ),
        event_sites AS (
            SELECT
                se.sample_id,
                e2.o[1] as site_id
            FROM sample_events se
            JOIN narrow e2 ON se.event_id = e2.s AND e2.p = 'sampling_site'
        )
        SELECT
            site.label,
            COUNT(*) as sample_count
        FROM event_sites es
        JOIN narrow site ON es.site_id = site.row_id
        GROUP BY site.label
        ORDER BY sample_count DESC
        LIMIT 10
    `;

    // Wide query: direct column access
    const wideQuery = `
        WITH sample_sites AS (
            SELECT
                s.row_id as sample_id,
                e.p__sampling_site[1] as site_id
            FROM wide s
            JOIN wide e ON s.p__produced_by[1] = e.row_id
            WHERE s.otype = 'MaterialSampleRecord'
              AND e.otype = 'SamplingEvent'
        )
        SELECT
            site.label,
            COUNT(*) as sample_count
        FROM sample_sites ss
        JOIN wide site ON ss.site_id = site.row_id
        WHERE site.otype = 'SamplingSite'
        GROUP BY site.label
        ORDER BY sample_count DESC
        LIMIT 10
    `;

    const runs = 3;

    try {
        // Narrow benchmark
        const narrowTimes = [];
        for (let i = 0; i < runs; i++) {
            const start = performance.now();
            await dbNarrow.query(narrowQuery);
            narrowTimes.push(performance.now() - start);
        }

        // Wide benchmark
        const wideTimes = [];
        for (let i = 0; i < runs; i++) {
            const start = performance.now();
            await dbWide.query(wideQuery);
            wideTimes.push(performance.now() - start);
        }

        const median = arr => {
            const sorted = [...arr].sort((a, b) => a - b);
            if (sorted.length === 2) return (sorted[0] + sorted[1]) / 2;
            return sorted[Math.floor(sorted.length / 2)];
        };
        const warmMedian = arr => {
            if (arr.length <= 1) return arr[0] || 0;
            const warm = arr.slice(1);
            return median(warm);
        };

        const narrowMedian = warmMedian(narrowTimes);
        const wideMedian = warmMedian(wideTimes);

        return {
            name: "Sample Count by Site (multi-join)",
            narrowCold: narrowTimes[0],
            narrowMedian: narrowMedian,
            narrowAll: narrowTimes,
            wideCold: wideTimes[0],
            wideMedian: wideMedian,
            wideAll: wideTimes,
            speedup: narrowMedian / wideMedian
        };
    } catch (e) {
        if (errorDiv) {
            errorDiv.textContent = `Benchmark 2 failed: ${e.message}`;
            errorDiv.style.display = 'block';
        }
        return { error: e.message };
    } finally {
        if (loadingDiv) loadingDiv.hidden = true;
    }
}

7 Benchmark 3: Material Type Distribution

This benchmark aggregates sample counts by material category.

Code
benchmark3 = {
    // Wait for benchmark2 to complete first (sequential execution)
    if (runBenchmarks < 1 || !benchmark2 || benchmark2.error) return null;

    const loadingDiv = document.getElementById('loading_b3');
    const errorDiv = document.getElementById('error_display');
    if (loadingDiv) loadingDiv.hidden = false;

    // Narrow query: join through edges to material concepts
    const narrowQuery = `
        SELECT
            c.label as material,
            COUNT(*) as sample_count
        FROM narrow s
        JOIN narrow e ON s.row_id = e.s AND e.p = 'has_material_category'
        JOIN narrow c ON e.o[1] = c.row_id
        WHERE s.otype = 'MaterialSampleRecord'
        GROUP BY c.label
        ORDER BY sample_count DESC
        LIMIT 10
    `;

    // Wide query: direct column access to material category
    const wideQuery = `
        SELECT
            c.label as material,
            COUNT(*) as sample_count
        FROM wide s
        JOIN wide c ON s.p__has_material_category[1] = c.row_id
        WHERE s.otype = 'MaterialSampleRecord'
          AND c.otype = 'IdentifiedConcept'
        GROUP BY c.label
        ORDER BY sample_count DESC
        LIMIT 10
    `;

    const runs = 3;

    try {
        // Narrow benchmark
        const narrowTimes = [];
        for (let i = 0; i < runs; i++) {
            const start = performance.now();
            await dbNarrow.query(narrowQuery);
            narrowTimes.push(performance.now() - start);
        }

        // Wide benchmark
        const wideTimes = [];
        for (let i = 0; i < runs; i++) {
            const start = performance.now();
            await dbWide.query(wideQuery);
            wideTimes.push(performance.now() - start);
        }

        const median = arr => {
            const sorted = [...arr].sort((a, b) => a - b);
            if (sorted.length === 2) return (sorted[0] + sorted[1]) / 2;
            return sorted[Math.floor(sorted.length / 2)];
        };
        const warmMedian = arr => {
            if (arr.length <= 1) return arr[0] || 0;
            const warm = arr.slice(1);
            return median(warm);
        };

        const narrowMedian = warmMedian(narrowTimes);
        const wideMedian = warmMedian(wideTimes);

        return {
            name: "Material Type Distribution",
            narrowCold: narrowTimes[0],
            narrowMedian: narrowMedian,
            narrowAll: narrowTimes,
            wideCold: wideTimes[0],
            wideMedian: wideMedian,
            wideAll: wideTimes,
            speedup: narrowMedian / wideMedian
        };
    } catch (e) {
        if (errorDiv) {
            errorDiv.textContent = `Benchmark 3 failed: ${e.message}`;
            errorDiv.style.display = 'block';
        }
        return { error: e.message };
    } finally {
        if (loadingDiv) loadingDiv.hidden = true;
    }
}

8 Benchmark 4: Geospatial Bounding Box Query

This benchmark counts samples within the western United States (lat 32-49, lon -125 to -104), comparing lat/lon filtering on wide format versus H3-cell-based filtering on the H3-indexed file.

Code
benchmark4 = {
    if (runBenchmarks < 1 || !benchmark3 || benchmark3.error) return null;

    const loadingDiv = document.getElementById('loading_b4');
    const errorDiv = document.getElementById('error_display');
    if (loadingDiv) loadingDiv.hidden = false;

    // Wide: baseline lat/lon bounding box
    const wideQuery = `
        SELECT COUNT(*) as cnt FROM wide
        WHERE otype = 'MaterialSampleRecord'
          AND latitude BETWEEN 32 AND 49
          AND longitude BETWEEN -125 AND -104
    `;

    // Wide+H3: H3-accelerated — first identify cells in the bbox, then filter by cells
    const h3Query = `
        WITH cells AS (
            SELECT DISTINCT h3_res4 FROM wide_h3
            WHERE latitude BETWEEN 32 AND 49
              AND longitude BETWEEN -125 AND -104
              AND otype = 'MaterialSampleRecord'
        )
        SELECT COUNT(*) as cnt FROM wide_h3
        WHERE h3_res4 IN (SELECT h3_res4 FROM cells)
          AND otype = 'MaterialSampleRecord'
    `;

    const runs = 3;

    try {
        const wideTimes = [];
        for (let i = 0; i < runs; i++) {
            const start = performance.now();
            await dbWide.query(wideQuery);
            wideTimes.push(performance.now() - start);
        }

        const h3Times = [];
        for (let i = 0; i < runs; i++) {
            const start = performance.now();
            await dbWideH3.query(h3Query);
            h3Times.push(performance.now() - start);
        }

        const median = arr => {
            const sorted = [...arr].sort((a, b) => a - b);
            if (sorted.length === 2) return (sorted[0] + sorted[1]) / 2;
            return sorted[Math.floor(sorted.length / 2)];
        };
        const warmMedian = arr => {
            if (arr.length <= 1) return arr[0] || 0;
            return median(arr.slice(1));
        };

        const wideMedian = warmMedian(wideTimes);
        const h3Median = warmMedian(h3Times);

        return {
            name: "Geospatial BBox (Western US)",
            wideCold: wideTimes[0],
            wideMedian: wideMedian,
            wideAll: wideTimes,
            h3Cold: h3Times[0],
            h3Median: h3Median,
            h3All: h3Times,
            speedup: wideMedian / h3Median
        };
    } catch (e) {
        if (errorDiv) {
            errorDiv.textContent = `Benchmark 4 failed: ${e.message}`;
            errorDiv.style.display = 'block';
        }
        return { error: e.message };
    } finally {
        if (loadingDiv) loadingDiv.hidden = true;
    }
}

9 Benchmark 5: Facet Aggregation — Full Scan vs Pre-computed Summary

This benchmark compares full-scan source aggregation on the wide file versus a pre-computed 2KB facet summary file.

Code
benchmark5 = {
    if (runBenchmarks < 1 || !benchmark4 || benchmark4.error) return null;

    const loadingDiv = document.getElementById('loading_b5');
    const errorDiv = document.getElementById('error_display');
    if (loadingDiv) loadingDiv.hidden = false;

    // Full scan: aggregate source counts from full wide file
    const fullScanQuery = `
        SELECT n, COUNT(*) as cnt FROM wide
        WHERE otype = 'MaterialSampleRecord'
        GROUP BY n
    `;

    // Pre-computed: read from 2KB summary file
    const summaryQuery = `
        SELECT facet_value, count FROM read_parquet('${summariesUrl}')
        WHERE facet_type = 'source'
    `;

    const runs = 3;

    try {
        const fullScanTimes = [];
        for (let i = 0; i < runs; i++) {
            const start = performance.now();
            await dbWide.query(fullScanQuery);
            fullScanTimes.push(performance.now() - start);
        }

        const summaryTimes = [];
        for (let i = 0; i < runs; i++) {
            const start = performance.now();
            await dbWide.query(summaryQuery);
            summaryTimes.push(performance.now() - start);
        }

        const median = arr => {
            const sorted = [...arr].sort((a, b) => a - b);
            if (sorted.length === 2) return (sorted[0] + sorted[1]) / 2;
            return sorted[Math.floor(sorted.length / 2)];
        };
        const warmMedian = arr => {
            if (arr.length <= 1) return arr[0] || 0;
            return median(arr.slice(1));
        };

        const fullScanMedian = warmMedian(fullScanTimes);
        const summaryMedian = warmMedian(summaryTimes);

        return {
            name: "Facet Aggregation (full scan vs summary)",
            fullScanCold: fullScanTimes[0],
            fullScanMedian: fullScanMedian,
            fullScanAll: fullScanTimes,
            summaryCold: summaryTimes[0],
            summaryMedian: summaryMedian,
            summaryAll: summaryTimes,
            speedup: fullScanMedian / summaryMedian
        };
    } catch (e) {
        if (errorDiv) {
            errorDiv.textContent = `Benchmark 5 failed: ${e.message}`;
            errorDiv.style.display = 'block';
        }
        return { error: e.message };
    } finally {
        if (loadingDiv) loadingDiv.hidden = true;
    }
}

10 Results Summary

11 Technical Notes

11.1 Pitfalls and Considerations

Consideration How We Address It
Browser caching First run is “cold” (metadata not cached), subsequent runs are “warm”
Network variability We run 3 iterations and report the warm-run median (exclude cold run)
JIT compilation First run includes JIT overhead; warm runs are more representative
Memory limits 691 MB narrow file may stress browser memory; wide format is safer

11.2 Schema Differences

Narrow schema stores relationships as edge rows:

-- Edge row example
{otype: '_edge_', s: 123, p: 'produced_by', o: [456]}

Wide schema stores relationships as columns:

-- Entity row with relationship columns
{otype: 'MaterialSampleRecord', p__produced_by: [456], p__has_material_category: [789]}

Wide+H3 schema adds pre-computed spatial indices:

-- Same as wide, plus H3 hexagonal index columns
{..., h3_res4: 595536348953485311, h3_res6: 604265133842685951, h3_res8: 613003918731886591}

The wide format eliminates ~9M edge rows (60% file size reduction). The H3 variant adds ~14MB for spatial index columns that enable cell-based geospatial filtering.

Facet summaries (2KB) pre-compute common aggregations:

{facet_type: 'source', facet_value: 'SESAR', scheme: null, count: 4600000}

12 See Also