Narrow vs Wide Schema Performance Comparison

parquet
performance
benchmarking

This page benchmarks the performance difference between narrow and wide parquet schema formats when accessing data “over the wire” via HTTP range requests in DuckDB-WASM.

1 Introduction

1.1 What are Narrow vs Wide Schemas?

The iSamples property graph data can be serialized in two different parquet formats:

Format Description File Size Row Count
Narrow Stores relationships as separate edge rows (otype='_edge_') 691 MB ~11.6M rows
Wide Stores relationships as p__* columns on entity rows 275 MB ~2.5M rows

Both formats represent the same underlying data with identical semantics, but the wide format is optimized for analytical queries by eliminating edge rows.

1.2 Why Performance Matters

When using DuckDB-WASM in the browser:

  • Data is fetched via HTTP range requests (206 Partial Content)
  • Only the columns and row groups needed for a query are downloaded
  • Smaller files with fewer rows = fewer bytes to transfer, faster queries

Expected speedup: Wide format should be 2-3x faster based on local benchmarks.

2 Methodology

NoteBenchmarking Approach
  • Cold run: First query (includes metadata fetch, JIT compilation) - reported separately
  • Warm runs: Runs 2-3 (metadata cached, JIT warmed up)
  • Warm median: Median of warm runs only (excludes cold run for fair comparison)
  • Sequential execution: Benchmarks run one after another, not concurrently
  • Network variability: Results will vary based on your network connection and hardware

Results are shown in real-time as benchmarks complete. Data loading only begins when you click the button.

3 Setup

3.1 Environment Info

3.2 Run Benchmarks

4 Data Validity Check

Before benchmarking, let’s confirm both schemas represent the same underlying data.

Code
validityCheck = {
    // Only run when button clicked AND databases are initialized
    if (runBenchmarks < 1 || !dbNarrow || !dbWide) return null;

    const loadingDiv = document.getElementById('loading_validity');
    const errorDiv = document.getElementById('error_display');
    if (loadingDiv) loadingDiv.hidden = false;

    try {
        // Count rows in narrow
        const narrowCount = await dbNarrow.query(`SELECT COUNT(*) as cnt FROM narrow`);
        const narrowTotal = narrowCount[0].cnt;

        // Count rows in wide
        const wideCount = await dbWide.query(`SELECT COUNT(*) as cnt FROM wide`);
        const wideTotal = wideCount[0].cnt;

        // Count entity types in narrow (excluding edges)
        const narrowEntities = await dbNarrow.query(`
            SELECT COUNT(*) as cnt FROM narrow
            WHERE otype != '_edge_'
        `);
        const narrowEntityCount = narrowEntities[0].cnt;

        // Count samples in both
        const narrowSamples = await dbNarrow.query(`
            SELECT COUNT(*) as cnt FROM narrow
            WHERE otype = 'MaterialSampleRecord'
        `);

        const wideSamples = await dbWide.query(`
            SELECT COUNT(*) as cnt FROM wide
            WHERE otype = 'MaterialSampleRecord'
        `);

        return {
            narrowTotal: narrowTotal,
            wideTotal: wideTotal,
            narrowEntities: narrowEntityCount,
            narrowSamples: narrowSamples[0].cnt,
            wideSamples: wideSamples[0].cnt,
            sampleMatch: narrowSamples[0].cnt === wideSamples[0].cnt
        };
    } catch (e) {
        if (errorDiv) {
            errorDiv.textContent = `Validity check failed: ${e.message}`;
            errorDiv.style.display = 'block';
        }
        return { error: e.message };
    } finally {
        if (loadingDiv) loadingDiv.hidden = true;
    }
}

5 Benchmark 1: Entity Count Query

This benchmark tests a simple COUNT(*) GROUP BY otype query, which requires scanning row metadata.

Code
benchmark1 = {
    // Wait for validity check to complete first (sequential execution)
    if (runBenchmarks < 1 || !validityCheck || validityCheck.error) return null;

    const loadingDiv = document.getElementById('loading_b1');
    const errorDiv = document.getElementById('error_display');
    if (loadingDiv) loadingDiv.hidden = false;

    const query = `SELECT otype, COUNT(*) as cnt FROM {table} GROUP BY otype ORDER BY cnt DESC`;
    const runs = 3;

    try {
        // Narrow benchmark
        const narrowTimes = [];
        for (let i = 0; i < runs; i++) {
            const start = performance.now();
            await dbNarrow.query(query.replace('{table}', 'narrow'));
            narrowTimes.push(performance.now() - start);
        }

        // Wide benchmark
        const wideTimes = [];
        for (let i = 0; i < runs; i++) {
            const start = performance.now();
            await dbWide.query(query.replace('{table}', 'wide'));
            wideTimes.push(performance.now() - start);
        }

        // Calculate medians (excluding cold run for warm median)
        const median = arr => {
            const sorted = [...arr].sort((a, b) => a - b);
            if (sorted.length === 2) return (sorted[0] + sorted[1]) / 2;
            return sorted[Math.floor(sorted.length / 2)];
        };
        const warmMedian = arr => {
            if (arr.length <= 1) return arr[0] || 0;
            const warm = arr.slice(1); // exclude first (cold) run
            return median(warm);
        };

        const narrowMedian = warmMedian(narrowTimes);
        const wideMedian = warmMedian(wideTimes);

        return {
            name: "Entity Count (GROUP BY otype)",
            narrowCold: narrowTimes[0],
            narrowMedian: narrowMedian,
            narrowAll: narrowTimes,
            wideCold: wideTimes[0],
            wideMedian: wideMedian,
            wideAll: wideTimes,
            speedup: narrowMedian / wideMedian
        };
    } catch (e) {
        if (errorDiv) {
            errorDiv.textContent = `Benchmark 1 failed: ${e.message}`;
            errorDiv.style.display = 'block';
        }
        return { error: e.message };
    } finally {
        if (loadingDiv) loadingDiv.hidden = true;
    }
}

6 Benchmark 2: Sample Count by Site

This benchmark counts samples per sampling site, requiring a join between samples and sites.

Query complexity: - Narrow: Requires joining through edge rows - Wide: Direct join via p__* columns

Code
benchmark2 = {
    // Wait for benchmark1 to complete first (sequential execution)
    if (runBenchmarks < 1 || !benchmark1 || benchmark1.error) return null;

    const loadingDiv = document.getElementById('loading_b2');
    const errorDiv = document.getElementById('error_display');
    if (loadingDiv) loadingDiv.hidden = false;

    // Narrow query: traverse edges to get from sample -> event -> site
    const narrowQuery = `
        WITH sample_events AS (
            SELECT
                e.o[1] as event_id,
                s.row_id as sample_id
            FROM narrow s
            JOIN narrow e ON s.row_id = e.s AND e.p = 'produced_by'
            WHERE s.otype = 'MaterialSampleRecord'
        ),
        event_sites AS (
            SELECT
                se.sample_id,
                e2.o[1] as site_id
            FROM sample_events se
            JOIN narrow e2 ON se.event_id = e2.s AND e2.p = 'sampling_site'
        )
        SELECT
            site.label,
            COUNT(*) as sample_count
        FROM event_sites es
        JOIN narrow site ON es.site_id = site.row_id
        GROUP BY site.label
        ORDER BY sample_count DESC
        LIMIT 10
    `;

    // Wide query: direct column access
    const wideQuery = `
        WITH sample_sites AS (
            SELECT
                s.row_id as sample_id,
                e.p__sampling_site[1] as site_id
            FROM wide s
            JOIN wide e ON s.p__produced_by[1] = e.row_id
            WHERE s.otype = 'MaterialSampleRecord'
              AND e.otype = 'SamplingEvent'
        )
        SELECT
            site.label,
            COUNT(*) as sample_count
        FROM sample_sites ss
        JOIN wide site ON ss.site_id = site.row_id
        WHERE site.otype = 'SamplingSite'
        GROUP BY site.label
        ORDER BY sample_count DESC
        LIMIT 10
    `;

    const runs = 3;

    try {
        // Narrow benchmark
        const narrowTimes = [];
        for (let i = 0; i < runs; i++) {
            const start = performance.now();
            await dbNarrow.query(narrowQuery);
            narrowTimes.push(performance.now() - start);
        }

        // Wide benchmark
        const wideTimes = [];
        for (let i = 0; i < runs; i++) {
            const start = performance.now();
            await dbWide.query(wideQuery);
            wideTimes.push(performance.now() - start);
        }

        const median = arr => {
            const sorted = [...arr].sort((a, b) => a - b);
            if (sorted.length === 2) return (sorted[0] + sorted[1]) / 2;
            return sorted[Math.floor(sorted.length / 2)];
        };
        const warmMedian = arr => {
            if (arr.length <= 1) return arr[0] || 0;
            const warm = arr.slice(1);
            return median(warm);
        };

        const narrowMedian = warmMedian(narrowTimes);
        const wideMedian = warmMedian(wideTimes);

        return {
            name: "Sample Count by Site (multi-join)",
            narrowCold: narrowTimes[0],
            narrowMedian: narrowMedian,
            narrowAll: narrowTimes,
            wideCold: wideTimes[0],
            wideMedian: wideMedian,
            wideAll: wideTimes,
            speedup: narrowMedian / wideMedian
        };
    } catch (e) {
        if (errorDiv) {
            errorDiv.textContent = `Benchmark 2 failed: ${e.message}`;
            errorDiv.style.display = 'block';
        }
        return { error: e.message };
    } finally {
        if (loadingDiv) loadingDiv.hidden = true;
    }
}

7 Benchmark 3: Material Type Distribution

This benchmark aggregates sample counts by material category.

Code
benchmark3 = {
    // Wait for benchmark2 to complete first (sequential execution)
    if (runBenchmarks < 1 || !benchmark2 || benchmark2.error) return null;

    const loadingDiv = document.getElementById('loading_b3');
    const errorDiv = document.getElementById('error_display');
    if (loadingDiv) loadingDiv.hidden = false;

    // Narrow query: join through edges to material concepts
    const narrowQuery = `
        SELECT
            c.label as material,
            COUNT(*) as sample_count
        FROM narrow s
        JOIN narrow e ON s.row_id = e.s AND e.p = 'has_material_category'
        JOIN narrow c ON e.o[1] = c.row_id
        WHERE s.otype = 'MaterialSampleRecord'
        GROUP BY c.label
        ORDER BY sample_count DESC
        LIMIT 10
    `;

    // Wide query: direct column access to material category
    const wideQuery = `
        SELECT
            c.label as material,
            COUNT(*) as sample_count
        FROM wide s
        JOIN wide c ON s.p__has_material_category[1] = c.row_id
        WHERE s.otype = 'MaterialSampleRecord'
          AND c.otype = 'IdentifiedConcept'
        GROUP BY c.label
        ORDER BY sample_count DESC
        LIMIT 10
    `;

    const runs = 3;

    try {
        // Narrow benchmark
        const narrowTimes = [];
        for (let i = 0; i < runs; i++) {
            const start = performance.now();
            await dbNarrow.query(narrowQuery);
            narrowTimes.push(performance.now() - start);
        }

        // Wide benchmark
        const wideTimes = [];
        for (let i = 0; i < runs; i++) {
            const start = performance.now();
            await dbWide.query(wideQuery);
            wideTimes.push(performance.now() - start);
        }

        const median = arr => {
            const sorted = [...arr].sort((a, b) => a - b);
            if (sorted.length === 2) return (sorted[0] + sorted[1]) / 2;
            return sorted[Math.floor(sorted.length / 2)];
        };
        const warmMedian = arr => {
            if (arr.length <= 1) return arr[0] || 0;
            const warm = arr.slice(1);
            return median(warm);
        };

        const narrowMedian = warmMedian(narrowTimes);
        const wideMedian = warmMedian(wideTimes);

        return {
            name: "Material Type Distribution",
            narrowCold: narrowTimes[0],
            narrowMedian: narrowMedian,
            narrowAll: narrowTimes,
            wideCold: wideTimes[0],
            wideMedian: wideMedian,
            wideAll: wideTimes,
            speedup: narrowMedian / wideMedian
        };
    } catch (e) {
        if (errorDiv) {
            errorDiv.textContent = `Benchmark 3 failed: ${e.message}`;
            errorDiv.style.display = 'block';
        }
        return { error: e.message };
    } finally {
        if (loadingDiv) loadingDiv.hidden = true;
    }
}

8 Results Summary

9 Technical Notes

9.1 Pitfalls and Considerations

Consideration How We Address It
Browser caching First run is “cold” (metadata not cached), subsequent runs are “warm”
Network variability We run 3 iterations and report the warm-run median (exclude cold run)
JIT compilation First run includes JIT overhead; warm runs are more representative
Memory limits 691 MB narrow file may stress browser memory; wide format is safer

9.2 Schema Differences

Narrow schema stores relationships as edge rows:

-- Edge row example
{otype: '_edge_', s: 123, p: 'produced_by', o: [456]}

Wide schema stores relationships as columns:

-- Entity row with relationship columns
{otype: 'MaterialSampleRecord', p__produced_by: [456], p__has_material_category: [789]}

This eliminates ~9M edge rows, resulting in the 60% file size reduction.

10 See Also