Narrow vs Wide Schema Performance Comparison
parquet
performance
benchmarking
This page benchmarks the performance difference between narrow and wide parquet schema formats when accessing data “over the wire” via HTTP range requests in DuckDB-WASM.
1 Introduction
1.1 What are Narrow vs Wide Schemas?
The iSamples property graph data can be serialized in two different parquet formats:
| Format | Description | File Size | Row Count |
|---|---|---|---|
| Narrow | Stores relationships as separate edge rows (otype='_edge_') |
691 MB | ~11.6M rows |
| Wide | Stores relationships as p__* columns on entity rows |
275 MB | ~2.5M rows |
Both formats represent the same underlying data with identical semantics, but the wide format is optimized for analytical queries by eliminating edge rows.
1.2 Why Performance Matters
When using DuckDB-WASM in the browser:
- Data is fetched via HTTP range requests (206 Partial Content)
- Only the columns and row groups needed for a query are downloaded
- Smaller files with fewer rows = fewer bytes to transfer, faster queries
Expected speedup: Wide format should be 2-3x faster based on local benchmarks.
2 Methodology
NoteBenchmarking Approach
- Cold run: First query (includes metadata fetch, JIT compilation) - reported separately
- Warm runs: Runs 2-3 (metadata cached, JIT warmed up)
- Warm median: Median of warm runs only (excludes cold run for fair comparison)
- Sequential execution: Benchmarks run one after another, not concurrently
- Network variability: Results will vary based on your network connection and hardware
Results are shown in real-time as benchmarks complete. Data loading only begins when you click the button.
3 Setup
3.1 Environment Info
3.2 Run Benchmarks
4 Data Validity Check
Before benchmarking, let’s confirm both schemas represent the same underlying data.
Checking data validity…
Code
validityCheck = {
// Only run when button clicked AND databases are initialized
if (runBenchmarks < 1 || !dbNarrow || !dbWide) return null;
const loadingDiv = document.getElementById('loading_validity');
const errorDiv = document.getElementById('error_display');
if (loadingDiv) loadingDiv.hidden = false;
try {
// Count rows in narrow
const narrowCount = await dbNarrow.query(`SELECT COUNT(*) as cnt FROM narrow`);
const narrowTotal = narrowCount[0].cnt;
// Count rows in wide
const wideCount = await dbWide.query(`SELECT COUNT(*) as cnt FROM wide`);
const wideTotal = wideCount[0].cnt;
// Count entity types in narrow (excluding edges)
const narrowEntities = await dbNarrow.query(`
SELECT COUNT(*) as cnt FROM narrow
WHERE otype != '_edge_'
`);
const narrowEntityCount = narrowEntities[0].cnt;
// Count samples in both
const narrowSamples = await dbNarrow.query(`
SELECT COUNT(*) as cnt FROM narrow
WHERE otype = 'MaterialSampleRecord'
`);
const wideSamples = await dbWide.query(`
SELECT COUNT(*) as cnt FROM wide
WHERE otype = 'MaterialSampleRecord'
`);
return {
narrowTotal: narrowTotal,
wideTotal: wideTotal,
narrowEntities: narrowEntityCount,
narrowSamples: narrowSamples[0].cnt,
wideSamples: wideSamples[0].cnt,
sampleMatch: narrowSamples[0].cnt === wideSamples[0].cnt
};
} catch (e) {
if (errorDiv) {
errorDiv.textContent = `Validity check failed: ${e.message}`;
errorDiv.style.display = 'block';
}
return { error: e.message };
} finally {
if (loadingDiv) loadingDiv.hidden = true;
}
}5 Benchmark 1: Entity Count Query
This benchmark tests a simple COUNT(*) GROUP BY otype query, which requires scanning row metadata.
Running Benchmark 1…
Code
benchmark1 = {
// Wait for validity check to complete first (sequential execution)
if (runBenchmarks < 1 || !validityCheck || validityCheck.error) return null;
const loadingDiv = document.getElementById('loading_b1');
const errorDiv = document.getElementById('error_display');
if (loadingDiv) loadingDiv.hidden = false;
const query = `SELECT otype, COUNT(*) as cnt FROM {table} GROUP BY otype ORDER BY cnt DESC`;
const runs = 3;
try {
// Narrow benchmark
const narrowTimes = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
await dbNarrow.query(query.replace('{table}', 'narrow'));
narrowTimes.push(performance.now() - start);
}
// Wide benchmark
const wideTimes = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
await dbWide.query(query.replace('{table}', 'wide'));
wideTimes.push(performance.now() - start);
}
// Calculate medians (excluding cold run for warm median)
const median = arr => {
const sorted = [...arr].sort((a, b) => a - b);
if (sorted.length === 2) return (sorted[0] + sorted[1]) / 2;
return sorted[Math.floor(sorted.length / 2)];
};
const warmMedian = arr => {
if (arr.length <= 1) return arr[0] || 0;
const warm = arr.slice(1); // exclude first (cold) run
return median(warm);
};
const narrowMedian = warmMedian(narrowTimes);
const wideMedian = warmMedian(wideTimes);
return {
name: "Entity Count (GROUP BY otype)",
narrowCold: narrowTimes[0],
narrowMedian: narrowMedian,
narrowAll: narrowTimes,
wideCold: wideTimes[0],
wideMedian: wideMedian,
wideAll: wideTimes,
speedup: narrowMedian / wideMedian
};
} catch (e) {
if (errorDiv) {
errorDiv.textContent = `Benchmark 1 failed: ${e.message}`;
errorDiv.style.display = 'block';
}
return { error: e.message };
} finally {
if (loadingDiv) loadingDiv.hidden = true;
}
}6 Benchmark 2: Sample Count by Site
This benchmark counts samples per sampling site, requiring a join between samples and sites.
Query complexity: - Narrow: Requires joining through edge rows - Wide: Direct join via p__* columns
Running Benchmark 2…
Code
benchmark2 = {
// Wait for benchmark1 to complete first (sequential execution)
if (runBenchmarks < 1 || !benchmark1 || benchmark1.error) return null;
const loadingDiv = document.getElementById('loading_b2');
const errorDiv = document.getElementById('error_display');
if (loadingDiv) loadingDiv.hidden = false;
// Narrow query: traverse edges to get from sample -> event -> site
const narrowQuery = `
WITH sample_events AS (
SELECT
e.o[1] as event_id,
s.row_id as sample_id
FROM narrow s
JOIN narrow e ON s.row_id = e.s AND e.p = 'produced_by'
WHERE s.otype = 'MaterialSampleRecord'
),
event_sites AS (
SELECT
se.sample_id,
e2.o[1] as site_id
FROM sample_events se
JOIN narrow e2 ON se.event_id = e2.s AND e2.p = 'sampling_site'
)
SELECT
site.label,
COUNT(*) as sample_count
FROM event_sites es
JOIN narrow site ON es.site_id = site.row_id
GROUP BY site.label
ORDER BY sample_count DESC
LIMIT 10
`;
// Wide query: direct column access
const wideQuery = `
WITH sample_sites AS (
SELECT
s.row_id as sample_id,
e.p__sampling_site[1] as site_id
FROM wide s
JOIN wide e ON s.p__produced_by[1] = e.row_id
WHERE s.otype = 'MaterialSampleRecord'
AND e.otype = 'SamplingEvent'
)
SELECT
site.label,
COUNT(*) as sample_count
FROM sample_sites ss
JOIN wide site ON ss.site_id = site.row_id
WHERE site.otype = 'SamplingSite'
GROUP BY site.label
ORDER BY sample_count DESC
LIMIT 10
`;
const runs = 3;
try {
// Narrow benchmark
const narrowTimes = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
await dbNarrow.query(narrowQuery);
narrowTimes.push(performance.now() - start);
}
// Wide benchmark
const wideTimes = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
await dbWide.query(wideQuery);
wideTimes.push(performance.now() - start);
}
const median = arr => {
const sorted = [...arr].sort((a, b) => a - b);
if (sorted.length === 2) return (sorted[0] + sorted[1]) / 2;
return sorted[Math.floor(sorted.length / 2)];
};
const warmMedian = arr => {
if (arr.length <= 1) return arr[0] || 0;
const warm = arr.slice(1);
return median(warm);
};
const narrowMedian = warmMedian(narrowTimes);
const wideMedian = warmMedian(wideTimes);
return {
name: "Sample Count by Site (multi-join)",
narrowCold: narrowTimes[0],
narrowMedian: narrowMedian,
narrowAll: narrowTimes,
wideCold: wideTimes[0],
wideMedian: wideMedian,
wideAll: wideTimes,
speedup: narrowMedian / wideMedian
};
} catch (e) {
if (errorDiv) {
errorDiv.textContent = `Benchmark 2 failed: ${e.message}`;
errorDiv.style.display = 'block';
}
return { error: e.message };
} finally {
if (loadingDiv) loadingDiv.hidden = true;
}
}7 Benchmark 3: Material Type Distribution
This benchmark aggregates sample counts by material category.
Running Benchmark 3…
Code
benchmark3 = {
// Wait for benchmark2 to complete first (sequential execution)
if (runBenchmarks < 1 || !benchmark2 || benchmark2.error) return null;
const loadingDiv = document.getElementById('loading_b3');
const errorDiv = document.getElementById('error_display');
if (loadingDiv) loadingDiv.hidden = false;
// Narrow query: join through edges to material concepts
const narrowQuery = `
SELECT
c.label as material,
COUNT(*) as sample_count
FROM narrow s
JOIN narrow e ON s.row_id = e.s AND e.p = 'has_material_category'
JOIN narrow c ON e.o[1] = c.row_id
WHERE s.otype = 'MaterialSampleRecord'
GROUP BY c.label
ORDER BY sample_count DESC
LIMIT 10
`;
// Wide query: direct column access to material category
const wideQuery = `
SELECT
c.label as material,
COUNT(*) as sample_count
FROM wide s
JOIN wide c ON s.p__has_material_category[1] = c.row_id
WHERE s.otype = 'MaterialSampleRecord'
AND c.otype = 'IdentifiedConcept'
GROUP BY c.label
ORDER BY sample_count DESC
LIMIT 10
`;
const runs = 3;
try {
// Narrow benchmark
const narrowTimes = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
await dbNarrow.query(narrowQuery);
narrowTimes.push(performance.now() - start);
}
// Wide benchmark
const wideTimes = [];
for (let i = 0; i < runs; i++) {
const start = performance.now();
await dbWide.query(wideQuery);
wideTimes.push(performance.now() - start);
}
const median = arr => {
const sorted = [...arr].sort((a, b) => a - b);
if (sorted.length === 2) return (sorted[0] + sorted[1]) / 2;
return sorted[Math.floor(sorted.length / 2)];
};
const warmMedian = arr => {
if (arr.length <= 1) return arr[0] || 0;
const warm = arr.slice(1);
return median(warm);
};
const narrowMedian = warmMedian(narrowTimes);
const wideMedian = warmMedian(wideTimes);
return {
name: "Material Type Distribution",
narrowCold: narrowTimes[0],
narrowMedian: narrowMedian,
narrowAll: narrowTimes,
wideCold: wideTimes[0],
wideMedian: wideMedian,
wideAll: wideTimes,
speedup: narrowMedian / wideMedian
};
} catch (e) {
if (errorDiv) {
errorDiv.textContent = `Benchmark 3 failed: ${e.message}`;
errorDiv.style.display = 'block';
}
return { error: e.message };
} finally {
if (loadingDiv) loadingDiv.hidden = true;
}
}8 Results Summary
9 Technical Notes
9.1 Pitfalls and Considerations
| Consideration | How We Address It |
|---|---|
| Browser caching | First run is “cold” (metadata not cached), subsequent runs are “warm” |
| Network variability | We run 3 iterations and report the warm-run median (exclude cold run) |
| JIT compilation | First run includes JIT overhead; warm runs are more representative |
| Memory limits | 691 MB narrow file may stress browser memory; wide format is safer |
9.2 Schema Differences
Narrow schema stores relationships as edge rows:
-- Edge row example
{otype: '_edge_', s: 123, p: 'produced_by', o: [456]}Wide schema stores relationships as columns:
-- Entity row with relationship columns
{otype: 'MaterialSampleRecord', p__produced_by: [456], p__has_material_category: [789]}This eliminates ~9M edge rows, resulting in the 60% file size reduction.
10 See Also
- OpenContext Parquet Analysis - Deep dive into the property graph structure
- Cesium Visualization - Interactive 3D visualization of sample locations