OpenContext Parquet Data Analysis - Enhanced Edition

parquet

spatial

property-graph

This document provides an enhanced analysis of the OpenContext iSamples parquet file, demonstrating the property graph structure and how to work with archaeological specimen data.

1 Understanding the Property Graph Structure

The OpenContext iSamples parquet file implements a sophisticated property graph model that combines the flexibility of graph databases with the analytical performance of columnar storage. Unlike traditional relational databases or pure graph databases, this approach stores both entities (nodes) and relationships (edges) in a single table structure.

1.1 Why a Property Graph?

Archaeological and specimen data inherently forms a network:

Samples are collected at sites during events
Sites have geographic locations
Samples have material types from controlled vocabularies
People (agents) have various roles in the collection process

This interconnected nature makes a graph model ideal for representing the complex relationships while maintaining query performance.

2 Setup

Code

viewof parquet_path = Inputs.text({
    label: "Parquet File URL",
    value: "https://storage.googleapis.com/opencontext-parquet/oc_isamples_pqg.parquet",
    width: "100%",
    submit: true
});

Code

db = {
    const instance = await DuckDBClient.of();
    await instance.query(`CREATE VIEW nodes AS SELECT * FROM read_parquet('${parquet_path}')`);
    return instance;
}

// Helper function for loading data with visual feedback
async function loadData(query, params=[], waiting_id=null) {
    const waiter = document.getElementById(waiting_id);
    if (waiter) {
        waiter.hidden = false;
    }
    try {
        const _results = await db.query(query, ...params);
        return _results;
    } catch (error) {
        if (waiter) {
            waiter.innerHTML = `<pre>${error}</pre>`;
        }
        return null;
    } finally {
        if (waiter) {
            waiter.hidden = true;
        }
    }
}

3 Data Model Deep Dive

3.1 Entity Types in the Dataset

The parquet file contains 7 distinct object types (otype), each serving a specific purpose in the archaeological data model:

Code

entityTypeDescriptions = {
    return [
        {otype: "_edge_", purpose: "Relationships between entities", icon: "🔗"},
        {otype: "MaterialSampleRecord", purpose: "Physical samples/specimens", icon: "🪨"},
        {otype: "SamplingEvent", purpose: "When/how samples were collected", icon: "📅"},
        {otype: "GeospatialCoordLocation", purpose: "Geographic coordinates", icon: "📍"},
        {otype: "SamplingSite", purpose: "Archaeological sites/dig locations", icon: "🏛️"},
        {otype: "IdentifiedConcept", purpose: "Controlled vocabulary terms", icon: "📚"},
        {otype: "Agent", purpose: "People and organizations", icon: "👤"}
    ];
}

viewof entityTypeTable = Inputs.table(entityTypeDescriptions, {
    header: {
        otype: "Entity Type",
        purpose: "Purpose",
        icon: "Icon"
    }
})

3.2 Entity Distribution

Code

entityStats = {
    const query = `
        SELECT
            otype,
            COUNT(*) as count,
            ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) as percentage
        FROM nodes
        GROUP BY otype
        ORDER BY count DESC
    `;
    const data = await loadData(query, [], "loading_entity_stats");
    return data;
}

Code

viewof entityStatsTable = Inputs.table(entityStats, {
    header: {
        otype: "Entity Type",
        count: "Count",
        percentage: "Percentage"
    },
    format: {
        count: d => d.toLocaleString(),
        percentage: d => d + "%"
    }
})

Total records:

3.3 How Entities Connect: The Edge Model

Edges use a triple structure inspired by RDF:

Subject (s): The source entity’s row_id
Predicate (p): The relationship type
Object (o): Array of target entity row_ids

This allows representing both simple (1:1) and complex (1:many) relationships efficiently.

Code

// Visualize common relationship patterns
relationshipPatterns = {
    const query = `
        SELECT
            p as relationship,
            COUNT(*) as usage_count,
            COUNT(DISTINCT s) as unique_subjects
        FROM nodes
        WHERE otype = '_edge_'
          AND p IS NOT NULL
        GROUP BY p
        ORDER BY usage_count DESC
        LIMIT 15
    `;
    const data = await loadData(query, [], "loading_relationships");
    return data;
}

Code

viewof relationshipTable = Inputs.table(relationshipPatterns, {
    header: {
        relationship: "Relationship Type",
        usage_count: "Total Uses",
        unique_subjects: "Unique Subjects"
    },
    format: {
        usage_count: d => d.toLocaleString(),
        unique_subjects: d => d.toLocaleString()
    }
})

4 🚨 Critical Discovery: Correct Relationship Paths

Before you query this data, understand this key insight:

❌ Common Mistake: Assuming direct Sample → Location relationships ✅ Reality: All location queries require multi-hop traversal through SamplingEvent

4.1 The Correct Traversals Discovered (Using Typed Edges)

Direct Location (via EVENT_SAMPLE_LOCATION)

MaterialSampleRecord → MSR_PRODUCED_BY → SamplingEvent → EVENT_SAMPLE_LOCATION → GeospatialCoordLocation

Site-Mediated Location (via EVENT_SAMPLING_SITE → SITE_LOCATION)

MaterialSampleRecord → MSR_PRODUCED_BY → SamplingEvent → EVENT_SAMPLING_SITE → SamplingSite → SITE_LOCATION → GeospatialCoordLocation

This discovery unlocked 1,096,274 samples that were previously inaccessible due to incorrect query patterns!

5 Working with the Graph: Query Patterns

5.1 Finding Samples with Locations (CORRECTED)

The most common need is connecting samples to their geographic coordinates. This requires traversing the graph through edges:

Code

// Example: Get samples with direct location assignments (CORRECTED)
// Path: Sample -> produced_by -> SamplingEvent -> sample_location -> GeospatialCoordLocation
sampleLocationExample = {
    const query = `
        WITH sample_locations AS (
            SELECT
                s.pid as sample_id,
                s.label as sample_label,
                g.latitude,
                g.longitude,
                'direct_event_location' as location_relationship
            FROM nodes s
            JOIN nodes e1 ON s.row_id = e1.s AND e1.p = 'produced_by'
            JOIN nodes event ON e1.o[1] = event.row_id
            JOIN nodes e2 ON event.row_id = e2.s AND e2.p = 'sample_location'
            JOIN nodes g ON e2.o[1] = g.row_id
            WHERE s.otype = 'MaterialSampleRecord'
              AND event.otype = 'SamplingEvent'
              AND g.otype = 'GeospatialCoordLocation'
              AND g.latitude IS NOT NULL
            LIMIT 5
        )
        SELECT * FROM sample_locations
    `;
    const data = await loadData(query, [], "loading_sample_loc_example");
    return data;
}

Code

viewof sampleLocationTable = Inputs.table(sampleLocationExample, {
    layout: "auto"
})

5.2 ⚠️ Why Previous Queries Failed

Many existing examples tried this incorrect pattern:

-- ❌ BROKEN: This relationship doesn't exist!
FROM MaterialSampleRecord s
JOIN edge e ON s.row_id = e.s AND e.p = 'sample_location'
JOIN GeospatialCoordLocation g ON e.o[1] = g.row_id

Result: 0 samples found

The correct pattern requires going through SamplingEvent:

-- ✅ CORRECT: Multi-hop traversal
FROM MaterialSampleRecord s
JOIN edge e1 ON s.row_id = e1.s AND e1.p = 'produced_by'
JOIN SamplingEvent event ON e1.o[1] = event.row_id
JOIN edge e2 ON event.row_id = e2.s AND e2.p = 'sample_location'
JOIN GeospatialCoordLocation g ON e2.o[1] = g.row_id

Result: 1,096,274 samples found!

5.3 Multi-Hop Traversal: Sample → Event → Site → Location

Many samples don’t have direct coordinates but are linked through their collection event and site:

Code

// Trace the full chain from sample to site location
siteChainExample = {
    const query = `
        SELECT
            samp.pid as sample_id,
            event.pid as event_id,
            site.label as site_name,
            loc.latitude,
            loc.longitude
        FROM nodes samp
        JOIN nodes e1 ON samp.row_id = e1.s AND e1.p = 'produced_by'
        JOIN nodes event ON e1.o[1] = event.row_id
        JOIN nodes e2 ON event.row_id = e2.s AND e2.p = 'sampling_site'
        JOIN nodes site ON e2.o[1] = site.row_id
        JOIN nodes e3 ON site.row_id = e3.s AND e3.p = 'site_location'
        JOIN nodes loc ON e3.o[1] = loc.row_id
        WHERE samp.otype = 'MaterialSampleRecord'
          AND event.otype = 'SamplingEvent'
          AND site.otype = 'SamplingSite'
          AND loc.otype = 'GeospatialCoordLocation'
        LIMIT 5
    `;
    const data = await loadData(query, [], "loading_chain_example");
    return data;
}

Code

viewof siteChainTable = Inputs.table(siteChainExample, {
    layout: "auto",
    width: {
        sample_id: 150,
        event_id: 150,
        site_name: 200
    }
})

6 Site Analysis

6.1 Top Archaeological Sites by Sample Count

Code

topSites = {
    const query = `
        WITH site_samples AS (
            SELECT
                site.label as site_name,
                site.pid as site_id,
                COUNT(DISTINCT samp.row_id) as sample_count
            FROM nodes samp
            JOIN nodes e1 ON samp.row_id = e1.s AND e1.p = 'produced_by'
            JOIN nodes event ON e1.o[1] = event.row_id
            JOIN nodes e2 ON event.row_id = e2.s AND e2.p = 'sampling_site'
            JOIN nodes site ON e2.o[1] = site.row_id
            WHERE samp.otype = 'MaterialSampleRecord'
              AND event.otype = 'SamplingEvent'
              AND site.otype = 'SamplingSite'
            GROUP BY site.label, site.pid
        )
        SELECT * FROM site_samples
        ORDER BY sample_count DESC
        LIMIT 20
    `;
    const data = await loadData(query, [], "loading_top_sites");
    return data;
}

Code

viewof topSitesTable = Inputs.table(topSites, {
    header: {
        site_name: "Site Name",
        site_id: "Site ID",
        sample_count: "Sample Count"
    },
    format: {
        sample_count: d => d.toLocaleString()
    }
})

7 Material Analysis

7.1 Material Type Distribution

Understanding what types of materials are found across the dataset:

Code

materialTypes = {
    const query = `
        SELECT
            mat.label as material_type,
            mat.name as category,
            COUNT(DISTINCT samp.row_id) as sample_count
        FROM nodes samp
        JOIN nodes e ON samp.row_id = e.s AND e.p = 'has_material_category'
        JOIN nodes mat ON e.o[1] = mat.row_id
        WHERE samp.otype = 'MaterialSampleRecord'
          AND e.otype = '_edge_'
          AND mat.otype = 'IdentifiedConcept'
        GROUP BY mat.label, mat.name
        ORDER BY sample_count DESC
        LIMIT 30
    `;
    const data = await loadData(query, [], "loading_materials");
    return data;
}

Code

viewof materialTable = Inputs.table(materialTypes, {
    header: {
        material_type: "Material Type",
        category: "Category",
        sample_count: "Sample Count"
    },
    format: {
        sample_count: d => d.toLocaleString()
    }
})

8 Spatial Distribution

8.1 Geographic Coverage

Code

spatialStats = {
    const query = `
        WITH coord_stats AS (
            SELECT
                MIN(latitude) as min_lat,
                MAX(latitude) as max_lat,
                MIN(longitude) as min_lon,
                MAX(longitude) as max_lon,
                AVG(latitude) as avg_lat,
                AVG(longitude) as avg_lon,
                COUNT(*) as total_locations,
                COUNT(CASE WHEN obfuscated THEN 1 END) as obfuscated_count
            FROM nodes
            WHERE otype = 'GeospatialCoordLocation'
              AND latitude IS NOT NULL
              AND longitude IS NOT NULL
        )
        SELECT * FROM coord_stats
    `;
    const data = await loadData(query, [], "loading_spatial");
    return data;
}

Code

viewof spatialDisplay = {
    const stats = spatialStats[0];
    return html`<div style="padding: 1rem; background: #f0f9ff; border-radius: 8px;">
        <h4 style="margin-top: 0;">Geographic Coverage</h4>
        <p>Total locations: <strong>${stats.total_locations.toLocaleString()}</strong></p>
        <p>Obfuscated locations: <strong>${stats.obfuscated_count.toLocaleString()}</strong>
           (${(stats.obfuscated_count / stats.total_locations * 100).toFixed(1)}%)</p>
        <p>Latitude range: <strong>${stats.min_lat.toFixed(2)}° to ${stats.max_lat.toFixed(2)}°</strong></p>
        <p>Longitude range: <strong>${stats.min_lon.toFixed(2)}° to ${stats.max_lon.toFixed(2)}°</strong></p>
        <p>Center point: <strong>${stats.avg_lat.toFixed(2)}°, ${stats.avg_lon.toFixed(2)}°</strong></p>
    </div>`;
}

8.2 Handling Sensitive Location Data

Archaeological sites often require location protection:

Code

obfuscationStats = {
    const query = `
        SELECT
            obfuscated,
            COUNT(*) as location_count,
            AVG(CASE WHEN latitude IS NOT NULL THEN 1 ELSE 0 END) * 100 as pct_with_coords
        FROM nodes
        WHERE otype = 'GeospatialCoordLocation'
        GROUP BY obfuscated
    `;
    const data = await loadData(query, [], "loading_obfusc_stats");
    return data;
}

Code

viewof obfuscationTable = Inputs.table(obfuscationStats, {
    header: {
        obfuscated: "Location Protection",
        location_count: "Count",
        pct_with_coords: "% With Coordinates"
    },
    format: {
        obfuscated: d => d ? "🔒 Protected" : "📍 Precise",
        location_count: d => d.toLocaleString(),
        pct_with_coords: d => d.toFixed(1) + "%"
    }
})

Data Usage Note

When visualizing archaeological data, always respect location sensitivity flags. Obfuscated coordinates are intentionally imprecise to protect archaeological sites from looting.

9 🔍 Debugging Methodology: How We Found the Correct Paths

9.1 Step 1: Verify Relationship Existence

Code

// Debug: What relationships actually exist FROM MaterialSampleRecord?
debugRelationships = {
    const query = `
        SELECT DISTINCT e.p as predicate, COUNT(*) as count
        FROM nodes s
        JOIN nodes e ON s.row_id = e.s
        WHERE s.otype = 'MaterialSampleRecord'
          AND e.otype = '_edge_'
        GROUP BY e.p
        ORDER BY count DESC
    `;
    const data = await loadData(query, [], "loading_debug_rels");
    return data;
}

Code

viewof debugTable = Inputs.table(debugRelationships, {
    header: {
        predicate: "Relationship Type",
        count: "Usage Count"
    },
    format: {
        count: d => d.toLocaleString()
    }
})

Notice: No direct sample_location relationship! This confirms why direct queries failed.

9.2 Step 2: Trace the Path Through SamplingEvent

Code

// Debug: What relationships exist FROM SamplingEvent?
debugEventRelationships = {
    const query = `
        SELECT DISTINCT e.p as predicate, COUNT(*) as count
        FROM nodes s
        JOIN nodes e ON s.row_id = e.s
        WHERE s.otype = 'SamplingEvent'
          AND e.otype = '_edge_'
        GROUP BY e.p
        ORDER BY count DESC
    `;
    const data = await loadData(query, [], "loading_debug_events");
    return data;
}

Code

viewof debugEventTable = Inputs.table(debugEventRelationships, {
    header: {
        predicate: "Event Relationship",
        count: "Count"
    },
    format: {
        count: d => d.toLocaleString()
    }
})

Key Discovery: SamplingEvent has both sample_location AND sampling_site relationships!

9.3 Step 3: Validate the Complete Chain

Code

// Test: How many samples can we locate using the corrected path?
locationValidation = {
    const query = `
        WITH validation_stats AS (
            -- Direct path count
            SELECT 'Direct Event Location' as path_type, COUNT(*) as sample_count
            FROM nodes s
            JOIN nodes e1 ON s.row_id = e1.s AND e1.p = 'produced_by'
            JOIN nodes event ON e1.o[1] = event.row_id
            JOIN nodes e2 ON event.row_id = e2.s AND e2.p = 'sample_location'
            JOIN nodes g ON e2.o[1] = g.row_id
            WHERE s.otype = 'MaterialSampleRecord'
              AND event.otype = 'SamplingEvent'
              AND g.otype = 'GeospatialCoordLocation'
              AND g.latitude IS NOT NULL

            UNION ALL

            -- Site path count
            SELECT 'Via Site Location' as path_type, COUNT(*) as sample_count
            FROM nodes s
            JOIN nodes e1 ON s.row_id = e1.s AND e1.p = 'produced_by'
            JOIN nodes event ON e1.o[1] = event.row_id
            JOIN nodes e2 ON event.row_id = e2.s AND e2.p = 'sampling_site'
            JOIN nodes site ON e2.o[1] = site.row_id
            JOIN nodes e3 ON site.row_id = e3.s AND e3.p = 'site_location'
            JOIN nodes g ON e3.o[1] = g.row_id
            WHERE s.otype = 'MaterialSampleRecord'
              AND event.otype = 'SamplingEvent'
              AND site.otype = 'SamplingSite'
              AND g.otype = 'GeospatialCoordLocation'
              AND g.latitude IS NOT NULL
        )
        SELECT * FROM validation_stats
    `;
    const data = await loadData(query, [], "loading_validation");
    return data;
}

Code

viewof validationTable = Inputs.table(locationValidation, {
    header: {
        path_type: "Query Path",
        sample_count: "Samples Found"
    },
    format: {
        sample_count: d => d.toLocaleString()
    }
})

🎉 Success! Both paths yield over 1M samples each.

9.4 Debugging Lessons Learned

Never assume direct relationships exist - always verify the graph structure first
Trace step-by-step - build from simple entity counts to complex joins
Test multiple paths - property graphs often have alternative routes
Validate results - sanity check your numbers against known entity counts

10 Performance & Optimization Strategies

10.1 Query Performance Guidelines

When working with this 11.6M row dataset:

Filter Early: Always apply otype filters first

-- Good: Reduces to ~1M rows immediately
WHERE otype = 'MaterialSampleRecord'

-- Avoid: Scans all 11M rows
WHERE label LIKE '%pottery%'

Use Views for Complex Patterns: Pre-compute common joins

CREATE VIEW samples_with_coords AS
SELECT ... -- complex join query

Leverage DuckDB’s Columnar Format: Aggregate before detailed analysis

10.2 Data Loading Strategies

For web applications:

Code

// Progressive loading pattern for large datasets
progressiveLoadExample = {
    // Start with aggregated overview
    const overview = await db.query(`
        SELECT
            ROUND(latitude/10)*10 as lat_bucket,
            ROUND(longitude/10)*10 as lon_bucket,
            COUNT(*) as point_count
        FROM nodes
        WHERE otype = 'GeospatialCoordLocation'
          AND latitude IS NOT NULL
        GROUP BY lat_bucket, lon_bucket
    `);

    return {
        strategy: "Progressive Loading",
        initial_points: overview.length,
        full_dataset: 198433,
        reduction_factor: Math.round(198433 / overview.length)
    };
}

Code

viewof loadStrategyDisplay = {
    const stats = await progressiveLoadExample;
    return html`<div style="padding: 1rem; background: #e0f2fe; border-radius: 8px;">
        <h4 style="margin-top: 0;">Loading Strategy Impact</h4>
        <p>Initial load: <strong>${stats.initial_points.toLocaleString()}</strong> aggregated points</p>
        <p>Full dataset: <strong>${stats.full_dataset.toLocaleString()}</strong> individual locations</p>
        <p>Reduction factor: <strong>${stats.reduction_factor}x</strong> faster initial load</p>
    </div>`;
}

11 Data Quality Metrics

Code

dataQuality = {
    const query = `
        WITH quality_checks AS (
            SELECT
                'Total Rows' as metric,
                COUNT(*) as value
            FROM nodes

            UNION ALL

            SELECT
                'Unique PIDs' as metric,
                COUNT(DISTINCT pid) as value
            FROM nodes

            UNION ALL

            SELECT
                'Samples with Direct Location' as metric,
                COUNT(DISTINCT s.row_id) as value
            FROM nodes s
            JOIN nodes e ON s.row_id = e.s AND e.p = 'sample_location'
            WHERE s.otype = 'MaterialSampleRecord'

            UNION ALL

            SELECT
                'Samples with Site Location' as metric,
                COUNT(DISTINCT s.row_id) as value
            FROM nodes s
            JOIN nodes e ON s.row_id = e.s AND e.p = 'produced_by'
            WHERE s.otype = 'MaterialSampleRecord'
        )
        SELECT * FROM quality_checks
    `;
    const data = await loadData(query, [], "loading_quality");
    return data;
}

Code

viewof qualityTable = Inputs.table(dataQuality, {
    header: {
        metric: "Quality Metric",
        value: "Count"
    },
    format: {
        value: d => d.toLocaleString()
    }
})

12 Archaeological Data Insights

12.1 Top Archaeological Sites by Sample Count

Code

topSitesByCount = {
    const query = `
        WITH sample_to_site AS (
            SELECT
                site.label as site_name,
                COUNT(DISTINCT samp.row_id) as sample_count
            FROM nodes samp
            JOIN nodes e1 ON samp.row_id = e1.s AND e1.p = 'produced_by'
            JOIN nodes event ON e1.o[1] = event.row_id
            JOIN nodes e2 ON event.row_id = e2.s AND e2.p = 'sampling_site'
            JOIN nodes site ON e2.o[1] = site.row_id
            WHERE samp.otype = 'MaterialSampleRecord'
              AND event.otype = 'SamplingEvent'
              AND site.otype = 'SamplingSite'
            GROUP BY site.label
        )
        SELECT * FROM sample_to_site
        ORDER BY sample_count DESC
        LIMIT 10
    `;
    const data = await loadData(query, [], "loading_top_sites");
    return data;
}

Code

viewof topSitesTable = Inputs.table(topSitesByCount, {
    header: {
        site_name: "Archaeological Site",
        sample_count: "Sample Count"
    },
    format: {
        sample_count: d => d.toLocaleString()
    }
})

12.2 Material Type Distribution

Code

materialDistribution = {
    const query = `
        SELECT
            mat.label as material_type,
            COUNT(DISTINCT samp.row_id) as sample_count
        FROM nodes samp
        JOIN nodes e ON samp.row_id = e.s AND e.p = 'has_material_category'
        JOIN nodes mat ON e.o[1] = mat.row_id
        WHERE samp.otype = 'MaterialSampleRecord'
          AND e.otype = '_edge_'
          AND mat.otype = 'IdentifiedConcept'
        GROUP BY mat.label
        ORDER BY sample_count DESC
        LIMIT 10
    `;
    const data = await loadData(query, [], "loading_materials");
    return data;
}

Code

viewof materialTable = Inputs.table(materialDistribution, {
    header: {
        material_type: "Material Type",
        sample_count: "Sample Count"
    },
    format: {
        sample_count: d => d.toLocaleString()
    }
})

Key Insights: - Çatalhöyük leads with 145,900+ samples - one of the world’s largest Neolithic sites - Biogenic non-organic materials dominate (bones, shells) reflecting archaeological preservation - Global coverage spans from Arctic (Finnmark) to temperate zones

13 Summary: Key Lessons for Querying OpenContext Parquet

13.1 🎯 Essential Discoveries

Critical Bug Fix: Direct Sample→Location queries don’t work
- Problem: Returned 0 results from 1M+ sample dataset
- Solution: Always traverse through SamplingEvent
- Impact: Unlocked access to 1,096,274 located samples

Correct Relationship Paths:

✅ Sample → produced_by → SamplingEvent → sample_location → Location
✅ Sample → produced_by → SamplingEvent → sampling_site → Site → site_location → Location

Property Graph Structure:
- 79% edges, 21% entities in 11.6M rows
- Multi-hop traversal required for meaningful queries
- No shortcuts exist - respect the graph model

13.2 🔧 Debugging Methodology

Verify relationships exist before building complex queries
Trace step-by-step from simple counts to complex joins
Test multiple paths - graphs often have alternative routes
Validate results against known entity counts

13.3 ⚡ Performance Guidelines

Filter by otype first - reduces 11M rows to manageable subsets
Use CTEs for complex multi-hop queries
Aggregate before filtering when possible
Respect obfuscated coordinates for site protection

13.4 🏛️ Archaeological Context

Major sites: Çatalhöyük, Petra, Polis Chrysochous dominate sample counts
Material types: Biogenic non-organic materials most common
Global reach: Arctic to Antarctic coverage with sensitive location protection
Research value: 1M+ precisely located specimens for spatial analysis

13.5 🚀 Advanced Applications

This corrected understanding enables: - Spatial clustering analysis of archaeological finds - Temporal pattern recognition through sampling events - Site similarity studies via material type distributions - Collection bias analysis through agent and responsibility networks

The key to success: Understand the graph model first, query second. This property graph structure reflects the real-world complexity of archaeological data collection and enables sophisticated analysis when queried correctly.

14 Next Steps

Ready to analyze this data? Remember: 1. Start with entity relationship exploration 2. Build queries incrementally 3. Validate results at each step 4. Respect archaeological site sensitivities

Happy querying! 🏺