Methodology and Data Quality Control

Local Data Insights publishes datasets that are carefully processed and verified. Our data undergoes thorough cleaning, normalization, geo-validation, and digital signal verification to ensure its accuracy and reliability.

This page provides an overview of the LDI methodology, emphasizing transparency in our data processing while keeping our internal rules and logic proprietary.

What LDI Data Reflects

LDI data provides a structured view of the visible digital landscape of a local market, based on publicly available information at the time of the snapshot.

LDI datasets are based on publicly available digital sources and do not represent official registries. While they offer a broad view of businesses actively visible online, they may not cover every organization in a market.

ETL Process and Data Pipeline: Bronze → Silver → Gold

LDI uses an internal ETL (Extract, Transform, Load) process for data collection, transformation, and controlled publishing, built on a Bronze → Silver → Gold architecture. Each stage progressively refines raw information into clean, verifiable datasets ready for detailed analysis.
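
As a rough illustration, the three stages can be thought of as successive transformations over the same records. The sketch below uses pandas; the function names, fields, and cleaning steps are hypothetical simplifications, not LDI's internal logic.

```python
import pandas as pd

# Hypothetical raw records as extracted from public digital sources.
raw_records = [
    {"name": "  Joe's Pizza ", "category": "Restaurant", "address": "1 Main St"},
    {"name": "Joe's Pizza", "category": "Restaurant", "address": "1 Main St"},
]

def to_bronze(records: list[dict]) -> pd.DataFrame:
    """Land raw extracted records as-is, tagged with a snapshot time."""
    df = pd.DataFrame(records)
    df["snapshot_at"] = pd.Timestamp.now(tz="UTC")
    return df

def to_silver(bronze: pd.DataFrame) -> pd.DataFrame:
    """Clean and normalize: trim names, drop exact duplicates."""
    silver = bronze.copy()
    silver["name"] = silver["name"].str.strip()
    return silver.drop_duplicates(subset=["name", "address"])

def to_gold(silver: pd.DataFrame) -> pd.DataFrame:
    """Keep only records whose key fields are fully populated."""
    return silver.dropna(subset=["name", "category", "address"])

gold = to_gold(to_silver(to_bronze(raw_records)))
```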

Bronze

The Bronze level contains raw data extracted from public digital sources. At this stage, data may contain duplicates, ambiguous categories, incomplete addresses, or other inconsistencies that require further processing to ensure accuracy and consistency.
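
For example, a single raw Bronze record might carry several of these issues at once; the fields and values below are hypothetical.

```python
# A hypothetical raw Bronze record, showing the kinds of issues the
# later stages must resolve. All field names and values are illustrative.
raw_record = {
    "name": "  joe's pizza  ",         # untrimmed, inconsistent casing
    "category": "Food / Restaurant?",  # ambiguous category label
    "address": "Main St",              # incomplete: no number, city, or ZIP
    "phone": "555 0123",               # unnormalized format
}
```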

Silver

At the Silver level, data undergoes cleaning, normalization, and standardization. This includes applying exclusion rules, refining category dictionaries, validating addresses, and verifying the status and location of businesses. The goal is to prepare data for accurate analysis and reporting.
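
A minimal sketch of what Silver-stage cleaning can look like, assuming pandas; the category dictionary and exclusion rule shown are illustrative placeholders, since the actual rules are proprietary.

```python
import pandas as pd

CATEGORY_DICT = {                  # illustrative raw-label -> canonical map
    "food / restaurant?": "Restaurant",
    "closed - permanent": "Permanently Closed",
}
EXCLUDED_CATEGORIES = {"Permanently Closed"}  # illustrative exclusion rule

def clean_silver(bronze: pd.DataFrame) -> pd.DataFrame:
    """Normalize text, map categories, apply exclusions, deduplicate."""
    df = bronze.copy()
    df["name"] = df["name"].str.strip().str.title()
    # Unmapped labels become NaN and would be routed to manual review.
    df["category"] = df["category"].str.strip().str.lower().map(CATEGORY_DICT)
    df = df[~df["category"].isin(EXCLUDED_CATEGORIES)]
    return df.drop_duplicates(subset=["name", "address"])
```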

Gold

The Gold level contains fully verified datasets: every record has passed strict quality checks and geo-validation, and key fields are consistent. These datasets are ready for publication and can support in-depth market analysis, decision-making, and reporting.
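
As an illustration of such a gate, the sketch below keeps only records with fully populated key fields and coordinates inside a market bounding box; the bounds and field list are assumptions, not LDI's actual checks.

```python
import pandas as pd

MARKET_BOUNDS = {"lat": (40.4, 41.0), "lon": (-74.3, -73.6)}  # hypothetical
KEY_FIELDS = ["name", "category", "address", "lat", "lon"]    # illustrative

def promote_to_gold(silver: pd.DataFrame) -> pd.DataFrame:
    """Admit only records that pass every key-field and geo check."""
    df = silver.dropna(subset=KEY_FIELDS)
    in_bounds = (
        df["lat"].between(*MARKET_BOUNDS["lat"])
        & df["lon"].between(*MARKET_BOUNDS["lon"])
    )
    return df[in_bounds]
```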

Pre-publication Data Checks

Before publication, every dataset goes through three levels of validation to confirm its technical accuracy, data quality, and readiness for analysis.

1st Level — Technical Check

We check the dataset for technical issues such as missing fields, categorization errors, and formatting problems that could affect usability.
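
A hedged sketch of what a technical check of this kind can look like, assuming pandas; the required fields, category set, and phone pattern are all illustrative.

```python
import pandas as pd

REQUIRED = ["name", "category", "address", "phone"]      # illustrative
KNOWN_CATEGORIES = {"Restaurant", "Retail", "Services"}  # hypothetical set
PHONE_PATTERN = r"^\+?[\d\s\-()]{7,}$"                   # loose format check

def technical_issues(df: pd.DataFrame) -> pd.DataFrame:
    """Flag rows with missing fields, unknown categories, or bad formats."""
    issues = pd.DataFrame(index=df.index)
    issues["missing_field"] = df[REQUIRED].isna().any(axis=1)
    issues["unknown_category"] = ~df["category"].isin(KNOWN_CATEGORIES)
    issues["bad_phone_format"] = ~df["phone"].fillna("").str.match(PHONE_PATTERN)
    return issues[issues.any(axis=1)]  # rows flagged for review
```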

2nd Level — Quality Check

At this stage, we validate the accuracy, completeness, and consistency of the data. We ensure that all key fields, such as name, category, location, and contact details, are reliable and correct.
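
For illustration, such a check can be expressed as simple completeness and duplicate-rate metrics; the threshold below is hypothetical, not LDI's published acceptance criterion.

```python
import pandas as pd

KEY_FIELDS = ["name", "category", "address", "phone"]  # illustrative

def quality_report(df: pd.DataFrame) -> dict:
    """Summarize per-field completeness and the duplicate rate."""
    completeness = df[KEY_FIELDS].notna().mean()  # share of filled values
    duplicates = df.duplicated(subset=["name", "address"]).mean()
    return {
        "completeness_ok": bool((completeness >= 0.95).all()),
        "duplicate_rate": float(duplicates),
        "completeness_by_field": completeness.to_dict(),
    }
```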

3rd Level — Data Packaging

In the final stage, we organize and format the data so it is well structured, ready for publication, and easy to use in reports, analysis, and decision-making.
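
A minimal sketch of packaging, assuming a CSV export plus a JSON manifest; the file layout and manifest fields are assumptions for illustration.

```python
import json
import pandas as pd

def package_dataset(gold: pd.DataFrame, path: str, market: str) -> None:
    """Write the verified table plus a small manifest for the snapshot."""
    gold.to_csv(f"{path}/dataset.csv", index=False)
    manifest = {
        "market": market,
        "records": len(gold),
        "fields": list(gold.columns),
        "snapshot_at": pd.Timestamp.now(tz="UTC").isoformat(),
    }
    with open(f"{path}/manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)
```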

What Gets Included in the Published Dataset

A published dataset includes only records that pass a series of Gold-level filters, ensuring the data is relevant, accurate, and reliable for analysis.

What LDI Datasets Are Not

To ensure correct interpretation, it is important to clarify what these datasets do not represent.

Not an Official Registry

These datasets do not replace official business or institution registries.

Not a Quality Assessment

Digital indicators do not assess whether a business is good or bad.

Does Not Guarantee Full Coverage

The datasets reflect only the market visible in public digital sources, not the entire market.

Explore the Data

Discover our published datasets, view sample data, or request a custom dataset for a specific vertical, city, or county that meets your needs.