
Methodology and Data Quality Control

Local Data Insights does not publish raw lists. Before a dataset is published, the data passes through a pipeline of cleaning, normalization, geo-validation, and digital signal verification.

This page explains the principles of the LDI methodology, without exposing the detailed internal rules, full dictionaries, or proprietary technical logic.

What LDI Data Reflects

LDI data reflects the digitally visible part of a local market, based on publicly available information at the time of the snapshot.

LDI datasets are not official registries and do not guarantee full coverage of all organizations in a market. They provide a structured view of businesses visible in public digital sources.

ETL Process and Data Pipeline: Bronze → Silver → Gold

LDI uses an internal ETL process (data collection, transformation, and controlled publishing) built on a Bronze → Silver → Gold architecture. The data passes through these three levels so that raw information stays separate from clean, verifiable data ready for analysis.

Bronze

The Bronze level contains raw data observed in public digital sources. At this stage, there may be duplicates, unclear categories, incomplete addresses, or records that require further verification.

Silver

At the Silver level, the data is cleaned, normalized, and filtered. Exclusion rules and category dictionaries are applied, along with checks on address, status, location, and digital signals.

Gold

The Gold level contains data ready for publication: records that pass quality checks, geo-validation, and consistency checks on the main fields.
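
As a rough illustration of how the three levels relate, the sketch below passes a few made-up records through hypothetical `to_silver` and `to_gold` steps. The `Record` fields, the normalization rules, and the duplicate key are assumptions made for this example; LDI's internal rules are not published and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    # Hypothetical minimal schema, used only for this illustration.
    name: str
    category: str
    city: str
    lat: Optional[float] = None
    lon: Optional[float] = None


def to_silver(bronze: list[Record]) -> list[Record]:
    """Normalize text fields and drop duplicates (assumed rules)."""
    seen = set()
    silver = []
    for r in bronze:
        cleaned = Record(
            name=" ".join(r.name.split()),       # collapse extra whitespace
            category=r.category.strip().lower(),
            city=r.city.strip().title(),
            lat=r.lat,
            lon=r.lon,
        )
        # Assumed duplicate key: case-insensitive name plus city.
        key = (cleaned.name.casefold(), cleaned.city.casefold())
        if cleaned.name and key not in seen:
            seen.add(key)
            silver.append(cleaned)
    return silver


def to_gold(silver: list[Record]) -> list[Record]:
    """Keep only records whose main fields and coordinates are present."""
    return [
        r for r in silver
        if r.category and r.city and r.lat is not None and r.lon is not None
    ]


# Bronze: raw observations, including a duplicate and an incomplete record.
bronze = [
    Record("  Cafe   Central ", "Cafe ", "springfield", 10.5, 20.5),
    Record("cafe central", "cafe", "Springfield", 10.5, 20.5),
    Record("Bistro Nord", "restaurant", "Springfield"),
]
gold = to_gold(to_silver(bronze))
```

Here the second record is dropped at the Silver step as a duplicate, and the third never reaches Gold because it has no coordinates. Each step returns a new list, so the raw Bronze data stays available for re-processing.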

Data Cleaning, Categories, and Noise Elimination

An important part of the LDI methodology is separating relevant businesses from records that do not belong to the analyzed vertical or cannot be used reliably.

Geo-gate and Spatial Validation

For the data to be used in local analysis and geo-analysis, LDI applies geographic checks before publishing.

Coordinates and Location

We use geographic coordinates when they are available and can be verified in the context of the dataset.

Analyzed Area

Points outside the analyzed geography are checked and can be excluded if they do not belong to the defined local market.

Usable Addresses

Records without a usable location can be excluded from datasets where spatial analysis is important.
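
The kind of geographic gate described above could be sketched as follows. This is a simplified assumption: it uses a rectangular bounding box with invented coordinates, whereas a real analyzed area would more likely be a polygon. The `AREA` constant and the `in_area` function are hypothetical names, not part of any LDI tooling.

```python
from typing import Optional

# Hypothetical bounding box: (min_lat, min_lon, max_lat, max_lon).
AREA = (10.0, 20.0, 11.0, 21.0)


def in_area(lat: Optional[float], lon: Optional[float], area=AREA) -> bool:
    """Return True only for valid coordinates that fall inside the box."""
    if lat is None or lon is None:
        return False  # no usable location
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return False  # not a valid latitude/longitude pair
    min_lat, min_lon, max_lat, max_lon = area
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


points = [
    {"name": "A", "lat": 10.5, "lon": 20.5},   # inside the box
    {"name": "B", "lat": 12.0, "lon": 20.5},   # outside the analyzed area
    {"name": "C", "lat": None, "lon": None},   # missing coordinates
]
kept = [p["name"] for p in points if in_area(p["lat"], p["lon"])]
```

Only point A passes the gate in this example; B is outside the box and C has no coordinates to check.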

Website and Digital Signal Verification

LDI treats websites and social media channels as observable digital signals, not just as simple text fields.
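
One way to read "signal, not text field" is that a raw website value gets classified before it is trusted. The sketch below does this offline, without any network request, using Python's standard `urllib.parse`. The `classify_web_field` function, its four labels, and the `SOCIAL_HOSTS` set are illustrative assumptions, not LDI's actual logic.

```python
from typing import Optional
from urllib.parse import urlsplit

# Illustrative list only; a real list would be longer.
SOCIAL_HOSTS = {"facebook.com", "instagram.com"}


def classify_web_field(raw: Optional[str]) -> str:
    """Classify a raw value as 'missing', 'invalid', 'social', or 'website'."""
    value = (raw or "").strip()
    if not value:
        return "missing"
    if "://" not in value:
        value = "https://" + value  # assume https when no scheme is given
    parts = urlsplit(value)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Require a web scheme and a host that looks like a domain name.
    if parts.scheme not in ("http", "https") or "." not in host:
        return "invalid"
    if host in SOCIAL_HOSTS:
        return "social"
    return "website"


samples = ["", "www.example.com", "https://facebook.com/somepage", "n/a"]
labels = [classify_web_field(s) for s in samples]
```

A step like this only checks the shape of the value. Confirming that a site actually responds would need a separate availability check.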

Pre-publication Checks

Before a dataset is published, multiple internal quality checks are applied.

Q1 — Field Consistency

We check whether the main fields required for analysis are present and consistent: name, category, location, status, and identifiers.
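
A presence check of this kind might look like the sketch below. The field names in `REQUIRED_FIELDS` mirror the list above, with `id` standing in for identifiers; the actual column names in LDI datasets may differ, and `missing_fields` is a hypothetical helper.

```python
# Assumed column names; "id" stands in for the identifier fields.
REQUIRED_FIELDS = ("name", "category", "location", "status", "id")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


complete = {
    "name": "Cafe Central",
    "category": "cafe",
    "location": "Springfield",
    "status": "active",
    "id": "0001",
}
partial = {"name": "Bistro Nord", "category": "restaurant",
           "location": "Springfield", "status": "  "}

complete_gaps = missing_fields(complete)
partial_gaps = missing_fields(partial)
```

Returning the list of gaps, rather than a plain pass/fail flag, makes it easier to see why a record was held back.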

Q2 — Location Quality

We check addresses, cities, counties, and coordinates to reduce geographical errors and points outside the analyzed area.

Q3 — Digital Signals

We check website availability, contacts, and observable digital signals before publication.

What Ends Up in the Published Dataset

The published dataset does not include every record observed initially. Only data that passes the relevance, location, and consistency filters is included at the Gold level.

Formats and Usage

LDI commercial datasets are prepared for use in analysis tools, spreadsheets, and operational workflows.

CSV

A format suitable for import, analysis, automation, and BI tools. Files are saved in UTF-8 encoding.
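
For readers loading a CSV file programmatically, the sketch below shows a round trip with Python's standard `csv` module and an explicit UTF-8 encoding. The file name and the `name` and `postal_code` columns are invented for the example and do not describe the real dataset schema.

```python
import csv
import os
import tempfile

# Made-up sample row with a non-ASCII character and a leading zero.
rows = [{"name": "Café Central", "postal_code": "00123"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")

    # Write the sample file in UTF-8.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "postal_code"])
        writer.writeheader()
        writer.writerows(rows)

    # Read it back, stating the encoding explicitly.
    with open(path, encoding="utf-8", newline="") as f:
        loaded = list(csv.DictReader(f))
```

The `csv` module returns every value as a string, so codes such as `00123` keep their leading zeros. Tools that infer numeric types on import may need the column set to text explicitly.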

XLSX

A format suitable for direct work in Excel, including fields where leading zeros need to be preserved.

Local Analysis

The location fields and coordinates, when available, allow the data to be used for maps and spatial exploration.
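
As one possible route into mapping tools, records with coordinates can be converted to GeoJSON, which many map viewers accept. The sketch below assumes hypothetical `name`, `lat`, and `lon` fields and skips records without coordinates. Note that GeoJSON stores positions in longitude, latitude order.

```python
import json


def to_geojson(records: list[dict]) -> dict:
    """Build a GeoJSON FeatureCollection from records that have coordinates."""
    features = []
    for r in records:
        if r.get("lat") is None or r.get("lon") is None:
            continue  # coordinates are only present "when available"
        features.append({
            "type": "Feature",
            # GeoJSON position order is [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
            "properties": {"name": r["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


records = [
    {"name": "Cafe Central", "lat": 10.5, "lon": 20.5},
    {"name": "Bistro Nord", "lat": None, "lon": None},
]
collection = to_geojson(records)
geojson_text = json.dumps(collection, ensure_ascii=False)
```

The resulting `geojson_text` string can be saved to a `.geojson` file and opened in a GIS or web mapping tool.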

What LDI Is Not

For a correct interpretation, it is important to clarify what these datasets do not represent.

Not an Official Registry

The data does not replace official company or institution registries.

Not a Quality Assessment

Digital indicators do not show whether a business is good, poor, or recommended.

Does Not Guarantee Full Coverage

The datasets reflect the visible market in public digital sources, not the absolute entirety of the market.

Limitations and Responsibility

LDI data reflects information available in public digital sources at the time of the snapshot. It is not an official registry and does not guarantee full coverage of all organizations in a market. Public sources may change over time, and the use of the data must comply with applicable legislation.

Explore the Data

Check out the published datasets, view samples, or request a custom dataset for a vertical, city, or county that interests you.