Data Engineering & Analytics

IP Data Enrichment for Analytics: Warehouse, Batch, and Stream Patterns

By Sarah Kim, Analytics Engineering Lead · 10 min read

Your data warehouse stores millions of IP addresses in logs, transaction records, and session tables. Without enrichment, those IPs are opaque strings. This post covers three architecture patterns for turning raw IP addresses into geographic, network, and risk intelligence — batch loading into Snowflake or BigQuery, streaming enrichment for real-time dashboards, and SQL-based query-time lookups that avoid moving data entirely.

IP Enrichment for Analytics: Key Patterns

8K/min: Batch Throughput (IPs per minute via bulk API)
7 Fields: Enriched Per IP (country, city, ASN, ISP, VPN, threat, timezone)
3 Patterns: Architecture Options (batch, stream, and query-time)
Zero: Schema Changes Needed (enrichment is additive)

Why IP Enrichment Belongs in Your Warehouse, Not Just Your Application

Most teams enrich IP addresses at the application layer — checking geolocation during login, scoring risk at checkout, or appending country headers at the edge. This handles real-time decisions well, but it leaves your historical data un-enriched. When your analytics team wants to answer questions like "what share of fraud attempts came from VPN traffic last quarter" or "how does conversion rate vary by city," they hit a wall: the warehouse has raw IPs, not location or risk data.

Warehouse-side enrichment solves this by joining IP intelligence into your fact tables. Once enriched, every downstream analyst, dashboard, and ML model can filter and group by country, ISP, VPN status, ASN, threat level, and timezone — without calling an API at query time.

Three Enrichment Architecture Patterns

There is no single right pattern. The best choice depends on your data volume, freshness requirements, query patterns, and warehouse platform. Here are the three main approaches, from simplest to most sophisticated:

1. Batch Enrichment: Load IP Intelligence into Your Warehouse

Export unique IP addresses from your warehouse, call the bulk geolocation API, and load the enriched results back as a dimension table. This pattern runs on a schedule (daily, hourly) and gives every downstream query access to location and risk data without API calls.

-- Step 1: Extract unique IPs from your warehouse (Snowflake example)
COPY INTO @staging/ips_to_enrich.csv
FROM (
  SELECT DISTINCT ip_address
  FROM raw_events
  WHERE event_date >= DATEADD(day, -1, CURRENT_DATE())
    AND ip_address IS NOT NULL
)
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',')
SINGLE = TRUE;  -- write one file so it can be uploaded as-is
-- Download locally with SnowSQL: GET @staging/ips_to_enrich.csv file:///tmp/;

# Step 2: Call the bulk geolocation API
# POST the CSV file and receive enriched results
curl -X POST "https://ip-info.app/api/v1/geolocate/bulk" \
  -H "x-api-key: YOUR_API_KEY" \
  -F "file=@ips_to_enrich.csv" \
  -o enriched_results.jsonl

-- Step 3: Create the dimension table and load the enriched results
CREATE OR REPLACE TABLE ip_intelligence_dim (
  ip_address STRING PRIMARY KEY,
  country_code STRING,
  city_name STRING,
  latitude FLOAT,
  longitude FLOAT,
  timezone STRING,
  asn INTEGER,
  aso STRING,
  is_vpn BOOLEAN,
  is_proxy BOOLEAN,
  is_tor BOOLEAN,
  threat_level STRING,
  accuracy_radius INTEGER,
  enriched_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
);

-- Stage the results (PUT file://enriched_results.jsonl @staging), then load.
-- If the API returns camelCase keys, load into a single-VARIANT staging
-- table first and SELECT into this snake_case schema.
COPY INTO ip_intelligence_dim
FROM @staging/enriched_results.jsonl
FILE_FORMAT = (TYPE = 'JSON')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

When to use: Daily or hourly enrichment of historical data. Works well when your warehouse is the single source of truth and analysts need to run ad-hoc queries against enriched IP data. Supports Snowflake, BigQuery, Redshift, Databricks, and any platform that can run SQL and import files.
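For IP lists too large for a single upload, the extract can be split into chunks and submitted sequentially. A minimal Node sketch under stated assumptions: the endpoint and header follow the curl example above, but whether the API accepts a raw CSV body (rather than a multipart file) and the chunk size and retry count are all illustrative, not documented behavior:

```javascript
// Split a list of IPs into fixed-size chunks for bulk submission.
function chunkIps(ips, size = 10_000) {
  const chunks = [];
  for (let i = 0; i < ips.length; i += size) {
    chunks.push(ips.slice(i, i + size));
  }
  return chunks;
}

// Submit each chunk to the bulk endpoint with a simple retry loop.
// Endpoint and API-key header mirror the curl example above.
async function submitAll(ips, apiKey, retries = 3) {
  const results = [];
  for (const chunk of chunkIps(ips)) {
    for (let attempt = 1; attempt <= retries; attempt++) {
      const res = await fetch('https://ip-info.app/api/v1/geolocate/bulk', {
        method: 'POST',
        headers: { 'x-api-key': apiKey, 'content-type': 'text/csv' },
        body: chunk.join('\n'),
      });
      if (res.ok) {
        results.push(await res.text()); // JSONL payload per chunk
        break;
      }
      if (attempt === retries) throw new Error(`chunk failed: ${res.status}`);
    }
  }
  return results;
}
```

Chunking also keeps any single failure small: one bad chunk can be retried or skipped without re-running the whole day's extract.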

2. Stream Enrichment: Enrich in Real Time Before Warehouse Insert

Add an enrichment step in your streaming pipeline — Kafka, Kinesis, or Flink — that calls the IP intelligence API on each event before it lands in the warehouse. Events arrive already enriched, so no batch job is needed.

// Kafka Streams / Node.js: Enrich events before warehouse insert
const { Kafka } = require('kafkajs');
const axios = require('axios');

const kafka = new Kafka({ brokers: ['kafka:9092'] });
const consumer = kafka.consumer({ groupId: 'ip-enricher' });
const producer = kafka.producer();

const API_KEY = process.env.IP_INFO_API_KEY;
const API_BASE = 'https://ip-info.app/api/v1/geolocate';
const cache = new Map(); // In-memory cache for dedup

async function enrichIp(ip) {
  if (cache.has(ip)) return cache.get(ip);
  const { data } = await axios.get(`${API_BASE}/${ip}`, {
    headers: { 'x-api-key': API_KEY },
    timeout: 5000,
  });
  const enriched = {
    countryCode: data.countryCode,
    cityName: data.city?.name,
    asn: data.asn,
    aso: data.aso,
    isVpn: data.isVPN,
    isProxy: data.isProxy,
    threatLevel: data.threatLevel,
    timezone: data.city?.timeZone,
  };
  cache.set(ip, enriched);
  if (cache.size > 100_000) {
    // Evict oldest entries to bound memory
    const firstKey = cache.keys().next().value;
    cache.delete(firstKey);
  }
  return enriched;
}

async function run() {
  await consumer.connect();
  await producer.connect();
  await consumer.subscribe({ topic: 'raw-events' });

  // kafkajs delivers messages via the eachMessage handler,
  // not async iteration over the consumer.
  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value.toString());
      if (event.ipAddress) {
        try {
          event.ipIntel = await enrichIp(event.ipAddress);
        } catch {
          event.ipIntel = { error: 'enrichment_failed' };
        }
      }
      await producer.send({
        topic: 'enriched-events',
        messages: [{ value: JSON.stringify(event) }],
      });
    },
  });
}

run().catch(console.error);

When to use: Real-time dashboards, fraud operations that need current data, or when your warehouse serves operational queries that should not wait for a batch cycle. Adds latency to the ingestion pipeline (one API call per unique IP), so cache aggressively.
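The plain Map cache above evicts only by insertion order and never expires entries, so a long-running enricher can keep serving stale intelligence. A sketch of a small LRU cache with a TTL (the capacity and 24-hour TTL defaults are illustrative, not API requirements):

```javascript
// LRU cache with TTL. A Map's iteration order doubles as recency order,
// because deleting and re-inserting a key moves it to the end.
class TtlLruCache {
  constructor(maxSize = 100_000, ttlMs = 24 * 60 * 60 * 1000) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.map = new Map();
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() - entry.storedAt > this.ttlMs) {
      this.map.delete(key); // expired: force a fresh API call
      return undefined;
    }
    // Refresh recency by re-inserting at the end.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, { value, storedAt: Date.now() });
    if (this.map.size > this.maxSize) {
      // Evict the least recently used entry (first in iteration order).
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```

To swap this into enrichIp, replace the `cache.has(ip)` check with a `cache.get(ip)` that treats undefined as a miss; memory and staleness are then both bounded.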

3. Query-Time Enrichment: Call the API from SQL (No Data Movement)

Some platforms support external functions or UDFs that call HTTP APIs during query execution. Snowflake External Functions, BigQuery Remote Functions, and Redshift Lambda UDFs all support this pattern. The enrichment happens at query time — no ETL, no batch schedule, no data movement.

-- Snowflake: External Function for IP geolocation
-- (Requires API Gateway or Azure Function as proxy)
CREATE OR REPLACE EXTERNAL FUNCTION
  enrich_ip(ip_address VARCHAR)
  RETURNS VARIANT
  API_INTEGRATION = ip_info_api
  AS 'https://your-gateway.example.com/ip-enrich';

-- Usage: enrich IPs at query time
SELECT
  e.event_id,
  e.ip_address,
  enrich_ip(e.ip_address):countryCode::STRING AS country,
  enrich_ip(e.ip_address):cityName::STRING AS city,
  enrich_ip(e.ip_address):asn::INTEGER AS asn,
  enrich_ip(e.ip_address):aso::STRING AS isp,
  enrich_ip(e.ip_address):isVpn::BOOLEAN AS is_vpn,
  enrich_ip(e.ip_address):threatLevel::STRING AS threat
FROM analytics.events e
WHERE e.event_date >= DATEADD(day, -7, CURRENT_DATE())
  AND enrich_ip(e.ip_address):isVpn::BOOLEAN = TRUE;

-- For production: cache results in a temp table
-- to avoid repeated API calls on the same IP
CREATE TEMPORARY TABLE ip_cache AS
SELECT DISTINCT ip_address, enrich_ip(ip_address) AS intel
FROM analytics.events
WHERE event_date >= DATEADD(day, -7, CURRENT_DATE());

When to use: Ad-hoc analysis where you do not want to maintain an ETL pipeline, or when you need to enrich a small subset of IPs on demand. Not suitable for large-scale recurring queries — the per-row API call cost and latency make batch or stream patterns more efficient at scale.

What You Can Build With Enriched IP Data

Once IP intelligence lives in your warehouse, it powers queries across fraud, analytics, compliance, and operations. Here are the highest-value use cases teams actually ship:

Fraud analysis by VPN and proxy status

Group transactions, login attempts, and account changes by VPN flag, proxy status, and threat level. Identify which regions, ISPs, and ASN ranges produce the most suspicious activity.

Geographic conversion and revenue analysis

Segment conversion rates, average order value, and churn by country, city, and timezone. Enriched IP data gives you geographic analytics without client-side tracking or cookies.

Network and ISP intelligence

Understand which ISPs and carriers your users connect through. Identify patterns like high churn from mobile carriers in specific regions or fraud clusters from particular ASN ranges.

Compliance and regulatory reporting

Filter transactions by country for PSD3 reporting, GDPR data residency checks, or sanctions screening. Build automated reports that map IP-based geographic data to regulatory requirements.

Suspicious traffic clustering

Use ASN, ISP, and threat-level fields to detect coordinated attacks. A sudden spike from a single ASN range, or multiple accounts sharing the same VPN exit node, signals organized fraud.
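The clustering described above can start as a simple aggregation over enriched events. A sketch in Node, assuming each event carries the ipIntel.asn field produced by the stream enricher; the share threshold is an arbitrary illustration, not a recommended cutoff:

```javascript
// Flag ASNs that account for an outsized share of recent events,
// a crude but useful first signal for coordinated traffic.
function findAsnSpikes(events, shareThreshold = 0.2) {
  const counts = new Map();
  for (const e of events) {
    const asn = e.ipIntel?.asn;
    if (asn == null) continue; // skip unenriched events
    counts.set(asn, (counts.get(asn) ?? 0) + 1);
  }
  const total = events.length;
  const spikes = [];
  for (const [asn, count] of counts) {
    const share = count / total;
    if (share >= shareThreshold) spikes.push({ asn, count, share });
  }
  // Largest clusters first.
  return spikes.sort((a, b) => b.count - a.count);
}
```

In production this would compare each ASN's share against its own historical baseline rather than a fixed threshold, but the aggregation shape is the same.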

Content localization analytics

Measure how users in different regions interact with localized content. Compare engagement metrics across countries and timezones to optimize content strategy and regional campaigns.

dbt Transformation: Turning Raw Enrichment Into Analytics-Ready Models

Raw API response data needs cleaning before analysts can use it reliably. Here is a practical dbt model that transforms enriched IP data into a clean dimension table:

-- models/ip_intelligence_clean.sql
-- Cleans and deduplicates the raw enrichment data
WITH raw AS (
  SELECT * FROM {{ source('ip_intelligence', 'raw_enrichment') }}
),

deduplicated AS (
  SELECT
    ip_address,
    -- Country: prefer non-null, standardize codes
    COALESCE(NULLIF(country_code, ''), 'UNKNOWN') AS country_code,
    -- City: lowercase for consistency
    LOWER(NULLIF(city_name, '')) AS city_name,
    -- Coordinates: validate ranges
    CASE
      WHEN latitude BETWEEN -90 AND 90
        AND longitude BETWEEN -180 AND 180
      THEN latitude
      ELSE NULL
    END AS latitude,
    CASE
      WHEN latitude BETWEEN -90 AND 90
        AND longitude BETWEEN -180 AND 180
      THEN longitude
      ELSE NULL
    END AS longitude,
    -- Network intelligence
    asn,
    aso AS isp_name,
    -- Threat classification
    COALESCE(is_vpn, FALSE) AS is_vpn,
    COALESCE(is_proxy, FALSE) AS is_proxy,
    COALESCE(is_tor, FALSE) AS is_tor,
    COALESCE(NULLIF(threat_level, ''), 'low') AS threat_level,
    -- Metadata
    timezone,
    accuracy_radius,
    -- Keep the most recent enrichment
    ROW_NUMBER() OVER (
      PARTITION BY ip_address
      ORDER BY enriched_at DESC
    ) AS rn
  FROM raw
)

SELECT
  ip_address,
  country_code,
  city_name,
  latitude,
  longitude,
  asn,
  isp_name,
  is_vpn,
  is_proxy,
  is_tor,
  threat_level,
  timezone,
  accuracy_radius
FROM deduplicated
WHERE rn = 1
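The model above pairs naturally with dbt schema tests, so bad enrichment data fails the build instead of silently skewing dashboards. A sketch of the accompanying schema.yml; the model and column names match the SQL above, but the accepted threat_level values are an assumption about the API's vocabulary:

```yaml
version: 2

models:
  - name: ip_intelligence_clean
    description: Deduplicated, validated IP intelligence dimension
    columns:
      - name: ip_address
        tests:
          - unique
          - not_null
      - name: country_code
        tests:
          - not_null
      - name: threat_level
        tests:
          - accepted_values:
              values: ['low', 'medium', 'high']  # assumed vocabulary
```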

Choosing the Right Pattern

Each pattern has trade-offs. Use this comparison to pick the approach that fits your volume, latency, and cost constraints:

Criterion | Batch | Stream | Query-Time
Data freshness | Hourly to daily | Real-time (seconds) | Always current
Cost efficiency | Best (deduplicates naturally) | Good (needs in-memory cache) | Worst (repeated API calls)
Infrastructure complexity | Low (SQL + cron) | High (streaming stack) | Medium (external function setup)
Best for | Historical analysis, reporting | Real-time dashboards, fraud ops | Ad-hoc investigation
Scale limit | Millions of IPs per cycle | Thousands per second (cached) | Hundreds per query

Start Enriching Your Warehouse With IP Intelligence

Test the API with a sample of IPs from your production data. The live demo returns country, city, ASN, ISP, VPN status, and threat level — the same fields you would load into your warehouse.