Snowflake
Monitoring Kit · v2

12 metrics. Every Snowflake
account. One setup.

Every Snowflake account has the same INFORMATION_SCHEMA and ACCOUNT_USAGE tables. These 12 queries turn them into live alerts with smart comparison — each metric checks against a 30-day rolling baseline so you only get paged when something is genuinely wrong.

Start monitoring free →
12
Production-ready metrics
4
Categories covered
30-day
Smart comparison baseline
10 min
From connect to first alert
Snowflake Kit · 12 metrics · Live
MetricCategorySchedule
Warehouse Credit Spike — Last 24h
CostHourly
Serverless Credit Spike — Last 24h
Cost12h
Database Storage Spike — Last 7d
CostDaily
Failsafe Storage Spike — Last 7d
CostDaily
Serverless Service Credit Spike — Yesterday
CostDaily
Long-Running Queries — Last 1h
PerformanceHourly
Query Queue Anomaly — Last 1h
PerformanceHourly
Remote Disk Spill Anomaly — Yesterday
PerformanceDaily
ETL Failure Rate Spike — Last 4h
FailuresHourly
Failed Task — Last 1h
FailuresHourly
Stale Tables — 90-Day No Query
OptimizationWeekly
Warehouse Utilization — Yesterday
OptimizationDaily

Permissions

3 read-only grants. That's it.

Lighthouse only reads metadata — query history, metering stats, storage usage. It cannot access your actual table data, run DML, create objects, or modify anything.

ACCOUNT_USAGE Access

One-time grant. All future activity appears automatically.

GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE <your_role>;
  • Storage history
  • Serverless costs
  • Disk spill data
  • Access history
  • Warehouse metering history
  • Historical query data
  • New users, tables & warehouses — auto-included forever

Real-Time Warehouse Access

Covers all current and future warehouses — no need to re-run when new warehouses are added.

GRANT MONITOR ON ALL WAREHOUSES TO ROLE <your_role>;
GRANT MONITOR ON FUTURE WAREHOUSES TO ROLE <your_role>;
  • INFORMATION_SCHEMA.query_history
  • INFORMATION_SCHEMA.warehouse_metering_history
  • INFORMATION_SCHEMA.task_history

Account-Level Monitoring

One-time grant.

GRANT MONITOR USAGE ON ACCOUNT TO ROLE <your_role>;
  • warehouse_load_history (utilization metric)
  • Account-scoped — covers all current and future activity

Read-only, always → All three grants are SELECT-only metadata privileges. Run them once and you're done — new warehouses, users, tables, and activity are automatically covered. Lighthouse cannot write data, drop tables, create warehouses, or access the contents of your tables — only operational metadata like query durations, credit consumption, and storage sizes.

How it works

We configure it. You get the alerts.

01

Connect Snowflake

Read-only credentials, 3 clicks. Lighthouse reads your schema — never your raw data. Takes under 5 minutes.

02

Deploy the kit

All 12 monitoring metrics go live in your Lighthouse workspace — pre-configured, pre-tuned, with 30-day smart comparison baselines. No SQL required.

03

Get Slack alerts

When cost spikes, queries fail, or warehouses misbehave, you get a Slack message with value, baseline, and context — before anyone has to ask.

Time windows explained → Real-time metrics use INFORMATION_SCHEMA with a -2h to -1h window to absorb Snowflake's 30–45 min propagation lag. Historical metrics use ACCOUNT_USAGE with a -30h to -6h window so data is fully populated. No partial reads, no false drops.

💰

Cost & Credits

5 metrics
01 / COSTReal-timeHourly

Warehouse Credit Spike — Last 24h

Warehouse spend exceeds the 30-day daily max by more than 30%.

Why monitor

  • Compares today's credits per warehouse against a 30-day rolling baseline — alerts only when spend is genuinely anomalous, not just on normally busy days.
  • Uses INFORMATION_SCHEMA for real-time data — no 45-minute ACCOUNT_USAGE lag, catches runaway warehouses within the hour.
  • Per-warehouse segmentation isolates which workload (ETL, BI, ad-hoc) is responsible for the spike.
SQL · Snowflake
WITH
    "main_metric" AS (
      SELECT
        warehouse_name,
        SUM(credits_used)                AS "METRIC_VALUE",
        SUM(credits_used_compute)        AS compute_credits,
        SUM(credits_used_cloud_services) AS cloud_credits
      FROM TABLE(information_schema.warehouse_metering_history(
        dateadd('hours', -25, current_timestamp()),
        dateadd('hours', -1,  current_timestamp())
      ))
      GROUP BY warehouse_name
    ),

    "Compared Trend 1_time_windows" AS (
      SELECT
        DATEADD('day', -SEQ4() - 1, current_timestamp())            AS "WINDOW_END",
        DATEADD('hours', -24, DATEADD('day', -SEQ4() - 1,
          current_timestamp()))                                          AS "WINDOW_START"
      FROM TABLE(GENERATOR(ROWCOUNT => 30))
    ),

    "Compared Trend 1_metric_per_time_window" AS (
      SELECT
        h.warehouse_name,
        t."WINDOW_END",
        SUM(h.credits_used) AS "METRIC_VALUE"
      FROM TABLE(information_schema.warehouse_metering_history(
        dateadd('day', -31, current_timestamp()),
        dateadd('hours', -1,  current_timestamp())
      )) h
      JOIN "Compared Trend 1_time_windows" t
        ON t."WINDOW_START" <= h.start_time
       AND t."WINDOW_END"   >= h.start_time
      GROUP BY h.warehouse_name, t."WINDOW_END"
    ),

    "Compared Trend 1" AS (
      SELECT
        warehouse_name,
        MAX("METRIC_VALUE")              AS "COMPARED_VALUE",
        MAX("METRIC_VALUE") * (1 + 0.3) AS "COMPARED_VALUE_THRESHOLD"
      FROM "Compared Trend 1_metric_per_time_window"
      GROUP BY warehouse_name
    )

SELECT
    m.warehouse_name,
    m."METRIC_VALUE",
    m.compute_credits,
    m.cloud_credits,
    c."COMPARED_VALUE",
    c."COMPARED_VALUE_THRESHOLD",
    CASE WHEN m."METRIC_VALUE" > c."COMPARED_VALUE_THRESHOLD"
      THEN 'SPIKE' ELSE 'Normal'
    END AS status
FROM "main_metric" m
LEFT JOIN "Compared Trend 1" c ON m.warehouse_name = c.warehouse_name
ORDER BY m."METRIC_VALUE" DESC

Sample alert

LH
LighthouseToday at 9:41 AMAPP

⚠️ Warehouse Credit Spike — COMPUTE_WH used 12.4 credits in the last 24h. 30-day max: 3.1 credits · 4× above baseline.

02 / COSTHistorical12h

Serverless Credit Spike — Last 24h

Serverless task credit spend per task exceeds the 30-day daily max by more than 30%.

Why monitor

  • Serverless task costs don't appear under any warehouse — they accumulate silently and show up as a surprise line on the bill.
  • Segmented per task so you know exactly which task spiked, not just that serverless spend is up at the account level.
  • Compares against 30-day daily baseline per task — only fires when a specific task's spend is genuinely out of pattern.
SQL · Snowflake
WITH
    "main_metric" AS (
      SELECT
        database_name, schema_name, task_name,
        SUM(credits_used) AS "METRIC_VALUE"
      FROM SNOWFLAKE.ACCOUNT_USAGE.SERVERLESS_TASK_HISTORY
      WHERE start_time >= dateadd('hours', -6, current_timestamp()) - INTERVAL '24 hours'
        AND start_time <= dateadd('hours', -6, current_timestamp())
      GROUP BY database_name, schema_name, task_name
    ),
    "Compared Trend 1_time_windows" AS (
      SELECT
        DATEADD('day', -SEQ4() - 1, dateadd('hours', -6, current_timestamp())) AS "WINDOW_END",
        DATEADD('hours', -24, DATEADD('day', -SEQ4() - 1,
          dateadd('hours', -6, current_timestamp())))                          AS "WINDOW_START"
      FROM TABLE(GENERATOR(ROWCOUNT => 30))
    ),
    "Compared Trend 1_metric_per_time_window" AS (
      SELECT
        h.database_name, h.schema_name, h.task_name, t."WINDOW_END",
        SUM(h.credits_used) AS "METRIC_VALUE"
      FROM SNOWFLAKE.ACCOUNT_USAGE.SERVERLESS_TASK_HISTORY h
      JOIN "Compared Trend 1_time_windows" t
        ON t."WINDOW_START" <= h.start_time AND t."WINDOW_END" >= h.start_time
      GROUP BY h.database_name, h.schema_name, h.task_name, t."WINDOW_END"
    ),
    "Compared Trend 1" AS (
      SELECT
        database_name, schema_name, task_name,
        MAX("METRIC_VALUE")              AS "COMPARED_VALUE",
        MAX("METRIC_VALUE") * (1 + 0.3) AS "COMPARED_VALUE_THRESHOLD"
      FROM "Compared Trend 1_metric_per_time_window"
      GROUP BY database_name, schema_name, task_name
    )
SELECT
    m.database_name, m.schema_name, m.task_name,
    m."METRIC_VALUE",
    c."COMPARED_VALUE", c."COMPARED_VALUE_THRESHOLD"
FROM "main_metric" m
LEFT JOIN "Compared Trend 1" c
    ON m.database_name = c.database_name
   AND m.schema_name   = c.schema_name
   AND m.task_name     = c.task_name

Sample alert

LH
LighthouseToday at 9:41 AMAPP

⚠️ Serverless Credit Spike — MY_DB.TRANSFORMS.NIGHTLY_AGG used 8.2 credits in the last 24h. 30-day max: 1.4 credits · 5.8× above baseline.

03 / COSTHistoricalDaily

Database Storage Spike — Last 7d

Database storage exceeds the 30-day weekly max by more than 30%. Value in bytes.

Why monitor

  • Compares last 7 days of storage against 30 days of weekly windows — catches structural growth, not just day-to-day table churn noise.
  • Per-database segmentation pinpoints which database is growing, not just the account total.
  • Storage creeps up silently — a bad pipeline can add terabytes before it shows on the bill.
SQL · Snowflake
WITH
    "main_metric" AS (
      SELECT
        database_id,
        AVG(average_database_bytes) AS "METRIC_VALUE"
      FROM SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY
      WHERE usage_date >= (current_timestamp() - INTERVAL '7 days')::date
        AND usage_date <  current_date()
        AND average_database_bytes > 0
      GROUP BY database_id
    ),
    "Compared Trend 1_time_windows" AS (
      SELECT
        DATEADD('week', -SEQ4() - 1, current_timestamp())              AS "WINDOW_END",
        DATEADD('days', -7, DATEADD('week', -SEQ4() - 1,
          current_timestamp()))                                            AS "WINDOW_START"
      FROM TABLE(GENERATOR(ROWCOUNT => 6))
    ),
    "Compared Trend 1_metric_per_time_window" AS (
      SELECT
        h.database_id, t."WINDOW_END",
        AVG(h.average_database_bytes) AS "METRIC_VALUE"
      FROM SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY h
      JOIN "Compared Trend 1_time_windows" t
        ON t."WINDOW_START"::date <= h.usage_date
       AND t."WINDOW_END"::date   >= h.usage_date
      WHERE h.average_database_bytes > 0
      GROUP BY h.database_id, t."WINDOW_END"
    ),
    "Compared Trend 1" AS (
      SELECT
        database_id,
        MAX("METRIC_VALUE")              AS "COMPARED_VALUE",
        MAX("METRIC_VALUE") * (1 + 0.3) AS "COMPARED_VALUE_THRESHOLD"
      FROM "Compared Trend 1_metric_per_time_window"
      GROUP BY database_id
    )
SELECT
    m.database_id,
    m."METRIC_VALUE",
    c."COMPARED_VALUE", c."COMPARED_VALUE_THRESHOLD"
FROM "main_metric" m
LEFT JOIN "Compared Trend 1" c ON m.database_id = c.database_id

Sample alert

LH
LighthouseToday at 9:41 AMAPP

⚠️ Database Storage Spike — ANALYTICS_DB averaged 18.2 GB over the last 7 days. 30-day weekly max: 3.1 GB · 5.9× above baseline.

04 / COSTHistoricalDaily

Failsafe Storage Spike — Last 7d

Failsafe storage exceeds the 30-day weekly max by more than 30%. Value in bytes.

Why monitor

  • Failsafe is mandatory and fully billable — you can't turn it off, so fast growth means real money with no way to stop it mid-flight.
  • Rapid failsafe growth = high table churn — usually a pipeline recreating large tables on every run instead of doing incremental updates.
  • Same comparison pattern as regular storage so you get consistent, noise-free alerting for both signals.
SQL · Snowflake
WITH
    "main_metric" AS (
      SELECT
        database_id,
        AVG(average_failsafe_bytes) AS "METRIC_VALUE"
      FROM SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY
      WHERE usage_date >= (current_timestamp() - INTERVAL '7 days')::date
        AND usage_date <  current_date()
        AND average_failsafe_bytes > 0
      GROUP BY database_id
    ),
    "Compared Trend 1_time_windows" AS (
      SELECT
        DATEADD('week', -SEQ4() - 1, current_timestamp())              AS "WINDOW_END",
        DATEADD('days', -7, DATEADD('week', -SEQ4() - 1,
          current_timestamp()))                                            AS "WINDOW_START"
      FROM TABLE(GENERATOR(ROWCOUNT => 6))
    ),
    "Compared Trend 1_metric_per_time_window" AS (
      SELECT
        h.database_id, t."WINDOW_END",
        AVG(h.average_failsafe_bytes) AS "METRIC_VALUE"
      FROM SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY h
      JOIN "Compared Trend 1_time_windows" t
        ON t."WINDOW_START"::date <= h.usage_date
       AND t."WINDOW_END"::date   >= h.usage_date
      WHERE h.average_failsafe_bytes > 0
      GROUP BY h.database_id, t."WINDOW_END"
    ),
    "Compared Trend 1" AS (
      SELECT
        database_id,
        MAX("METRIC_VALUE")              AS "COMPARED_VALUE",
        MAX("METRIC_VALUE") * (1 + 0.3) AS "COMPARED_VALUE_THRESHOLD"
      FROM "Compared Trend 1_metric_per_time_window"
      GROUP BY database_id
    )
SELECT
    m.database_id,
    m."METRIC_VALUE",
    c."COMPARED_VALUE", c."COMPARED_VALUE_THRESHOLD"
FROM "main_metric" m
LEFT JOIN "Compared Trend 1" c ON m.database_id = c.database_id

Sample alert

LH
LighthouseToday at 9:41 AMAPP

⚠️ Failsafe Storage Spike — ANALYTICS_DB: 94 GB in failsafe. 30-day weekly max: 23 GB · Tables with frequent overwrites are driving retention costs.

05 / COSTHistoricalDaily

Serverless Service Credit Spike — Yesterday

Credits consumed by any non-warehouse Snowflake service yesterday exceed the 30-day daily max by more than 20%, broken down by service type. Covers Cortex AI features, Snowflake Intelligence agents, query acceleration, auto-clustering, pipes, search optimization, and container services.

Why monitor

  • Warehouse credits are already covered by Q1 — this catches everything else in one metric: Cortex AI features, Snowflake Intelligence agents, query acceleration, auto-clustering, pipes, search optimization, and container services.
  • Segmented by SERVICE_TYPE so each service fires independently — a spike in SNOWFLAKE_INTELLIGENCE (AI agents) won't be masked by normal QUERY_ACCELERATION or AUTO_CLUSTERING activity.
  • Cortex and AI services are the fastest-growing surprise line items — CORTEX_CODE_SNOWSIGHT (analysts clicking Copilot in the UI) and SNOWFLAKE_INTELLIGENCE (agent invocations) bill silently outside any warehouse and never appear in warehouse credit reports.
SQL · Snowflake
SELECT * FROM (
  WITH
    "main_metric" AS (
      SELECT
        service_type,
        SUM(credits_used) AS "METRIC_VALUE"
      FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_HISTORY
      WHERE service_type NOT IN ('WAREHOUSE_METERING', 'SERVERLESS_TASK')
        AND start_time >= DATEADD('day', -1, CURRENT_DATE())
        AND start_time <  CURRENT_DATE()
      GROUP BY service_type
    ),

    "Compared Trend 1_time_windows" AS (
      SELECT
        DATEADD('day', -SEQ4() - 2, CURRENT_DATE()) AS "WINDOW_DATE"
      FROM TABLE(GENERATOR(ROWCOUNT => 30))
    ),

    "Compared Trend 1_metric_per_time_window" AS (
      SELECT
        h.service_type,
        t."WINDOW_DATE",
        SUM(h.credits_used) AS "METRIC_VALUE"
      FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_HISTORY h
      JOIN "Compared Trend 1_time_windows" t
        ON h.start_time::date = t."WINDOW_DATE"
      WHERE h.service_type NOT IN ('WAREHOUSE_METERING', 'SERVERLESS_TASK')
      GROUP BY h.service_type, t."WINDOW_DATE"
    ),

    "Compared Trend 1" AS (
      SELECT
        service_type,
        MAX("METRIC_VALUE")              AS "COMPARED_VALUE",
        MAX("METRIC_VALUE") * (1 + 0.2) AS "COMPARED_VALUE_THRESHOLD"
      FROM "Compared Trend 1_metric_per_time_window"
      GROUP BY service_type
    )

SELECT
    m.service_type,
    m."METRIC_VALUE",
    c."COMPARED_VALUE",
    c."COMPARED_VALUE_THRESHOLD"
FROM "main_metric" m
LEFT JOIN "Compared Trend 1" c ON m.service_type = c.service_type
ORDER BY m."METRIC_VALUE" DESC
) AS subquery
LIMIT 1000

Sample alert

LH
LighthouseToday at 9:41 AMAPP

⚠️ Serverless Service Credit Spike
*Service*: CORTEX_CODE_SNOWSIGHT
*Credits*: 84.3
*Compared to*: 11.2
*Threshold*: 13.4

Performance

3 metrics
06 / PERFReal-timeHourly

Long-Running Queries — Last 1h

Queries exceeding 30 minutes of execution time in the last hour.

Why monitor

  • 30 minutes is the threshold — anything above that is either a regression or a one-off heavy query worth knowing about.
  • Returns QUERY_ID per row so you can pull the full query profile from Snowflake and pinpoint the bottleneck.
  • INFORMATION_SCHEMA, no lag — alerts fire within the hour, not the next morning.
SQL · Snowflake
SELECT
    query_id,
    warehouse_name,
    user_name,
    ROUND(total_elapsed_time / 1000 / 60, 1) AS duration_minutes
FROM TABLE(information_schema.query_history(
    dateadd('hours', -2, current_timestamp()),
    dateadd('hours', -1, current_timestamp())
))
WHERE total_elapsed_time > 1800000        -- 30 minutes in ms
  AND query_type != 'MULTI_STATEMENT'     -- exclude transaction wrappers (counted via their child queries)
ORDER BY total_elapsed_time DESC

Sample alert

LH
LighthouseToday at 9:41 AMAPP

⚠️ Long-Running Query — 3 queries exceeded 30 min in the last hour. Longest: 47 min on ANALYTICS_WH by user ETL_SERVICE.

07 / PERFReal-timeHourly

Query Queue Anomaly — Last 1h

Total queue wait time per warehouse exceeds the 30-day hourly max by more than 30%.

Why monitor

  • Total wait, not per-query max — measures total minutes all queries spent waiting on a warehouse in the hour. Catches congestion events (many queries waiting 5 min each) that per-query max would miss.
  • Uses ACCOUNT_USAGE for both current and history — information_schema only shows the current role's queries, which would silently miss most activity.
  • Direct signal for right-sizing: persistent queue anomalies = scale up or split the workload.
SQL · Snowflake
WITH
    "main_metric" AS (
      SELECT
        warehouse_name,
        ROUND(SUM(queued_overload_time) / 1000 / 60, 2) AS "METRIC_VALUE"
      FROM snowflake.account_usage.query_history
      WHERE start_time >= dateadd('hours', -2, current_timestamp())
        AND start_time <  dateadd('hours', -1, current_timestamp())
        AND queued_overload_time > 0
      GROUP BY warehouse_name
    ),
    "history" AS (
      SELECT warehouse_name, start_time, queued_overload_time
      FROM snowflake.account_usage.query_history
      WHERE start_time >= dateadd('day', -31, current_timestamp())
        AND start_time <  dateadd('hours', -1, current_timestamp())
        AND queued_overload_time > 0
    ),
    "Compared Trend 1_time_windows" AS (
      SELECT
        DATEADD('day', -SEQ4() - 1, current_timestamp())          AS "WINDOW_END",
        DATEADD('hours', -24, DATEADD('day', -SEQ4() - 1,
          current_timestamp()))                                      AS "WINDOW_START"
      FROM TABLE(GENERATOR(ROWCOUNT => 30))
    ),
    "Compared Trend 1_metric_per_time_window" AS (
      SELECT
        h.warehouse_name, t."WINDOW_END",
        ROUND(SUM(h.queued_overload_time) / 1000 / 60, 2) AS "METRIC_VALUE"
      FROM "history" h
      JOIN "Compared Trend 1_time_windows" t
        ON t."WINDOW_START" <= h.start_time AND t."WINDOW_END" >= h.start_time
      GROUP BY h.warehouse_name, t."WINDOW_END"
    ),
    "Compared Trend 1" AS (
      SELECT
        warehouse_name,
        MAX("METRIC_VALUE")              AS "COMPARED_VALUE",
        MAX("METRIC_VALUE") * (1 + 0.3) AS "COMPARED_VALUE_THRESHOLD"
      FROM "Compared Trend 1_metric_per_time_window"
      GROUP BY warehouse_name
    )
SELECT
    m.warehouse_name,
    m."METRIC_VALUE", c."COMPARED_VALUE", c."COMPARED_VALUE_THRESHOLD"
FROM "main_metric" m
LEFT JOIN "Compared Trend 1" c ON m.warehouse_name = c.warehouse_name
WHERE m."METRIC_VALUE" > c."COMPARED_VALUE_THRESHOLD"
ORDER BY m."METRIC_VALUE" DESC

Sample alert

LH
LighthouseToday at 9:41 AMAPP

⚠️ Query Queue Anomaly — 14 queries queued in the last hour on REPORTING_WH. 30-day hourly max: 2 · 7× above baseline.

08 / PERFHistoricalDaily

Remote Disk Spill Anomaly — Yesterday

A query's remote disk spill exceeds the 30-day daily max by more than 30%.

Why monitor

  • Remote spill is 10–100× slower than in-memory — a spilling query burns proportionally more credits for the same output.
  • Uses ACCOUNT_USAGE because bytes_spilled columns don't exist in INFORMATION_SCHEMA — daily run is appropriate since spill isn't a real-time emergency.
  • Compares per-query spill against the 30-day per-warehouse max — only fires when a query is spilling more than anything seen recently.
SQL · Snowflake
WITH
    "main_metric" AS (
      SELECT
        query_id, warehouse_name, user_name,
        ROUND(bytes_spilled_to_remote_storage / 1e9, 3) AS "METRIC_VALUE",
        ROUND(bytes_spilled_to_local_storage  / 1e9, 3) AS spilled_local_gb
      FROM snowflake.account_usage.query_history
      WHERE start_time >= dateadd('day', -1, current_date())
        AND start_time <  current_date()
        AND bytes_spilled_to_remote_storage > 0
    ),
    "history" AS (
      SELECT warehouse_name, start_time, bytes_spilled_to_remote_storage
      FROM snowflake.account_usage.query_history
      WHERE start_time >= dateadd('day', -31, current_date())
        AND start_time <  dateadd('day', -1, current_date())
        AND bytes_spilled_to_remote_storage > 0
    ),
    "Compared Trend 1_time_windows" AS (
      SELECT
        DATEADD('day', -SEQ4() - 1, current_date())              AS "WINDOW_END",
        DATEADD('day', -1, DATEADD('day', -SEQ4() - 1,
          current_date()))                                            AS "WINDOW_START"
      FROM TABLE(GENERATOR(ROWCOUNT => 30))
    ),
    "Compared Trend 1_metric_per_time_window" AS (
      SELECT
        h.warehouse_name, t."WINDOW_END",
        MAX(ROUND(h.bytes_spilled_to_remote_storage / 1e9, 3)) AS "METRIC_VALUE"
      FROM "history" h
      JOIN "Compared Trend 1_time_windows" t
        ON t."WINDOW_START" <= h.start_time::date
       AND t."WINDOW_END"   >= h.start_time::date
      GROUP BY h.warehouse_name, t."WINDOW_END"
    ),
    "Compared Trend 1" AS (
      SELECT
        warehouse_name,
        MAX("METRIC_VALUE")              AS "COMPARED_VALUE",
        MAX("METRIC_VALUE") * (1 + 0.3) AS "COMPARED_VALUE_THRESHOLD"
      FROM "Compared Trend 1_metric_per_time_window"
      GROUP BY warehouse_name
    )
SELECT
    m.query_id, m.warehouse_name, m.user_name,
    m."METRIC_VALUE", m.spilled_local_gb,
    c."COMPARED_VALUE", c."COMPARED_VALUE_THRESHOLD"
FROM "main_metric" m
LEFT JOIN "Compared Trend 1" c ON m.warehouse_name = c.warehouse_name
WHERE m."METRIC_VALUE" > c."COMPARED_VALUE_THRESHOLD"
ORDER BY m."METRIC_VALUE" DESC

Sample alert

LH
LighthouseToday at 9:41 AMAPP

⚠️ Remote Disk Spill — 12.4 GB spilled to remote storage yesterday. 30-day max: 0.3 GB · TRANSFORM_WH queries are undersized for this workload.

Want Slack alerts when any of these fire?

Connect Snowflake to Lighthouse in 10 minutes — free, read-only, no SQL required.

Start free →
🔴

Failures

2 metrics
09 / FAILReal-timeHourly

ETL Failure Rate Spike — Last 4h

Pipeline query failure rate exceeds the 7-day daily max by more than 30%. Aggregates over a 4-hour window to avoid noise from low-volume hours. Filters to ETL types only: INSERT, MERGE, COPY, UPDATE, DELETE, CTAS.

Why monitor

  • Failure rate, not count — 2 failures out of 1,000 ETL queries is fine; 2 out of 3 is a disaster. Rate is the right signal.
  • ETL types only — syntax errors from ad-hoc queries and dev work are excluded so the alert only fires on pipeline breakage.
  • Uses ACCOUNT_USAGE for both current and history — information_schema only shows the current role's queries and silently misses most ETL activity.
SQL · Snowflake
WITH
    "all_queries" AS (
      SELECT warehouse_name, execution_status
      FROM snowflake.account_usage.query_history
      WHERE start_time >= DATEADD('hours', -5, CURRENT_TIMESTAMP())
        AND start_time <  DATEADD('hours', -1, CURRENT_TIMESTAMP())
        AND query_type IN (
          'INSERT', 'MERGE', 'COPY', 'UPDATE', 'DELETE', 'CREATE_TABLE_AS_SELECT'
        )
    ),
    "main_metric" AS (
      SELECT
        warehouse_name,
        COUNT(CASE WHEN execution_status IN ('FAIL', 'INCIDENT') THEN 1 END) AS failed_count,
        COUNT(*)                                                                   AS total_count,
        ROUND(COUNT(CASE WHEN execution_status IN ('FAIL', 'INCIDENT') THEN 1 END)
          / NULLIF(COUNT(*), 0) * 100, 2)                                     AS "METRIC_VALUE"
      FROM "all_queries"
      GROUP BY warehouse_name
    ),
    "history" AS (
      SELECT warehouse_name, start_time, execution_status
      FROM snowflake.account_usage.query_history
      WHERE start_time >= DATEADD('day', -8, CURRENT_DATE())
        AND start_time <  CURRENT_DATE()
        AND query_type IN (
          'INSERT', 'MERGE', 'COPY', 'UPDATE', 'DELETE', 'CREATE_TABLE_AS_SELECT'
        )
    ),
    "Compared Trend 1_time_windows" AS (
      SELECT
        DATEADD('day', -SEQ4() - 1, current_timestamp())          AS "WINDOW_END",
        DATEADD('hours', -24, DATEADD('day', -SEQ4() - 1,
          current_timestamp()))                                      AS "WINDOW_START"
      FROM TABLE(GENERATOR(ROWCOUNT => 7))
    ),
    "Compared Trend 1_metric_per_time_window" AS (
      SELECT
        h.warehouse_name, t."WINDOW_END",
        ROUND(COUNT(CASE WHEN h.execution_status IN ('FAIL', 'INCIDENT') THEN 1 END)
          / NULLIF(COUNT(*), 0) * 100, 2) AS "METRIC_VALUE"
      FROM "history" h
      JOIN "Compared Trend 1_time_windows" t
        ON t."WINDOW_START" <= h.start_time AND t."WINDOW_END" >= h.start_time
      GROUP BY h.warehouse_name, t."WINDOW_END"
    ),
    "Compared Trend 1" AS (
      SELECT
        warehouse_name,
        MAX("METRIC_VALUE")              AS "COMPARED_VALUE",
        MAX("METRIC_VALUE") * (1 + 0.3) AS "COMPARED_VALUE_THRESHOLD"
      FROM "Compared Trend 1_metric_per_time_window"
      GROUP BY warehouse_name
    )
SELECT
    m.warehouse_name, m.failed_count, m.total_count,
    m."METRIC_VALUE", c."COMPARED_VALUE", c."COMPARED_VALUE_THRESHOLD"
FROM "main_metric" m
LEFT JOIN "Compared Trend 1" c ON m.warehouse_name = c.warehouse_name
WHERE m.total_count >= 10
  AND m."METRIC_VALUE" > c."COMPARED_VALUE_THRESHOLD"
ORDER BY m."METRIC_VALUE" DESC

Sample alert

LH
LighthouseToday at 9:41 AMAPP

⚠️ ETL Failure Rate Spike — 23% failure rate in the last hour (7 of 30 ETL queries failed on ETL_WH). 7-day max: 2% · Check recent schema changes.

10 / FAILReal-timeHourly

Failed Task — Last 1h

A Snowflake task failed. Downstream data may be stale.

Why monitor

  • Tasks fail silently by default — no email, no Slack, just stale data downstream until a stakeholder notices.
  • Uses INFORMATION_SCHEMA.task_history for real-time detection — no 45-minute ACCOUNT_USAGE lag, so the alert fires within the hour.
  • Returns ERROR_MESSAGE per task so you know immediately whether it's a dependency failure, a schema change, or a data issue.
SQL · Snowflake
SELECT
    name,
    database_name,
    schema_name,
    state,
    error_code,
    error_message,
    scheduled_time,
    completed_time,
    1 as failures_count
FROM TABLE(information_schema.task_history(
    scheduled_time_range_start => dateadd('hours', -2, current_timestamp()),
    scheduled_time_range_end   => dateadd('hours', -1, current_timestamp())
))
WHERE state = 'FAILED'
ORDER BY scheduled_time DESC

Sample alert

LH
LighthouseToday at 9:41 AMAPP

⚠️ Failed Task — NIGHTLY_TRANSFORM failed at 2:14 AM (scheduled 2:00 AM). Error: SQL compilation error — object 'STAGE_DB.RAW.EVENTS' does not exist.

⚙️

Optimization

2 metrics
11 / OPTAuditWeekly

Stale Tables — 90-Day No Query

Tables with no recorded query access in 90 days. Requires Enterprise edition (access_history).

Why monitor

  • Uses LATERAL FLATTEN on access_history to detect any read access — not just direct queries, but joins and CTEs where the table is a base object.
  • Excludes Snowflake system tables (table_catalog != 'SNOWFLAKE') so only your data shows up in results.
  • Returns one row per table/view with storage_gb and days_since_last_use — sorted by staleness so the longest-ignored objects surface first. Filter by storage_gb to prioritize cost savings.
SQL · Snowflake
WITH last_access AS (
    SELECT
        f.value:"objectId"::NUMBER   AS object_id,
        f.value:"objectName"::STRING AS object_name,
        MAX(query_start_time)         AS last_accessed_at
    FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY,
         LATERAL FLATTEN(input => direct_objects_accessed) f
    WHERE f.value:"objectDomain"::STRING IN ('Table', 'View')
    GROUP BY 1, 2

    UNION ALL

    SELECT
        f.value:"objectId"::NUMBER   AS object_id,
        f.value:"objectName"::STRING AS object_name,
        MAX(query_start_time)         AS last_accessed_at
    FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY,
         LATERAL FLATTEN(input => base_objects_accessed) f
    WHERE f.value:"objectDomain"::STRING IN ('Table', 'View')
    GROUP BY 1, 2
),
last_access_combined AS (
    SELECT
        object_id,
        object_name,
        MAX(last_accessed_at) AS last_accessed_at
    FROM last_access
    GROUP BY 1, 2
)
SELECT
    t.table_catalog || '.' || t.table_schema || '.' || t.table_name AS full_table_name,
    t.table_type,
    t.table_catalog  AS database_name,
    t.table_schema   AS schema_name,
    t.table_name,
    t.row_count,
    ROUND(t.bytes / 1073741824.0, 3) AS storage_gb,
    t.last_altered,
    a.last_accessed_at,
    DATEDIFF('day', COALESCE(a.last_accessed_at, t.last_altered), CURRENT_DATE()) AS days_since_last_use
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES t
LEFT JOIN last_access_combined a
    ON  a.object_id   = t.table_id
    OR  a.object_name = t.table_catalog || '.' || t.table_schema || '.' || t.table_name
WHERE t.deleted IS NULL
  AND t.table_type IN ('BASE TABLE', 'VIEW')
  AND t.table_catalog NOT IN ('SNOWFLAKE')
  AND DATEDIFF('day', COALESCE(a.last_accessed_at, t.last_altered), CURRENT_DATE()) >= 90
ORDER BY days_since_last_use DESC NULLS LAST

Sample alert

LH
LighthouseToday at 9:41 AMAPP

🔍 Stale Tables Audit — 12 tables with no access in 90+ days. Total: 142 GB. Largest: LEGACY_EVENTS (89 GB) in ANALYTICS_DB.

12 / OPTHistoricalDaily

Warehouse Utilization — Yesterday

% of compute credits spent on actual query execution yesterday. Low = idle waste (warehouse billing while doing nothing). High = capacity risk.

Why monitor

  • Credit-based, not time-based — uses credits_attributed_compute_queries / credits_used_compute from WAREHOUSE_METERING_HISTORY. More reliable than time-weighted load averages.
  • Surfaces idle waste in both directions: idle_credits shows exactly how many credits were billed while the warehouse sat doing nothing — the real cost of a poorly-sized or always-on warehouse.
  • Alerts in both directions: low utilization (e.g. <30%) means the warehouse is mostly idle while billing; high utilization (>90%) means queries may be queueing and you need to scale up.
SQL · Snowflake
SELECT
    warehouse_name,
    ROUND(SUM(credits_used_compute), 3)                                      AS total_compute_credits,
    ROUND(SUM(credits_attributed_compute_queries), 3)                        AS query_credits,
    ROUND(SUM(credits_used_compute - credits_attributed_compute_queries), 3) AS idle_credits,
    ROUND(
        SUM(credits_attributed_compute_queries)
        / NULLIF(SUM(credits_used_compute), 0) * 100, 2
    )                                                                        AS utilization_pct
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -1, CURRENT_DATE())
  AND end_time   <  CURRENT_DATE()
GROUP BY warehouse_name
ORDER BY utilization_pct ASC

Sample alert

LH
LighthouseToday at 9:41 AMAPP

⚠️ Low Warehouse Utilization — TRANSFORM_WH: 12% utilization yesterday. Only 0.3 of 2.5 compute credits went to actual queries · Consider reducing warehouse size or lowering auto-suspend.

Want Slack alerts when any of these fire?

Connect Snowflake to Lighthouse in 10 minutes — free, read-only, no SQL required.

Start free →
These metrics are running in Lighthouse right now

Want alerts when any
of these fire?

Connect your Snowflake account and get all 12 metrics live in under 10 minutes — free, read-only, with smart comparison out of the box.

Already have metrics in mind? Describe them in plain English — Lighthouse writes the SQL.

No credit card required · Read-only access · Cancel anytime