<#
.SYNOPSIS
    Detect-ThermalThrottling.ps1
    Intune Proactive Remediation - Detection Script
    Identifies thermal throttling patterns across the Lenovo ThinkPad X1 Yoga Gen 8 (Type 21HR) fleet.

.DESCRIPTION
    Samples thermal zone temperature, CPU frequency, and iGPU utilisation over a
    short burst window. Correlates high temperatures with frequency drops below
    base clock to separate thermal throttling from normal power management.
    Parses Event ID 37 entries from the last 24 hours and sums the throttling
    durations they report.

    Includes a -ConsoleMode switch for human-readable output during live testing.
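
.EXAMPLE
    .\Detect-ThermalThrottling.ps1 -ConsoleMode

    Runs the sampling burst interactively and prints device context, thermal
    telemetry, the verdict, and a raw sample table. Run it from an elevated
    64-bit session: MSAcpi_ThermalZoneTemperature typically fails with access
    denied when not elevated, in which case the script silently falls back to
    the thermal zone performance counter.

.EXAMPLE
    .\Detect-ThermalThrottling.ps1; "Exit code: $LASTEXITCODE"

    Emits the single-line key=value record that Intune captures, then shows
    the exit code: 1 when any throttling flag is raised (Intune treats this
    as "issue detected"), 0 otherwise.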

.NOTES
    Author:  Jared Anderson
    Version: 2.6 (Stripped ACPI conversion artifacts for true integer temps)
    Date:    2026-05-20
    Target:  Lenovo ThinkPad X1 Yoga Gen 8 (Type 21HR)
    Deploy:  Intune Proactive Remediation (Detection)
    Context: SYSTEM (64-bit)

===========================================================================
VENDOR DISCUSSION GUIDE (TALKING POINTS FOR LENOVO ENGINEERING)
===========================================================================
If reviewing this script with vendor hardware support, reference these 
industry-standard definitions and telemetry points:

1. THE "DTT" TRAP & DRIVER VERSIONS
   Intel Dynamic Tuning Technology (DTT) acts as the leash between Intel's 
   hardware and Lenovo's chassis design. When machines get hot, DTT chokes 
   the power limits (PL1/PL2) to keep the aluminum bottom deck from burning 
   users (often tied to the gyroscope for "Cool Quiet on Lap").
   > TRAP: Support will blame outdated drivers.
   > COUNTER: This script logs the exact DTT driver version, BIOS version, and
     BIOS thermal mode. If a healthy machine and a lagging machine run the
     same DTT version, BIOS, and thermal mode under comparable load, the fault
     points to the physical cooling hardware, not a software bug.

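2. EVENT ID 37 (FIRMWARE INTERVENTION)
   When system firmware caps processor speed (for example, under DTT
   policy), Windows logs Event ID 37 from Kernel-Processor-Power ("The speed
   of processor X... is being limited by system firmware").
   > POINT: This is evidence that Windows is NOT lagging because of high CPU
     usage or bloatware: Lenovo's firmware explicitly commanded the CPU to
     slow down. The script uses a regular expression to sum the seconds
     reported in these events over the last 24 hours. Each event names a
     single processor, so the total is processor-seconds and can exceed
     wall-clock time; the match also relies on English message text.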
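   > EXAMPLE (illustrative message text, not captured from a fleet device):
       "...The processor has been in this reduced performance state for 71
       seconds since the last report."
     The pattern 'for (\d+) seconds' captures 71, which is added to
     Evt37_TotalSecs. Two such events (71 s and 45 s) yield 116.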

3. TJMAX AND THERMAL PASTE "PUMP-OUT"
   Intel's hardware T-Junction Max (TJMax) for 13th Gen U-series chips is 100°C.
   > POINT: If this script records 90°C+ in at least two samples
     ($ThresholdCriticalSamples, roughly 20 seconds of sampled time, not
     necessarily consecutive), the thermal paste has likely suffered
     "pump-out" (degraded or pushed off the die) or the heatsink mounting
     pressure is inadequate. Either way, the cooling assembly is failing to
     dissipate sustained heat.
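   > EXAMPLE (hypothetical samples): temps of 84, 91, 87, 93, 78°C give two
     samples at 90°C+, which meets $ThresholdCriticalSamples = 2 and raises
     CRITICAL_TEMP. A single 95°C turbo spike among cooler samples counts
     once and is not flagged on its own.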

4. THE "SAWTOOTH" EFFECT (THERMAL CYCLING)
   Degraded thermal paste causes severe micro-stutters because the CPU violently 
   oscillates. It spikes to 95°C+, panics, throttles down to 800MHz (cooling it 
   down), and then turbos back up again. 
   > POINT: Our THERMAL_CYCLING logic is designed NOT to flag normal turbo
     boosts. It only flags if ALL FOUR of these happen within our 2.5-minute
     window:
       a) The CPU reaches base clock or turbo (PeakFreq >= 100%)
       b) The CPU is crushed below base clock (MinFreq < 95%)
       c) The temperature swings widely (TempDelta >= 20°C)
       d) The laptop reaches throttling heat levels (PeakTemp >= 80°C)
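   > WORKED EXAMPLE (hypothetical samples):
       Temps 72, 96, 74°C and Freq 112%, 38%, 105%
       a) PeakFreq 112 >= 100            b) MinFreq 38 < 95
       c) TempDelta 96 - 72 = 24 >= 20   d) PeakTemp 96 >= 80
       -> all four hold, so THERMAL_CYCLING is raised.
     A normal boost that runs 70-85°C at 100-115% fails (b) and (c)
     (delta 15, no sub-base dip) and is not flagged.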

5. ACPI CACHING ON IDLE
   ACPI thermal zone readings are often refreshed infrequently while the
   system is idle, so at low CPU utilization the temperature can appear
   flatlined or stuck.
   > POINT: In the raw sample data, the script tags a temperature with
     "(low util)" when that sample's CPU utility is below
     $IdleCpuUtilThreshold (20%), marking readings that may be stale rather
     than live.
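   > EXAMPLE (hypothetical sample): a reading of 52°C taken at 6% CPU utility
     appears as "52 (low util)" in the console table and as
     "52C(low util)" in the Intune output string.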
===========================================================================
#>

[CmdletBinding()]
param (
    [switch]$ConsoleMode
)

# ============================================================
# CONFIG - THRESHOLDS & TIMING 
# ============================================================

$SampleCount       = 15          # 15 samples @ 10s = 2.5 min window
$SampleIntervalSec = 10          
$TempWarnC         = 80          # Lenovo DTT soft-throttle engagement point
$TempCritC         = 90          # Critical threshold (indicates failing heatsink/paste)
$TempDeltaC        = 20          # Minimum temp swing to qualify for a Sawtooth cycle
$FreqFloorPct      = 95          # Sub-base clock threshold (allows 5% for measurement jitter)
$IdleCpuUtilThreshold = 20       # Avg CPU % below this flags the test/samples as "Idle"
$ThresholdThrottleSamples = 3    # At least 3 samples (~30s of sampled time, not necessarily consecutive) with heat + sub-base frequency
$ThresholdCriticalSamples = 2    # At least 2 samples (~20s of sampled time, not necessarily consecutive) at 90C+ (filters one-off turbo spikes)
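
# Window length (worked example with the defaults above): 15 samples need 14
# sleeps of 10 s = 140 s, plus the time each sample's CIM and Get-Counter
# queries take, so the burst runs somewhat past 140 s; 2.5 minutes is nominal.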


# ============================================================
# FUNCTIONS
# ============================================================

function Get-ThermalZoneTemperature {
    $acpi = Get-CimInstance -Namespace "root/WMI" -ClassName "MSAcpi_ThermalZoneTemperature" -ErrorAction SilentlyContinue
    if ($acpi) {
        $maxTenthsK = ($acpi | Measure-Object -Property CurrentTemperature -Maximum).Maximum
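        # Worked example: MSAcpi_ThermalZoneTemperature reports tenths of a Kelvin,
        # so CurrentTemperature = 3632 -> 363.2 K -> 363.2 - 273.15 = 90.05 -> 90 C.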
        $celsius = [int][math]::Round(($maxTenthsK / 10) - 273.15, 0)
        return @{ TempC = $celsius; Source = "ACPI" }
    }

    $perf = Get-CimInstance -ClassName "Win32_PerfFormattedData_Counters_ThermalZoneInformation" -ErrorAction SilentlyContinue
    if ($perf) {
        $maxK = ($perf | Measure-Object -Property Temperature -Maximum).Maximum
        $celsius = [int][math]::Round($maxK - 273.15, 0)
        return @{ TempC = $celsius; Source = "PerfCounter" }
    }

    return @{ TempC = -1; Source = "Unavailable" }
}

function Get-PerformanceMetrics {
    $counters = @(
        "\Processor Information(_Total)\% Processor Performance",
        "\Processor Information(_Total)\% Processor Utility",
        "\GPU Engine(*)\Utilization Percentage"
    )

    $freqPct = -1
    $cpuUtil = -1
    $gpuUtil = 0
    $freqSrc = "None"

    $data = Get-Counter -Counter $counters -ErrorAction SilentlyContinue
    if ($data) {
        foreach ($sample in $data.CounterSamples) {
            switch -Wildcard ($sample.Path) {
                "*% processor performance*" { $freqPct = [int][math]::Round($sample.CookedValue) }
                "*% processor utility*"     { $cpuUtil = [int][math]::Round($sample.CookedValue) }
                "*gpu engine*"              { 
                    if ($sample.CookedValue -gt $gpuUtil) { 
                        $gpuUtil = [int][math]::Round($sample.CookedValue) 
                    }
                }
            }
        }
        $freqSrc = "Get-Counter"
    }

    $baseMHz = -1
    $procInfo = Get-CimInstance -ClassName "Win32_Processor" -ErrorAction SilentlyContinue | Select-Object -First 1
    if ($procInfo.MaxClockSpeed) { $baseMHz = [int]$procInfo.MaxClockSpeed }

    $actualMHz = if ($freqPct -gt 0 -and $baseMHz -gt 0) { [int][math]::Round(($freqPct / 100) * $baseMHz) } else { -1 }
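    # Worked example (hypothetical base clock): % Processor Performance = 112 with
    # MaxClockSpeed = 1700 MHz gives 1.12 x 1700 = 1904 MHz; a throttled reading of
    # 38% gives 0.38 x 1700 = 646 MHz.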

    return @{
        FreqPct   = $freqPct
        FreqSrc   = $freqSrc
        CpuUtil   = $cpuUtil
        GpuUtil   = $gpuUtil
        ActualMHz = $actualMHz
        BaseMHz   = $baseMHz
    }
}

function Get-DeviceContext {
    $cs = Get-CimInstance -ClassName Win32_ComputerSystem -ErrorAction SilentlyContinue
    $bios = Get-CimInstance -ClassName Win32_BIOS -ErrorAction SilentlyContinue
    $os = Get-CimInstance -ClassName Win32_OperatingSystem -ErrorAction SilentlyContinue

    $powerSrc = "Unknown"
    $battery = Get-CimInstance -ClassName Win32_Battery -ErrorAction SilentlyContinue | Select-Object -First 1
    if ($battery) {
        $powerSrc = switch ([int]$battery.BatteryStatus) {
            1 { "Battery" }
            2 { "AC" }
            3 { "AC-Full" }
            4 { "Battery-Low" }
            5 { "Battery-Critical" }
            { $_ -in 6..9 } { "AC-Charging" }   # 6-9: Charging, Charging and High/Low/Critical
            default { "Unknown ($($battery.BatteryStatus))" }
        }
    }

    return @{
        Model      = $cs.Model
        Serial     = $bios.SerialNumber
        BIOSVer    = $bios.SMBIOSBIOSVersion
        Hostname   = $env:COMPUTERNAME
        OSBuild    = $os.BuildNumber
        UptimeHrs  = [math]::Round(((Get-Date) - $os.LastBootUpTime).TotalHours, 1)
        PowerSrc   = $powerSrc
    }
}

function Get-LenovoThermalConfig {
    $vantageStatus = "NotInstalled"
    $vantagePackages = @("E046963F.LenovoSettingsforEnterprise", "E046963F.LenovoCompanion", "E046963F.LenovoSettings")
    foreach ($pkg in $vantagePackages) {
        $found = Get-AppxPackage -Name $pkg -AllUsers -ErrorAction SilentlyContinue | Select-Object -First 1
        if ($found) {
            $vantageStatus = "$($found.Name) v$($found.Version)"
            break
        }
    }

    $dttDriver = "NotInstalled"
    $dtt = Get-CimInstance -ClassName Win32_PnPSignedDriver -Filter "DeviceName like '%Dynamic Tuning%'" -ErrorAction SilentlyContinue | Select-Object -First 1
    if ($dtt) {
        $dttDriver = "Intel DTT v$($dtt.DriverVersion)"
    }

    $thermalAC      = "N/A"
    $thermalBattery = "N/A"
    $biosSettings = Get-CimInstance -Namespace "root/WMI" -ClassName "Lenovo_BiosSetting" -ErrorAction SilentlyContinue
    if ($biosSettings) {
        # CurrentSetting is "Name,Value"; some newer BIOSes append ";[Optional:...]", which is stripped here
        $acSetting = $biosSettings | Where-Object { $_.CurrentSetting -like "AdaptiveThermalManagementAC,*" } | Select-Object -First 1
        if ($acSetting) { $thermalAC = (($acSetting.CurrentSetting -split ",")[1] -split ";")[0] }

        $batSetting = $biosSettings | Where-Object { $_.CurrentSetting -like "AdaptiveThermalManagementBattery,*" } | Select-Object -First 1
        if ($batSetting) { $thermalBattery = (($batSetting.CurrentSetting -split ",")[1] -split ";")[0] }
    }

    return @{
        Vantage        = $vantageStatus
        DTTDriver      = $dttDriver
        ThermalModeAC  = $thermalAC
        ThermalModeBat = $thermalBattery
    }
}

function Get-SystemFirmwareThrottleEvents {
    $startTime = (Get-Date).AddDays(-1)
    $events = Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Microsoft-Windows-Kernel-Processor-Power'; Id=37; StartTime=$startTime} -ErrorAction SilentlyContinue
    
    $totalThrottleSeconds = 0
    
    if ($events) {
        # $evt avoids shadowing PowerShell's automatic $Event variable
        foreach ($evt in $events) {
            # Extract the duration reported in the (English) event message
            if ($evt.Message -match "for (\d+) seconds") {
                $totalThrottleSeconds += [int]$Matches[1]
            }
        }
        return @{ Count = $events.Count; TotalSeconds = $totalThrottleSeconds }
    }
    
    return @{ Count = 0; TotalSeconds = 0 }
}

# ============================================================
# EXECUTION
# ============================================================

$timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
$device    = Get-DeviceContext
$lenovo    = Get-LenovoThermalConfig

$fwEventData  = Get-SystemFirmwareThrottleEvents
$fwEventCount = $fwEventData.Count
$fwEventSecs  = $fwEventData.TotalSeconds

$samples   = [System.Collections.Generic.List[object]]::new()

# Console mode UI prep
if ($ConsoleMode) {
    Clear-Host
    Write-Host "======================================================" -ForegroundColor Cyan
    Write-Host " LENOVO X1 YOGA (GEN 8) THERMAL ANALYSIS " -ForegroundColor White
    Write-Host "======================================================" -ForegroundColor Cyan
    Write-Host "Sampling active... This will take $([math]::Round(($SampleCount * $SampleIntervalSec)/60, 1)) minutes." -ForegroundColor Yellow
    Write-Host ""
}

# Sampling loop
for ($i = 1; $i -le $SampleCount; $i++) {
    if ($ConsoleMode) { Write-Host "Collecting sample $i of $SampleCount..." -NoNewline; Write-Host "`r" -NoNewline }
    
    $thermal = Get-ThermalZoneTemperature
    $perf    = Get-PerformanceMetrics

    $samples.Add([PSCustomObject]@{
        Sample    = $i
        TempC     = $thermal.TempC
        FreqPct   = $perf.FreqPct
        CpuUtil   = $perf.CpuUtil
        GpuUtil   = $perf.GpuUtil
        ActualMHz = $perf.ActualMHz
        BaseMHz   = $perf.BaseMHz
        Source    = $thermal.Source
    })

    if ($i -lt $SampleCount) { Start-Sleep -Seconds $SampleIntervalSec }
}

if ($ConsoleMode) { Write-Host "Sampling complete!                            `n" -ForegroundColor Green }

# Analysis
# Exclude -1 sentinel values (sensor or counter unavailable) so a single failed read
# cannot inflate TempDelta or drag MinFreqPct below the floor
$validTemps     = @($samples | Where-Object { $_.TempC -ge 0 })
$validFreqs     = @($samples | Where-Object { $_.FreqPct -gt 0 })
$peakTemp       = ($validTemps | Measure-Object -Property TempC -Maximum).Maximum
$minTemp        = ($validTemps | Measure-Object -Property TempC -Minimum).Minimum
$avgTemp        = [math]::Round(($validTemps | Measure-Object -Property TempC -Average).Average, 1)
$tempDelta      = [int]($peakTemp - $minTemp)
$peakFreqPct    = ($validFreqs | Measure-Object -Property FreqPct -Maximum).Maximum
$minFreqPct     = ($validFreqs | Measure-Object -Property FreqPct -Minimum).Minimum
$avgFreqPct     = [math]::Round(($validFreqs | Measure-Object -Property FreqPct -Average).Average, 1)
$avgCpuUtil     = [math]::Round(($samples | Measure-Object -Property CpuUtil -Average).Average, 1)
$peakGpuUtil    = ($samples | Measure-Object -Property GpuUtil -Maximum).Maximum
$thermalSource  = $samples[0].Source

# Detection paths
$throttledSamples = $samples | Where-Object { $_.TempC -ge $TempWarnC -and $_.FreqPct -lt $FreqFloorPct -and $_.FreqPct -gt 0 }
$throttledCount   = ($throttledSamples | Measure-Object).Count

$criticalSamples = $samples | Where-Object { $_.TempC -ge $TempCritC }
$criticalCount   = ($criticalSamples | Measure-Object).Count

# UPDATED LOGIC: True "Sawtooth" Thermal Cycling (4-Pillar Validation)
$isCycling = ($tempDelta -ge $TempDeltaC) -and 
             ($peakTemp -ge $TempWarnC) -and 
             ($peakFreqPct -ge 100) -and 
             ($minFreqPct -lt $FreqFloorPct)

# Determine verdict and reason
$flags = [System.Collections.Generic.List[string]]::new()
if ($throttledCount -ge $ThresholdThrottleSamples) { $flags.Add("FREQ_THROTTLED") }
if ($criticalCount -ge $ThresholdCriticalSamples)  { $flags.Add("CRITICAL_TEMP") }
if ($isCycling)                                    { $flags.Add("THERMAL_CYCLING") }
if ($fwEventCount -gt 0)                           { $flags.Add("FIRMWARE_INTERVENTION(Evt37)") }

$isThrottling = $flags.Count -gt 0

# Label a passing run as low-utilization if the CPU was mostly idle during sampling
if ($isThrottling) { 
    $verdict = $flags -join "+" 
} else { 
    if ($avgCpuUtil -lt $IdleCpuUtilThreshold) {
        $verdict = "OK - Low CPU Util"
    } else {
        $verdict = "OK - Load Tested"
    }
}

# ============================================================
# OUTPUT ROUTING
# ============================================================

if ($ConsoleMode) {
    Write-Host "--- DEVICE CONTEXT ---" -ForegroundColor Cyan
    Write-Host "Hostname:       $($device.Hostname)"
    Write-Host "Model:          $($device.Model)"
    Write-Host "BIOS Version:   $($device.BIOSVer)"
    Write-Host "Power State:    $($device.PowerSrc)"
    Write-Host "DTT Driver:     $($lenovo.DTTDriver)"
    Write-Host "BIOS ThermalAC: $($lenovo.ThermalModeAC)"
    Write-Host ""
    
    Write-Host "--- THERMAL TELEMETRY ---" -ForegroundColor Cyan
    Write-Host "Peak Temp:      $peakTemp °C"
    Write-Host "Temp Delta:     $tempDelta °C"
    Write-Host "Min/Peak Freq:  $minFreqPct % / $peakFreqPct %"
    Write-Host "Avg CPU Util:   $avgCpuUtil %"
    Write-Host "Peak GPU Util:  $peakGpuUtil %"
    
    $fwMinutes = [math]::Round($fwEventSecs / 60, 1)
    Write-Host "Event ID 37s:   $fwEventCount events (Total: $fwMinutes minutes throttled)"
    Write-Host ""
    
    Write-Host "--- VERDICT ---" -ForegroundColor Cyan
    if ($isThrottling) {
        Write-Host "Status:         OVERHEAT ($verdict)" -ForegroundColor Red
    } else {
        Write-Host "Status:         PASS ($verdict)" -ForegroundColor Green
    }
    Write-Host ""
    
    Write-Host "--- RAW SAMPLE DATA ---" -ForegroundColor Cyan
    # Build a display table that appends "(low util)" to the temperature when the sample's CPU utility was below the idle threshold
    $displaySamples = $samples | Select-Object Sample, 
        @{Name="TempC"; Expression={ if ($_.CpuUtil -lt $IdleCpuUtilThreshold) { "$($_.TempC) (low util)" } else { $_.TempC } }},
        FreqPct, CpuUtil, GpuUtil, ActualMHz, BaseMHz, Source
    
    $displaySamples | Format-Table -AutoSize | Out-String | Write-Host
} else {
    # Single-line output for Intune export, tagging cached temps dynamically
    $sampleDetail = ($samples | ForEach-Object {
        $tempDisplay = if ($_.CpuUtil -lt $IdleCpuUtilThreshold) { "$($_.TempC)C(low util)" } else { "$($_.TempC)C" }
        "S$($_.Sample): $tempDisplay | Freq:$($_.FreqPct)% | CPU:$($_.CpuUtil)% | GPU:$($_.GpuUtil)%"
    }) -join " ; "
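    # Example record for one sample (hypothetical values):
    #   S1: 52C(low util) | Freq:62% | CPU:6% | GPU:2%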

    $output = @(
        "Timestamp=$timestamp"
        "Host=$($device.Hostname)"
        "Model=$($device.Model)"
        "Serial=$($device.Serial)"
        "BIOSVer=$($device.BIOSVer)"
        "OSBuild=$($device.OSBuild)"
        "PowerSrc=$($device.PowerSrc)"
        "DTTDriver=$($lenovo.DTTDriver)"
        "ThermalModeAC=$($lenovo.ThermalModeAC)"
        "ThermalModeBat=$($lenovo.ThermalModeBat)"
        "Event37_24Hr=$fwEventCount"
        "Evt37_TotalSecs=$fwEventSecs"
        "PeakTempC=$peakTemp"
        "TempDeltaC=$tempDelta"
        "MinFreqPct=$minFreqPct"
        "PeakFreqPct=$peakFreqPct"
        "AvgCpuUtil=$avgCpuUtil"
        "PeakGpuUtil=$peakGpuUtil"
        "ThrottledSamples=$throttledCount/$SampleCount"
        "CriticalSamples=$criticalCount/$SampleCount"
        "Verdict=$verdict"
        "Samples=[$sampleDetail]"
    ) -join " | "

    Write-Output $output
}

if ($isThrottling) { exit 1 }
exit 0