Convert your web logs into time-series

Published on by Joannes Vermorel.

I have already illustrated how to use Microsoft PowerShell to import data from Swivel.com into Lokad. Let's see how to convert W3C web server log files into time-series and then import this time-series into Lokad. Note that the W3C log format is the default for IIS 2003 (Apache servers also supports the W3C log format, but the default seems to be the common log format).

Disclaimer: I am exactly sure what would be the purpose of time-series forecasting when applied to hit volume for web servers. If you had a pool of web servers, web traffic forecasting could be used to increase system responsiveness in a proactive manner. Yet, this is likely to be quite an uncommon needs except for very high traffic websites. I have been toying with web traffic forecasting mostly for internal benchmarking purposes.

In order to convert the logs into a time-series, the following PowerShell line can be used

 ls *.log | get-hits | out-file 'my-logs.txt'

where ls *.log enumerates all the log files that are typically contained within a single directory. The Get-Hits is a custom function (given below) that parse the log file. The Get-Hits function aggregates the web server hits into quarter-hour periods. Obviously, most of the work here is done by the Get-Hits function:

# Get-Hits
# By Joannes Vermorel, 2007
# Convert hits of W3C server logs to time-series (aggregated by quarter-hour)
# Usage: ls *.log | get-hits | out-file 'my-logs.txt'

function Get-Hits
{

param( )
begin { }

process
{
$tab = [System.Char]::ConvertFromUtf32(9)
$file = [System.IO.File]::OpenText( $_.fullname )
$previousDate = [System.DateTime]::MinValue
$hitCount = 0

while($line = $file.ReadLine())
{
if(!$line.StartsWith('#') -and ($line.Length -gt 19))
{
$d = [System.DateTime]::Parse( $line.Substring(0, 19) )
$minutes = [int] ([System.Math]::Floor(($d.Minute / 15)) * 15)
$d = new-object System.DateTime $d.Year, $d.Month, $d.Day, $d.Hour, $minutes, 0

$hitCount += 1

if( !($d -eq $previousDate) )
{
write-output ($d.ToString("yyyy-MM-dd HH:mm:ss") + $tab + $hitCount)
$previousDate = $d
$hitCount = 0
}
}
}

if($hitCount -gt 0)
{
write-output ($d.ToString("yyyy-MM-dd HH:mm:ss") + $tab + $hitCount)
}
}
end { }
}

The output file contains a list of time-values in TSV format (i.e. "Time TAB Value"). Then you can import this TSV data through a direct cut-and-paste in your Lokad account (go Analytics => My Data => Insert a new series). Note that if you have several months of web server logs, accounting for several hundreds of MBs, then the process can takes a few minutes to complete.

Categories: powershell, technical, time series, tips Tags: