With a few lines of code, you can extract data from text files,
including log files, using regular-expression capture groups. If
you have used regular expressions to search for matching text, extracting
text using the .NET Framework will be very useful. If you have not
worked with regular expressions before, or (like me) you need a
reference to remember all the symbols, check out the Microsoft Developer Network’s reference for help.
Finding Matching Lines
Imagine that you need to parse a log file (we’ll use
C:\Windows\WgaNotify.log as an example, because it’s present on most
computers) and list every file that was successfully copied. The
WgaNotify.log file resembles the following:
[WgaNotify.log]
0.109: ========================================================
0.109: 2006/04/27 06:54:09.218 (local)
0.109: Failed To Enable SE_SHUTDOWN_PRIVILEGE
1.359: Starting AnalyzeComponents
1.359: AnalyzePhaseZero used 0 ticks
1.359: No c:\windows\INF\updtblk.inf file.
23.328: Copied file: C:\WINDOWS\system32\LegitCheckControl.dll
23.578: Copied file (delayed): C:\WINDOWS\system32\SETE.tmp
25.156: Return Code = 0
25.156: Starting process: C:\WINDOWS\system32\wgatray.exe /b
As you can see, two of the lines (those that begin with “Copied file”)
contain useful information, and the rest can be ignored. You could use
the following console application, which requires the System.IO and
System.Text.RegularExpressions namespaces, to display just the lines
that contain the phrase “Copied file”:
' Visual Basic
Dim inFile As StreamReader = File.OpenText("C:\Windows\wganotify.log")
' Create the regular expression once, outside the loop
Dim r As New Regex("Copied file")
' Read each line of the log file
Dim inLine As String = inFile.ReadLine()
While inLine IsNot Nothing
    ' Display the line only if it matches the regular expression
    If r.IsMatch(inLine) Then
        Console.WriteLine(inLine)
    End If
    inLine = inFile.ReadLine()
End While
inFile.Close()
// C#
StreamReader inFile = File.OpenText(@"C:\Windows\wganotify.log");
// Create the regular expression once, outside the loop
Regex r = new Regex(@"Copied file");
string inLine;
// Read each line of the log file
while ((inLine = inFile.ReadLine()) != null)
{
    // Display the line only if it matches the regular expression
    if (r.IsMatch(inLine))
        Console.WriteLine(inLine);
}
inFile.Close();
Running this console application would match the lines that contain
information about the files copied and display the following:
23.328: Copied file: C:\WINDOWS\system32\LegitCheckControl.dll
23.578: Copied file (delayed): C:\WINDOWS\system32\SETE.tmp
Capturing Specific Data
To extract portions of matching lines, specify capture groups by
surrounding a portion of your regular expression with parentheses. For
example, the regular expression "Copied file:\s*(.*$)" would match the
phrase “Copied file:” followed by any white space (the “\s” symbol),
and place everything after that into a group. Remember, “.*” matches
anything, and “$” matches the end of the line.
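To see what this pattern captures, the minimal sketch below runs it against the two “Copied file” lines from the sample log. The class name PatternCheck and the hard-coded sample lines are assumptions made for illustration; they are not part of the original console application. Because the pattern as written expects the colon immediately after “Copied file”, it should not match the “(delayed)” line, which is presumably why the samples later in this article use "Copied file.*:\s+(.*$)" instead.

```csharp
using System;
using System.Text.RegularExpressions;

class PatternCheck   // hypothetical name, for illustration only
{
    static void Main()
    {
        // Two sample lines copied from the WgaNotify.log excerpt
        string[] lines =
        {
            @"23.328: Copied file: C:\WINDOWS\system32\LegitCheckControl.dll",
            @"23.578: Copied file (delayed): C:\WINDOWS\system32\SETE.tmp"
        };

        // Group 1 is everything after "Copied file:" and any white space
        Regex strict = new Regex(@"Copied file:\s*(.*$)");
        // Variant used by the later samples: allows text such as
        // "(delayed)" between "Copied file" and the colon
        Regex relaxed = new Regex(@"Copied file.*:\s+(.*$)");

        foreach (string line in lines)
        {
            Match s = strict.Match(line);
            Match r = relaxed.Match(line);
            Console.WriteLine("strict:  " + (s.Success ? s.Groups[1].Value : "(no match)"));
            Console.WriteLine("relaxed: " + (r.Success ? r.Groups[1].Value : "(no match)"));
        }
    }
}
```

Checking Match.Success before reading Groups[1] avoids printing an empty string for lines that do not match.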
To match a pattern and capture a portion of the match, follow these steps:
- Create a regular expression, and enclose in parentheses the portion of the pattern to be captured. This creates a group.
- Create an instance of the System.Text.RegularExpressions.Match class by calling the Regex.Match method.
- Retrieve the matched data by accessing the elements of the Match.Groups array. Element 0 contains the entire match; the first group is stored in element 1, the second group in element 2, and so on.
The following example expands on the previous code sample to extract and display the filenames from the WgaNotify.log file:
' Visual Basic
Dim inFile As StreamReader = File.OpenText("C:\Windows\wganotify.log")
' Create a regular expression
Dim r As New Regex("Copied file.*:\s+(.*$)")
' Read each line of the log file
Dim inLine As String = inFile.ReadLine()
While inLine IsNot Nothing
    ' Display the group only if the line matches the regular expression
    If r.IsMatch(inLine) Then
        Dim m As Match = r.Match(inLine)
        Console.WriteLine(m.Groups(1))
    End If
    inLine = inFile.ReadLine()
End While
inFile.Close()
// C#
StreamReader inFile = File.OpenText(@"C:\Windows\wganotify.log");
// Create a regular expression
Regex r = new Regex(@"Copied file.*:\s+(.*$)");
string inLine;
// Read each line of the log file
while ((inLine = inFile.ReadLine()) != null)
{
    // Display the group only if the line matches the regular expression
    if (r.IsMatch(inLine))
    {
        Match m = r.Match(inLine);
        Console.WriteLine(m.Groups[1]);
    }
}
inFile.Close();
This code does a bit better, displaying just the filenames of the copied files:
C:\WINDOWS\system32\LegitCheckControl.dll
C:\WINDOWS\system32\SETE.tmp
Capturing Multiple Groups
You can also separate the folder and filename by matching multiple
groups in a single line. The following slightly updated sample creates
separate capture groups for the folder name and the filename, and then
displays both values. Notice that the regular expression now contains
two groups (indicated by two sets of parentheses), and the call to
Console.WriteLine now references groups 1 and 2 in the Match.Groups
array (element 0 holds the entire match).
' Visual Basic
Dim inFile As StreamReader = File.OpenText("C:\Windows\wganotify.log")
' Create a regular expression
Dim r As New Regex("Copied file.*:\s+(.*\\)(.*$)")
' Read each line of the log file
Dim inLine As String = inFile.ReadLine()
While inLine IsNot Nothing
    ' Display the groups only if the line matches the regular expression
    If r.IsMatch(inLine) Then
        Dim m As Match = r.Match(inLine)
        ' Use .Value and & so that the Group objects are concatenated as strings
        Console.WriteLine("Folder: " & m.Groups(1).Value & ", File: " & m.Groups(2).Value)
    End If
    inLine = inFile.ReadLine()
End While
inFile.Close()
// C#
StreamReader inFile = File.OpenText(@"C:\Windows\wganotify.log");
// Create a regular expression
Regex r = new Regex(@"Copied file.*:\s+(.*\\)(.*$)");
string inLine;
// Read each line of the log file
while ((inLine = inFile.ReadLine()) != null)
{
    // Display the groups only if the line matches the regular expression
    if (r.IsMatch(inLine))
    {
        Match m = r.Match(inLine);
        Console.WriteLine("Folder: " + m.Groups[1] + ", File: " + m.Groups[2]);
    }
}
inFile.Close();
The end result is that the console application captures the folder
and filename separately, and outputs the following formatted data:
Folder: C:\WINDOWS\system32\, File: LegitCheckControl.dll
Folder: C:\WINDOWS\system32\, File: SETE.tmp
Using Named Capture Groups
You can make your regular expressions easier to read by naming the capture groups. To name a group, add “?<name>” after the open parenthesis. You can then access the named groups using Match.Groups[“name”].
The following example demonstrates using named groups with the
Match.Result method, which allows you to format the results of a
regular expression match. It produces exactly the same output as the
previous code sample, but the code is easier to read.
' Visual Basic
Dim inFile As StreamReader = File.OpenText("C:\Windows\wganotify.log")
' Create a regular expression
Dim r As New Regex("Copied file.*:\s+(?<folder>.*\\)(?<file>.*$)")
' Read each line of the log file
Dim inLine As String = inFile.ReadLine()
While inLine IsNot Nothing
    ' Display the groups only if the line matches the regular expression
    If r.IsMatch(inLine) Then
        Dim m As Match = r.Match(inLine)
        Console.WriteLine(m.Result("Folder: ${folder}, File: ${file}"))
    End If
    inLine = inFile.ReadLine()
End While
inFile.Close()
// C#
StreamReader inFile = File.OpenText(@"C:\Windows\wganotify.log");
// Create a regular expression
Regex r = new Regex(@"Copied file.*:\s+(?<folder>.*\\)(?<file>.*$)");
string inLine;
// Read each line of the log file
while ((inLine = inFile.ReadLine()) != null)
{
    // Display the groups only if the line matches the regular expression
    if (r.IsMatch(inLine))
    {
        Match m = r.Match(inLine);
        Console.WriteLine(m.Result("Folder: ${folder}, File: ${file}"));
    }
}
inFile.Close();
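The samples above format the named groups with Match.Result. Named groups can also be read one at a time through the Match.Groups indexer, as mentioned at the start of this section. The following is a minimal sketch of that approach; the class name NamedGroupDemo and the hard-coded sample line are assumptions made for illustration, standing in for a line read from the log file.

```csharp
using System;
using System.Text.RegularExpressions;

class NamedGroupDemo   // hypothetical name, for illustration only
{
    static void Main()
    {
        // Sample line copied from the WgaNotify.log excerpt
        string line = @"23.328: Copied file: C:\WINDOWS\system32\LegitCheckControl.dll";

        Regex r = new Regex(@"Copied file.*:\s+(?<folder>.*\\)(?<file>.*$)");
        Match m = r.Match(line);

        if (m.Success)
        {
            // Look up each group by name instead of by position
            string folder = m.Groups["folder"].Value;
            string file = m.Groups["file"].Value;
            Console.WriteLine("Folder: " + folder + ", File: " + file);
        }
    }
}
```

Reading the groups individually is handy when you want to store or process the values rather than only print them.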
The .NET Framework supports using capture groups with regular
expressions to extract specific data from log files. Using capture
groups, you can parse complex text files and isolate just the
information you need. First, create a Regex object (part of the
System.Text.RegularExpressions namespace) using a regular expression
that includes one or more capture groups in parentheses. Then, call the
Regex.Match method to compare the regular expression to the input
string. Access your capture groups using the Match.Groups array, or
format and output the capture groups by calling Match.Result.