What type of information is stored in a log file?

Whether you’re sitting at a desktop PC, reading the news on a tablet, or operating a website on a server, many different processes are running in the background of these devices. If an error occurs, or if you simply want to find out which actions an operating system or program is executing, log files can help. They are recorded automatically by virtually every application, server, and database system.

Log files are rarely read or evaluated. Think of them as a kind of virtual black box that is inspected only in the most urgent cases. Because of the way they capture data, log files are an excellent source for finding out more about program and system errors; they also lend themselves particularly well to gathering information on user behaviour. This makes the technology especially interesting for website operators, who can gain useful data from the log files located on their web servers.

What is a log file?

Log files, which are sometimes referred to as event files, are generally plain text files. They contain information on all processes that their programmers have defined as relevant. A database’s log file, for example, records all changes made by correctly executed transactions. If part of a database is lost, e.g. in the course of an unexpected system shutdown, the log file acts as the basis for restoring the data set to its proper state.

Log files are automatically generated according to how they’ve been programmed. It’s also possible to create your own files, provided you’re familiar enough with the technical aspects involved. Generally, a line within a log file contains the following information:

Recorded events (e.g. program start)
Time stamp, which assigns a date and time to the event

Normally, the time stamp comes first so that the chronological sequence of events is easy to follow.
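As a rough illustration of such a line, the following sketch uses Python's standard logging module to write an entry with the time stamp first and the recorded event after it. The logger name "demo_app" and the chosen line layout are assumptions made for this example; real programs define their own formats.

```python
import io
import logging


def make_logger(stream):
    """Build a logger that writes 'time stamp, level, event' lines to stream."""
    logger = logging.getLogger("demo_app")  # hypothetical application name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # keep the output confined to our stream
    handler = logging.StreamHandler(stream)
    # Time stamp first, then the severity level, then the recorded event.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


# An in-memory buffer stands in for a real log file here.
buffer = io.StringIO()
log = make_logger(buffer)
log.info("program start")

line = buffer.getvalue().strip()
print(line)
```

Replacing the in-memory buffer with a logging.FileHandler would write the same kind of line to a file on disk.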

Typical applications of log files

Operating systems generally create multiple log files by assigning the different process types to fixed categories. For example, Windows systems record information on application events, system events, security-related events, setup events, and forwarded events. This gives administrators insight that can assist them in troubleshooting; Windows log files also show which users have logged on to and off the system. In addition to the operating system, the following programs and systems collect quite different data:

Background programs, like e-mail servers, databases, or proxy servers, generate log files that are primarily used to record error and event messages as well as other notices. This helps to secure data and, in the event of a crash, to restore it.
Installed software, like office programs, games, instant messengers, firewalls, or virus scanners, saves many different types of data in log files, for example configuration settings or chat messages. Program crashes are also recorded, which helps speed up troubleshooting.
Servers (especially web servers) record relevant network activity; this information contains useful data on users and their behaviour within networks. What’s more, authorised administrators can see which users started an application or requested a file, at what time and for how long they did this, and which operating system was used. Web log analysis is one of the oldest web analytics methods and one of the best examples of the many uses of log files.

Web server log files: a textbook example of the potential of log files

Originally, the log files of web servers like Apache or Microsoft IIS were used mainly to record and fix processing errors. It was quickly discovered, however, that web server log files contain much more valuable data: information on the usability and popularity of the websites hosted on the server as well as user data such as:

Time of page view
Number of page views
Session duration
IP address and user’s host name
Information on the requesting client (usually the browser)
Search engine used, including search queries
Operating system used

A typical entry of a web server log file looks as follows:

183.121.143.32 - - [18/Mar/2003:08:04:22 +0200] "GET /images/logo.jpg HTTP/1.1" 200 512 "http://www.wikipedia.org/" "Mozilla/5.0 (X11; U; Linux i686; de-DE;rv:1.7.5)"

Detailed overview of individual parameters:

Meaning	Example value	Explanation
IP address	183.121.143.32	The requesting host’s IP address
Unassigned	-	RFC 1413 identity of the client; generally unknown, hence the hyphen
Who?	-	User name, provided HTTP authentication has taken place; otherwise, as in this example, it remains empty
When?	[18/Mar/2003:08:04:22 +0200]	Time stamp consisting of date, time, and time zone offset
What?	GET /images/logo.jpg HTTP/1.1	The event that occurred, in this case an image request via HTTP
OK	200	HTTP status code; 200 confirms a successful request
How much?	512	If applicable: the amount of transferred data in bytes
From where?	http://www.wikipedia.org/	The referrer, i.e. the web address of the page from which the file was requested
By which means?	Mozilla/5.0 (X11; U; Linux i686; de-DE;rv:1.7.5)	Technical information about the client: browser, operating system, kernel, user interface, language, version
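To make the breakdown above more concrete, the following sketch splits the example entry into its individual fields with a regular expression. The field names (host, ident, user, and so on) and the function name parse_entry are chosen for this illustration only; the pattern assumes the layout of the example line and may need adjusting for other log formats.

```python
import re
from datetime import datetime

# One named group per column of the table above (names are illustrative).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_entry(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the time stamp, status code, and byte count to native types.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


SAMPLE = (
    '183.121.143.32 - - [18/Mar/2003:08:04:22 +0200] '
    '"GET /images/logo.jpg HTTP/1.1" 200 512 '
    '"http://www.wikipedia.org/" '
    '"Mozilla/5.0 (X11; U; Linux i686; de-DE;rv:1.7.5)"'
)

entry = parse_entry(SAMPLE)
print(entry["host"], entry["request"], entry["status"], entry["size"])
```

Lines that do not follow this layout are returned as None rather than raising an error, so a caller can skip or count malformed entries.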

To evaluate this flood of information effectively, tools like Webalizer have been developed. These take the collected data and transform it into informative statistics, tables, and graphics. Trends in a website’s growth, the user friendliness of individual pages, or relevant keywords and topics can all be determined using this information.
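The kind of aggregation such tools perform can be sketched in a few lines. The example below is not how Webalizer itself works; it is a minimal illustration that counts page views per path, responses per status code, and distinct requesting hosts. The sample lines (apart from the first, which shortens the entry shown earlier) and the function name summarize are made up for this example.

```python
from collections import Counter

# Hypothetical sample data in the same layout as the entry shown earlier.
SAMPLE_LINES = [
    '183.121.143.32 - - [18/Mar/2003:08:04:22 +0200] "GET /images/logo.jpg HTTP/1.1" 200 512 "http://www.wikipedia.org/" "Mozilla/5.0"',
    '183.121.143.32 - - [18/Mar/2003:08:04:25 +0200] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '10.0.0.7 - - [18/Mar/2003:08:05:01 +0200] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '10.0.0.7 - - [18/Mar/2003:08:05:09 +0200] "GET /missing.html HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]


def summarize(lines):
    """Count views per path, responses per status code, and distinct hosts."""
    views = Counter()
    statuses = Counter()
    visitors = set()
    for line in lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue  # skip lines without a quoted request field
        request_fields = parts[1].split()  # e.g. ['GET', '/index.html', 'HTTP/1.1']
        status_fields = parts[2].split()   # e.g. ['200', '2048']
        if len(request_fields) < 2 or not status_fields:
            continue
        views[request_fields[1]] += 1
        statuses[status_fields[0]] += 1
        visitors.add(line.split()[0])      # the requesting host comes first
    return views, statuses, visitors


views, statuses, visitors = summarize(SAMPLE_LINES)
print(views.most_common(1), dict(statuses), len(visitors))
```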

Although web server log file analysis is still carried out, this tried and tested method has lost some of its former sheen to increasingly popular methods of web analysis such as cookies or page tagging. Reasons for this trend include the error-prone assignment of requests to sessions in log file analysis and the fact that website operators often can’t access a web server’s log files. Despite these drawbacks, log file analysis has advantages: all errors are registered immediately, and the collected data stays within the company.
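Why session assignment is error-prone becomes clearer with a small sketch. A log file contains no session identifier, so an analysis has to guess: here, requests from the same IP address are treated as one session until there is a pause longer than a timeout. The 30-minute timeout, the function name count_sessions, and the sample requests are assumptions made for this illustration.

```python
from datetime import datetime, timedelta

# Assumed inactivity limit; the log file itself defines no such value.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_sessions(requests):
    """Estimate the number of sessions from (ip, time) pairs.

    A new session is counted when an IP address appears for the first time
    or after a pause longer than SESSION_TIMEOUT.
    """
    last_seen = {}
    sessions = 0
    for ip, when in sorted(requests, key=lambda item: item[1]):
        previous = last_seen.get(ip)
        if previous is None or when - previous > SESSION_TIMEOUT:
            sessions += 1
        last_seen[ip] = when
    return sessions


# Hypothetical requests: three from one address, one from another.
requests = [
    ("183.121.143.32", datetime(2003, 3, 18, 8, 4, 22)),
    ("183.121.143.32", datetime(2003, 3, 18, 8, 10, 0)),
    ("183.121.143.32", datetime(2003, 3, 18, 9, 30, 0)),
    ("10.0.0.7", datetime(2003, 3, 18, 8, 5, 1)),
]

print(count_sessions(requests))
```

The heuristic counts three sessions here, but it cannot tell whether the first address belongs to one returning visitor or to several people sharing a proxy, which is exactly the kind of uncertainty described above.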