How log file analysis works with GoAccess

Your web server’s log files tell you almost everything you need to know about the background and behaviour of your visitors. By inspecting the log file, you can find out which browsers your visitors use, how long they stay on your site, how many pages they access, and which search engine or links brought them to your site. All this information makes the log file a first-class source for verifying user-friendliness and optimising your web project. Since it’s impossible to evaluate these extensive text files manually, there are various log file analysis tools (log file analysers), which perform this task and display the results visually. An interesting representative of these analysers is the open source tool, GoAccess.

The basics of GoAccess

Developer Gerardo Orellana published the first version of the log file analysis tool GoAccess in July 2010. He still maintains and develops it on GitHub. GoAccess is free software that you can use and adapt to fit your own ideas. It was initially released under the GNU General Public License (GPL), but since 2016 it has been under the MIT license.

The basic idea of GoAccess is to analyse and visually present web statistics in real time. To do this, the log file analyser parses the log file formats of web servers and cloud services such as Apache, nginx, Amazon S3, and CloudFront, and presents the results in graph form on a dashboard. On Unix-like systems, the dashboard can be viewed either in a browser or in the terminal. Alternatively, the statistics can be exported in HTML, JSON, or CSV format.

GoAccess has minimal system requirements: it is written in the C programming language and only requires the C library ncurses as a prerequisite. To use the log file analysis tool on a Windows operating system, you need the Cygwin toolkit, which lets you run certain Linux applications on Microsoft systems.

The characteristic features of the open source tool

You don’t need to configure anything in order to use GoAccess. You simply select the log file that you want to analyse and start the scan, and the information is then displayed in real time. The data is listed in individual categories, each showing values for individual measurement periods as well as for the entire review period. These listings are sorted chronologically by default, but you can also sort the data by the number of page views or visitors, the bandwidth consumed, or the time needed for the site to load (total, average, or maximum). Some values can also be displayed as bar charts or line charts. In addition to the up-to-date information, GoAccess provides you with a summary of all previously evaluated log data under 'Overall Analyzed Requests'.

Both the terminal and the browser dashboard show the aforementioned categories and diagrams in an appealing and user-friendly manner so that you can quickly draw conclusions about visitors and your website. The following table reveals the different areas covered by the log file analyser and summarises the findings that can be extracted from the values.

Category | Decisive values | Significance for web analysis
Unique visitors per day – including spiders | Views, visitors, dates (data) | A unique visitor is understood to mean all views occurring from the same IP address on the same date with the same user agent. By watching the number of visitors over a longer period of time, you can see whether campaigns or new content are successful.
Requested Files (URLs) | Views, bandwidth, loading time (Avg., Cum., Max. T.S.), URL (data) | This category provides an overview of the most requested URLs. Here you can find out which pages of your web project are particularly popular, how much bandwidth is consumed, and how stable the loading time of each page is.
Static Requests | Views, bandwidth, loading time, file (data) | As in the previous category, this is about requested files, but only static content such as images, icons, or layout elements.
Not Found URLs (404s) | Views, URL (data) | The URLs listed in this category have led visitors to a 404 error. You can use this statistic to identify and resolve network problems or incorrect links. Broken links have a negative effect on users as well as search engines.
Visitor Hostnames and IPs | City, country, hostname, IP (data) | In this section, you will find information on the provider and IP address of your visitors. GoAccess even provides data on country of origin and location. The findings enable personalised content to be presented to users.
Operating Systems | Views, visitors, operating system (data) | Here you can see which operating systems your users use (sorted by frequency). You can use this data, for example, to determine how high the mobile traffic volume is.
Browsers | Views, visitors, browser (data) | In this section, the accessing client types are presented. First and foremost, you’ll see the figures for the different browsers, but also whether crawlers are browsing your site and, if so, which ones.
Time Distribution | Views, visitors, loading time, hour (data) | You obtain an hourly overview of visitor numbers. This way, you can determine when your users are particularly active and time advertisements or new content accordingly.
Virtual Hosts | Views, bandwidth, host (data) | If you are running multiple virtual hosts (domains, IP addresses) on your web server, you can use these statistics to filter out the resources that are putting the most strain on your server.
Referrers URLs | Views, URL (data) | The referrer is the information in the log file showing which URL visitors used to access your site. You can use this information to identify strong partner sites and also find out which search terms visitors used if they came directly from a search engine.
Referring Sites | Views, website address (data) | Unlike the previous statistic, this one does not show the full URL, only the general website address of the originating site.
Keyphrases from Google’s search engine | Views, search terms (data) | In addition to the referrer statistics, GoAccess offers a separate list of search queries, at least for Google. This saves you the tedious work of evaluating referrer URLs yourself. The results obtained can provide useful insight for your keyword strategy.
Geo Location | Visitors, origin (data) | Under the heading 'Geo Location', you will find where IP addresses are geographically located.
HTTP Status Codes | Views, status code (data) | This section provides an overview of your server’s responses. You can see from the data whether your web server is working properly and whether all content can be accessed without errors.
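
To make these categories more concrete, the following sketch derives three of them by hand from a tiny log using standard Unix tools. It is only an illustration of the idea, not how GoAccess itself is implemented. The log lines are invented sample data in the combined log format, and unique visitors are approximated here as distinct client IP addresses.

```shell
#!/bin/sh
# Made-up sample data in combined log format (IPs, paths and agents are hypothetical).
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "https://www.google.com/" "Mozilla/5.0"
192.0.2.1 - - [10/Oct/2023:13:56:01 +0000] "GET /missing.html HTTP/1.1" 404 153 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2023:14:02:11 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "curl/8.0"
EOF

# Unique visitors (simplified): number of distinct client IPs, which are field 1.
unique_visitors=$(awk '{ print $1 }' "$log" | sort -u | wc -l | tr -d ' ')

# HTTP status codes: field 9, counted per code and sorted by frequency.
status_counts=$(awk '{ n[$9]++ } END { for (s in n) print n[s], s }' "$log" | sort -rn)

# Not found URLs: the request path (field 7) of every response with status 404.
not_found=$(awk '$9 == 404 { print $7 }' "$log")

echo "Unique visitors: $unique_visitors"
echo "Status codes (count code):"
echo "$status_counts"
echo "Not found URLs: $not_found"

rm -f "$log"
```

A log file analyser performs this kind of counting for every category at once and keeps the results up to date as new lines are appended to the log.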

How to install and use GoAccess

To make sure you install the latest version of GoAccess, download the source archive from the official website. On the command line, the download and installation work as follows:

$ wget http://tar.goaccess.io/goaccess-1.0.tar.gz
$ tar -xzvf goaccess-1.0.tar.gz
$ cd goaccess-1.0/
$ ./configure --enable-utf8
$ make
# make install

Do not forget that ncurses is a prerequisite for the nginx and Apache log analyser, so a current version should be installed on your system before you build GoAccess. If you haven’t already done so, you can set up the C library using the following commands:

$ wget http://ftp.gnu.org/pub/gnu/ncurses/ncurses-6.0.tar.gz
$ tar xzf ncurses-6.0.tar.gz
$ cd ncurses-6.0
$ ./configure --prefix=/opt/ncurses
$ make
# make install
$ ls -la /opt/ncurses
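
With GoAccess and ncurses in place, a first analysis run could look like the following sketch. The log path /var/log/nginx/access.log and the output name report.html are assumptions, so adjust them to your setup. The option --log-format=COMBINED presumes that your server writes the widely used combined log format.

```shell
#!/bin/sh
# Assumed locations; override with LOG_FILE=... REPORT=... if yours differ.
log_file="${LOG_FILE:-/var/log/nginx/access.log}"
report="${REPORT:-report.html}"

make_report() {
    # Skip gracefully if GoAccess is not on the PATH.
    if ! command -v goaccess >/dev/null 2>&1; then
        echo "goaccess not installed; skipping"
        return 0
    fi
    # Skip gracefully if the log file is missing or unreadable.
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file; skipping"
        return 0
    fi
    # Parse the log as combined log format and write a static HTML report.
    goaccess "$log_file" --log-format=COMBINED -o "$report" \
        && echo "wrote $report" \
        || echo "goaccess failed; check the log format"
}

result=$(make_report)
echo "$result"
```

Leaving out the -o option opens the interactive terminal dashboard instead. In recent versions, giving the output file a .json or .csv extension switches the export format, and adding --real-time-html keeps the HTML report updating while the log grows.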

A detailed overview of the configuration options for the log file analysis tool can be found in the official GoAccess manual.