Session Files

Creation date: 4/3/2023 11:43 PM    Updated: 7/11/2023 12:19 PM
Introduced in Angelfish v2.50, the Session File is used to identify traffic from bots, crawlers, and referrer spam so it can be removed from the reports.

Based on our tests, the Session File is the most effective way to de-clutter SID & IPUA Profiles in Angelfish.



OVERVIEW

When you use one of the log-based Tracking Methods in Angelfish, your data can be artificially inflated by traffic from bots & crawlers.

There are a few ways to remove this traffic during processing:
  • Enable Ignore Inflated Visits
  • Add Filters to exclude unwanted traffic
  • Use a Session File

The Session File is a file with a File Type that is commonly ignored by crawlers: jpg, gif, tiff, etc.

Session File logic is applied during processing.

When Angelfish processes data, it identifies Visits that contain requests for the Session File and isolates them from Visits that don't.

You can choose to include or exclude Visits that contain the Session File.

To see Visits that are *likely* real people, we recommend using the Include option.  The Exclude option shows you the extent of bot & crawler activity on your website.

Please note: if you use the Include option, Angelfish ignores all Visits that don't contain a request for the Session File.  
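
To make the Include / Exclude behavior concrete, here is a minimal Python sketch of the idea. It is an illustration only, not Angelfish's actual code: the function names, the visitor keys, and the rule for matching the file name (last path segment, query string ignored) are all assumptions made for this example.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical Session File name, borrowed from the example in this article.
SESSION_FILE = "filename.gif"


def requests_session_file(url, session_file=SESSION_FILE):
    """Return True if the last path segment of the URL is the Session File.

    The query string (e.g. a cache-busting ?t=... parameter) is ignored.
    This matching rule is an assumption made for the sketch.
    """
    path = urlsplit(url).path
    return path.rsplit("/", 1)[-1] == session_file


def split_visits(hits, session_file=SESSION_FILE):
    """Separate Visits that requested the Session File from those that did not.

    `hits` is an iterable of (visitor_key, url) pairs. Treating one key as
    one Visit is a simplification: real Visits also depend on timeouts.
    """
    visits = defaultdict(list)
    for visitor_key, url in hits:
        visits[visitor_key].append(url)

    with_file, without_file = {}, {}
    for key, urls in visits.items():
        if any(requests_session_file(u, session_file) for u in urls):
            with_file[key] = urls
        else:
            without_file[key] = urls
    return with_file, without_file


# Made-up hits: visitor-a loads the Session File, visitor-b never does.
hits = [
    ("visitor-a", "/"),
    ("visitor-a", "/filename.gif?t=1679750121"),
    ("visitor-a", "/pricing/"),
    ("visitor-b", "/"),
    ("visitor-b", "/overview/"),
]
with_file, without_file = split_visits(hits)
print("Visits with the Session File:", sorted(with_file))
print("Visits without the Session File:", sorted(without_file))
```

In this sketch, the first group corresponds to what the Include option keeps, and the second group corresponds to what the Exclude option keeps.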

When processing historical data, use an existing file on your website as the Session File.  A newly added file won't appear in logs recorded before it was added.




IMPLEMENTATION


Most crawlers ignore css / js / image / font files, so we recommend using one of them as your Session File.

You can use an existing file on your website as the Session File, or you can add a file to your website.

If you add a file, here are some recommendations:

  • the file should be small 
  • file idea: make a copy of agf.gif and rename it
  • reference the file in the header or footer so it's requested on every page
  • add a non-constant string to the end of the file's URL (e.g. a query parameter) so the file isn't served from cache

Here's an example of a Session File reference that uses PHP to append the current epoch timestamp as a query parameter:

<img src="/filename.gif?t=<?php echo time(); ?>" />

In the Profile's config settings, you would enter "filename.gif" (no quotes) in the Session File field and choose the Include option.



MORE INFO


Here are some hits on our main website from a suspected crawler (with a fake user agent):


163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:21 -0600] "HEAD / HTTP/1.1" 200 - "-" "python-requests/2.25.1" "-"
163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:23 -0600] "GET / HTTP/1.1" 200 4762 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" "-"
163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:26 -0600] "GET / HTTP/1.1" 200 4762 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" "-"
163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:28 -0600] "GET /overview/ HTTP/1.1" 200 4706 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" "-"
163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:30 -0600] "GET /demo/ HTTP/1.1" 200 4271 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" "-"
163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:33 -0600] "GET /features/ HTTP/1.1" 200 5498 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" "-"
163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:35 -0600] "GET /pricing/ HTTP/1.1" 200 5846 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" "-"
163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:38 -0600] "GET /trial/ HTTP/1.1" 200 4857 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" "-"
163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:41 -0600] "GET /solutions/ HTTP/1.1" 200 4346 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" "-"
163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:44 -0600] "GET /support/ HTTP/1.1" 200 4451 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" "-"
163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:46 -0600] "GET /consulting/ HTTP/1.1" 200 4460 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" "-"
163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:48 -0600] "GET /partners/ HTTP/1.1" 200 4812 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" "-"
163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:51 -0600] "GET /resellers/ HTTP/1.1" 200 4342 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" "-"
163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:54 -0600] "GET /about/ HTTP/1.1" 200 4560 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" "-"
163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:56 -0600] "GET /careers/ HTTP/1.1" 200 4531 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" "-"
163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:58 -0600] "GET /blog/ HTTP/1.1" 200 7835 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" "-"
163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:16:02 -0600] "GET /privacy-policy HTTP/1.1" 200 4714 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" "-"

We suspect these hits are from a crawler because:
  • the first hit has a user agent of "python-requests/2.25.1" and uses the HEAD method
  • there are no requests for css / js / font / image files
  • there's a consistent delay of 2-4 seconds between each request
  • all the Pages requested are linked from the main page of the website
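
The first three checks in this list can be sketched in Python. This is a rough illustration under stated assumptions, not part of Angelfish: the regular expression is written only for the log layout shown above, the list of static file extensions is our own guess, and the function names are hypothetical. The sample lines are the first three hits from the log excerpt.

```python
import re
from datetime import datetime

# Regex for the log layout shown above (an assumption; adjust for other formats).
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Illustrative list of css / js / image / font extensions, not an official one.
STATIC_EXTENSIONS = (".css", ".js", ".gif", ".jpg", ".jpeg", ".png",
                     ".svg", ".ico", ".woff", ".woff2", ".ttf")


def parse(line):
    """Parse one log line into a dict, or return None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    hit = match.groupdict()
    hit["time"] = datetime.strptime(hit["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return hit


def crawler_signals(lines):
    """Summarize the crawler hints discussed above for a set of hits."""
    hits = [h for h in map(parse, lines) if h is not None]
    gaps = [(b["time"] - a["time"]).total_seconds()
            for a, b in zip(hits, hits[1:])]
    return {
        "uses_head": any(h["method"] == "HEAD" for h in hits),
        "script_agent": any(h["agent"].startswith("python-requests") for h in hits),
        "no_static_files": not any(
            h["path"].split("?")[0].lower().endswith(STATIC_EXTENSIONS)
            for h in hits
        ),
        "max_gap_seconds": max(gaps, default=0.0),
    }


BROWSER_AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) '
                 'AppleWebKit/537.36 (KHTML, like Gecko) '
                 'Chrome/39.0.2171.95 Safari/537.36')
SAMPLE_LINES = [
    '163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:21 -0600] '
    '"HEAD / HTTP/1.1" 200 - "-" "python-requests/2.25.1" "-"',
    '163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:23 -0600] '
    '"GET / HTTP/1.1" 200 4762 "-" "' + BROWSER_AGENT + '" "-"',
    '163.123.143.50 analytics.angelfishstats.com - [25/Mar/2023:07:15:26 -0600] '
    '"GET / HTTP/1.1" 200 4762 "-" "' + BROWSER_AGENT + '" "-"',
]
signals = crawler_signals(SAMPLE_LINES)
print(signals)
```

None of these signals is proof on its own. A Session File avoids this kind of guesswork because it only asks whether the Visit requested the file.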

This is a perfect example of traffic that would be isolated by a Session File.  The Visit contains no request for a Session File, so if the Include option is selected, these Pageviews would be left out of the reports.

If you have questions about this feature, please open a support ticket. 

Angelfish has a bunch of handy features and we want to help you use them!