Xovi and Xenu are greedy tools that can affect smaller servers, but even larger servers should block them to avoid prying eyes.

As a webmaster you will encounter all sorts of bots. Some of them are good, but many of them are bad. I am all for sharing information, but if people scrape information without asking nicely they will get a ban from me.

Weblog Expert: Analyzing RAW Server Logs

Don’t rely on Google Analytics alone. Download your server logs, which are usually stored in /usr/local/apache/logs, /usr/local/apache/domlogs/, /var/log/nginx or /var/log. Make sure to get the correct vhost log, and grab your nginx microcache.log as well if you have it enabled.

Use tools such as Weblog Expert to analyze the RAW logs.

The advantage of working with real server logs is that you are using RAW data, not something that could have been manipulated. Never rely on external parties, not even Google.

You will also get graphs that are a little nicer to look at.
To find bad bots, click on Visitors – Hosts. Then check the IPs with the most visits and enter each one into a tool such as ip2location.com to figure out what network it is coming from. Don’t block Googlebot by accident!
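If you prefer a script to a GUI for the same "most visits per host" check, here is a minimal Python sketch. It assumes your log uses the common or combined format, where the client IP is the first whitespace-separated field. The function name top_hosts is made up for this example and is not part of Weblog Expert.

```python
from collections import Counter


def top_hosts(log_lines, limit=10):
    """Return the busiest client IPs as (ip, hits) pairs, busiest first."""
    hits = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:
            # Assumption: common/combined log format, client IP is field one.
            hits[fields[0]] += 1
    return hits.most_common(limit)
```

To use it, open your vhost log and pass the file object straight in, for example top_hosts(open("access.log", errors="replace")). Then look up the top few addresses before you block anything.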

Block Xovi, Xenu, Python Via HtAccess

 # The rewrite engine must be on for the rules below to take effect
 RewriteEngine On
 RewriteCond %{HTTP_USER_AGENT} ^.*(HTTrack|clshttp|archiver|loader|email|nikto|miner|python|xovi|xenu).* [NC,OR]
 RewriteCond %{HTTP_USER_AGENT} ^.*(winhttp|libwww\-perl|curl|wget|harvest|scan|grab|extract).* [NC]
 RewriteRule ^(.*)$ - [F,L]

To add a new bad bot, add another pipe separator (|) directly after xenu, followed by the name of the bot you want to block.
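Some of these keywords are broad (scan, grab, email), so it can be worth testing the list against user agents from your own logs before you deploy it. The Python sketch below mirrors the two RewriteCond patterns above as closely as I can. The name is_blocked is made up for this example, and re.IGNORECASE stands in for the [NC] flag. Python's regex engine is not identical to Apache's, so treat the result as a sanity check only.

```python
import re

# Same alternation as the two RewriteCond lines; IGNORECASE mirrors [NC].
BAD_AGENTS = re.compile(
    r"HTTrack|clshttp|archiver|loader|email|nikto|miner|python|xovi|xenu"
    r"|winhttp|libwww-perl|curl|wget|harvest|scan|grab|extract",
    re.IGNORECASE,
)


def is_blocked(user_agent):
    """Return True if the user agent contains any blocked keyword."""
    return BAD_AGENTS.search(user_agent) is not None
```

Run your top user agents through is_blocked and make sure the visitors you want to keep, Googlebot included, come back False.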

Block Hetzner Via Your Firewall

Xovi seems to be hosted on a Hetzner server or at least their scraper is. Make sure to block the entire Hetzner IP block.

Add the ranges to your firewall. You can also block IPs via htaccess, as in the snippet below, but it’s better to block them at the server level.

order allow,deny
allow from all
deny from
deny from
deny from
deny from
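For the firewall route, here is a rough Python sketch that validates a list of CIDR ranges and prints one iptables DROP rule per range. It assumes your server uses iptables. The ranges in it are reserved documentation placeholders, not Hetzner's real ranges, and the function names are made up for this example.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks, NOT real
# Hetzner ranges. Replace them with the networks you found in your logs.
EXAMPLE_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def iptables_rules(cidrs):
    """Validate each CIDR and return one iptables DROP command per network."""
    rules = []
    for cidr in cidrs:
        network = ipaddress.ip_network(cidr)  # raises ValueError on bad input
        rules.append(f"iptables -A INPUT -s {network} -j DROP")
    return rules


def is_covered(ip, cidrs):
    """Return True if the address falls inside any of the given networks."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)
```

Before applying the rules, run is_covered on a few addresses you care about to confirm you are not about to lock out a visitor you want. Note that -A appends to the end of the INPUT chain, so an earlier ACCEPT rule can still win.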

I’ll take a more extensive look at “working with server logs” in another article. For now this should get you started. Read through the guides below for more useful examples of how to block bad bots.

