Onionprobe Troubleshooting
This section documents common problems with Onion Services reported by Onionprobe, and gives hints on how to diagnose and solve them.
What is tested
Onionprobe performs several tests on its endpoints, checking:
- Onion Service descriptor reachability.
- Onion Service reachability.
- TLS certificate validity for the Onion Service, but only:
  - If TLS/HTTPS is expected for the Onion Service.
  - If the certificate check is enabled (with the --tls_verify option).
- HTTP status code for the Onion Service.
For some of these tests, Prometheus alerting rules are available in the default configuration, to be triggered in case of failures.
The meaning of these alerts, along with basic steps to fix the underlying problems, is given in the next section.
Prometheus alerts
Onion Service unreachable
This alert means that Onionprobe was unable to connect to the Onion Service.
There are many causes for this alert:
- The Onion Service is offline.
- There is a problem with Tor connectivity on the machine where Onionprobe runs.
- There is a problem with the Onion Service descriptor (details below).
Checking Onion Service reachability
To check whether the service is really offline, try connecting manually from another machine.
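A manual check from another machine can be as simple as asking a local Tor client to open a connection to the service. The sketch below does this over Tor's SOCKS5 port using only the Python standard library and returns the SOCKS5 reply code (0 means Tor reached the service). It assumes a Tor client listening on 127.0.0.1:9050; that address and the helper names are illustrative, not part of Onionprobe.

```python
import socket
import struct

# Assumption: a Tor client runs on this machine with its SOCKS port at
# 127.0.0.1:9050 (a common default); adjust for your setup.
TOR_SOCKS = ("127.0.0.1", 9050)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    # Read exactly n bytes, failing if the proxy closes the connection early.
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("SOCKS proxy closed the connection")
        data += chunk
    return data


def socks5_greeting() -> bytes:
    # SOCKS version 5, one authentication method offered: 0x00 (none).
    return b"\x05\x01\x00"


def socks5_connect_request(host: str, port: int) -> bytes:
    # CONNECT (0x01) to a domain name (address type 0x03), so that Tor,
    # not the local resolver, handles the .onion address.
    name = host.encode("ascii")
    if not 0 < len(name) < 256:
        raise ValueError("hostname must be 1 to 255 bytes long")
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack("!H", port)


def onion_connect_reply(onion: str, port: int = 80, timeout: float = 120.0) -> int:
    """Return the SOCKS5 reply code for onion:port (0 means connected)."""
    with socket.create_connection(TOR_SOCKS, timeout=timeout) as sock:
        sock.sendall(socks5_greeting())
        if recv_exact(sock, 2) != b"\x05\x00":
            raise RuntimeError("SOCKS proxy did not accept 'no authentication'")
        sock.sendall(socks5_connect_request(onion, port))
        # Reply header: version, reply code, reserved, address type.
        return recv_exact(sock, 4)[1]
```

Run it from a machine other than the one hosting Onionprobe, for example `onion_connect_reply("<address>.onion", 443)`. Any non-zero reply code indicates a failure; RFC 1928 lists the meaning of the standard codes.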
Fixing an offline Onion Service
Fixing an offline Onion Service depends on how it's configured. Usually, restarting the service does the job.
Please check the documentation of the Onion Service tool you're using.
Example: if you're using Onionspray, check its troubleshooting guide.
Onion Service descriptor unreachable
Onionprobe is unable to fetch the Onion Service descriptor for a given service. This descriptor is a document with directions for connecting to the Onion Service. If the descriptor is unreachable, no connection to the service can be established.
There are many causes for this alert:
- The descriptor might not be available in the Onion Service Descriptor Directory (also called HSDir) when the service is offline or has issues preventing a descriptor upload to the responsible HSDirs.
  - To check whether the service is really offline, try connecting manually from another machine (see the section above for details).
- Onionprobe itself had trouble connecting to one of the current HSDirs hosting the descriptor, possibly due to a problem on the machine where Onionprobe runs or to temporary reachability issues in the Tor network. To test that:
  - Check whether the machine where Onionprobe runs is able to reach the Tor network.
- The HSDir has issues: it may be offline, overloaded, or under a Denial of Service (DoS) attack.
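Both checks above can be run against the Tor client on the Onionprobe machine through its control port. The sketch below uses commands from Tor's control protocol: GETINFO status/circuit-established (reports "1" once Tor has built a circuit), HSFETCH (fetches a descriptor) and HS_DESC events (report which HSDirs answered or failed). It assumes a ControlPort on 127.0.0.1:9051 with password authentication; cookie authentication is not covered, and the timeout applies to each read rather than to the whole check. The function names are hypothetical.

```python
import socket
from typing import List, Optional, Tuple

# Assumptions: the local Tor client has ControlPort 9051 on 127.0.0.1 and
# password authentication (HashedControlPassword) enabled. Adjust as needed.
CONTROL_ADDR = ("127.0.0.1", 9051)


def quote(value: str) -> str:
    # Control-protocol QuotedString: escape backslashes and double quotes.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def onion_name(address: str) -> str:
    # Control commands take v3 addresses without the ".onion" suffix.
    name = address.strip().lower()
    if name.endswith(".onion"):
        name = name[: -len(".onion")]
    if len(name) != 56:
        raise ValueError("expected a 56-character v3 onion address")
    return name


def hsfetch_command(address: str) -> str:
    return f"HSFETCH {onion_name(address)}\r\n"


def parse_getinfo(lines: List[str], key: str) -> Optional[str]:
    # Single-line GETINFO values arrive as "250-key=value".
    prefix = f"250-{key}="
    for line in lines:
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


def parse_hs_desc_event(line: str) -> Optional[Tuple[str, str, str]]:
    # "650 HS_DESC <Action> <HSAddress> <AuthType> <HsDir> ..." -> (action, address, hsdir)
    parts = line.split()
    if len(parts) >= 6 and parts[0] == "650" and parts[1] == "HS_DESC":
        return parts[2], parts[3], parts[5]
    return None


def read_line(f) -> str:
    raw = f.readline()
    if not raw:
        raise ConnectionError("control connection closed")
    return raw.decode("utf-8", "replace").rstrip("\r\n")


def command(f, cmd: str) -> List[str]:
    # Send a command and collect lines up to the final "NNN " reply line.
    # Asynchronous "650" event lines are kept but do not end the reply.
    f.write(cmd.encode("utf-8"))
    f.flush()
    lines = []
    while True:
        line = read_line(f)
        lines.append(line)
        if line[3:4] == " " and not line.startswith("650"):
            return lines


def check_tor_and_descriptor(address: str, password: str, timeout: float = 120.0) -> dict:
    name = onion_name(address)
    result = {"circuit_established": False, "received_from": None, "failed_hsdirs": []}
    with socket.create_connection(CONTROL_ADDR, timeout=timeout) as sock, sock.makefile("rwb") as f:
        if not command(f, f"AUTHENTICATE {quote(password)}\r\n")[-1].startswith("250"):
            raise PermissionError("control port authentication failed")
        key = "status/circuit-established"
        result["circuit_established"] = parse_getinfo(command(f, f"GETINFO {key}\r\n"), key) == "1"
        command(f, "SETEVENTS HS_DESC\r\n")
        reply = command(f, hsfetch_command(name))
        if not reply[-1].startswith("250"):
            raise RuntimeError(f"HSFETCH rejected: {reply[-1]}")
        pending = reply[:-1]  # events that arrived before the reply line
        try:
            # A FAILED event concerns a single HSDir, so keep waiting for RECEIVED.
            while result["received_from"] is None:
                event = parse_hs_desc_event(pending.pop(0) if pending else read_line(f))
                if event and event[1] == name:
                    if event[0] == "RECEIVED":
                        result["received_from"] = event[2]
                    elif event[0] == "FAILED":
                        result["failed_hsdirs"].append(event[2])
        except socket.timeout:
            pass  # no descriptor arrived before the read timeout
    return result
```

If circuit_established is False, the problem is most likely local Tor connectivity; if circuits work but only FAILED events show up, the listed HSDirs are the ones worth investigating.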
Invalid TLS certificate for the Onion Service
This alert means that the Onion Service accepts TLS connections, but offers an invalid certificate.
The TLS certificate might be invalid in various ways, such as:
- It's self-signed. Some Onion Services offer self-signed certificates, and while some applications may accept self-signed certificates for Onion Services without displaying warnings, Onionprobe will complain if the certificate is self-signed and the --tls_verify option is active (which might be the default).
- The TLS certificate has expired.
- Its SubjectAltName does not match the Onion Service address.
- It's malformed or doesn't pass other validation tests done by the TLS library on the client side.
- There is a problem in the connection between the Onion Service server and a backend application. This is uncommon, but might happen. Since Onion Services use end-to-end encryption between the client and the service, an invalid TLS certificate usually only means a service misconfiguration. But if the connection between the Onion Service server and the backend application is compromised, an invalid certificate might mean that someone (or something) was able to tamper with the connection between the Onion Service and the backend.
A fix for this alert involves:
- Generating a new TLS certificate for the service. This is the most common fix for this alert.
- Double-checking the connection between the Onion Service server and the backend application, to make sure that the expected certificate is presented.
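To tell these causes apart by hand, a verifying TLS handshake reports which check failed. The sketch below wraps an already-connected socket (for example one opened through Tor's SOCKS port) with Python's default verifying context and maps the OpenSSL error code to the causes listed above. The codes are OpenSSL's X509_V_ERR_* values; the hint texts and function names are illustrative assumptions, not Onionprobe output.

```python
import socket
import ssl
from typing import Optional

# OpenSSL certificate verification error codes (X509_V_ERR_* constants)
# mapped to the causes listed above. Hint texts are illustrative only.
VERIFY_HINTS = {
    9: "certificate is not yet valid: check its validity dates",
    10: "certificate has expired: generate a new certificate",
    18: "self-signed certificate: rejected while --tls_verify is active",
    19: "self-signed certificate in chain: rejected while --tls_verify is active",
    62: "hostname mismatch: SubjectAltName does not match the onion address",
}


def explain_verify_failure(code: int, message: str) -> str:
    # Anything not in the table falls into "other validation failure".
    return VERIFY_HINTS.get(code, f"other validation failure: {message}")


def tls_problem(sock: socket.socket, onion_host: str) -> Optional[str]:
    """Verify the certificate over an already-connected socket.

    Returns None if the certificate verifies, otherwise a hint string.
    """
    # Default context: CERT_REQUIRED plus hostname checking.
    context = ssl.create_default_context()
    try:
        with context.wrap_socket(sock, server_hostname=onion_host):
            return None
    except ssl.SSLCertVerificationError as err:
        return explain_verify_failure(err.verify_code, err.verify_message)
```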
Expiring TLS certificate for the Onion Service
This alert is triggered when a TLS certificate offered by an Onion Service will expire in less than a week.
A fix for this alert involves generating a new TLS certificate for the service.
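The one-week threshold can be checked by hand from the notAfter field that Python's SSLSocket.getpeercert() returns. Note that getpeercert() only returns these fields when the certificate was verified; with verification disabled the dict is empty. The sketch below is an illustration of the threshold described above, not Onionprobe's implementation; the function names are hypothetical.

```python
import ssl
import time
from typing import Optional

# Threshold from the alert description: less than one week.
EXPIRY_THRESHOLD_SECONDS = 7 * 24 * 60 * 60


def seconds_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    # not_after uses the getpeercert() format, e.g. "Jan  5 09:34:43 2018 GMT".
    if now is None:
        now = time.time()
    return ssl.cert_time_to_seconds(not_after) - now


def is_expiring_soon(not_after: str, now: Optional[float] = None) -> bool:
    # Already-expired certificates are excluded here: they fail validation
    # and fall under the "Invalid TLS certificate" alert instead.
    remaining = seconds_until_expiry(not_after, now)
    return 0 <= remaining < EXPIRY_THRESHOLD_SECONDS
```

For example, with a connected SSLSocket `tls`, `is_expiring_soon(tls.getpeercert()["notAfter"])` tells whether the certificate falls inside the alert window.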
Unexpected HTTP status code
This alert fires when Onionprobe receives an unexpected HTTP status code.
By default, Onionprobe expects an HTTP 200 status code, but this can be configured for each path tested on each Onion Service.
An unexpected status code might mean that the application served by the Onion Service is malfunctioning.
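To see which status code a path actually returns, one can send a plain HTTP request over a socket that is already connected to the service (for example through Tor's SOCKS port, and wrapped with TLS for HTTPS services) and read the status line. This is a minimal sketch assuming HTTP/1.x; the helper names and the allowed-status parameter are hypothetical and do not reproduce Onionprobe's configuration format.

```python
import socket
from typing import Collection


def build_request(host: str, path: str) -> bytes:
    # Minimal HTTP/1.1 request; "Connection: close" keeps the exchange simple.
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n").encode("ascii")


def parse_status_code(status_line: bytes) -> int:
    # b"HTTP/1.1 404 Not Found\r\n" -> 404
    parts = status_line.split(None, 2)
    if len(parts) < 2 or not parts[0].startswith(b"HTTP/"):
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    return int(parts[1])


def fetch_status(sock: socket.socket, host: str, path: str) -> int:
    # sock must already be connected to the Onion Service.
    sock.sendall(build_request(host, path))
    with sock.makefile("rb") as f:
        return parse_status_code(f.readline())


def is_expected(status: int, allowed: Collection[int] = (200,)) -> bool:
    # Mirrors the default described above: only HTTP 200 is expected
    # unless a different set is configured for the path.
    return status in allowed
```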
Onionprobe not responding
When Prometheus is unable to determine Onionprobe's state, this alert is fired.
Possible causes include:
- Onionprobe crashed or is not responding.
- A networking error between the Prometheus server and the Onionprobe instance.
If this alert triggers:
- Check if Onionprobe is running.
- Check if Onionprobe is serving Prometheus metrics.
- Check whether Onionprobe's metrics endpoint is reachable from the Prometheus server.
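The last two checks can be done from the Prometheus server with a small script that scrapes the metrics endpoint and lists the metric names it returns. The sketch assumes you know the URL where your Onionprobe instance exposes metrics (the host and port are not assumed here) and does not rely on any particular Onionprobe metric name.

```python
import re
import urllib.request
from typing import Set

# Prometheus metric names match [a-zA-Z_:][a-zA-Z0-9_:]*.
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def metric_names(exposition: str) -> Set[str]:
    """Collect sample names from Prometheus text exposition format."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = METRIC_NAME.match(line)
        if match:
            names.add(match.group(0))
    return names


def scrape(url: str, timeout: float = 10.0) -> Set[str]:
    # Raises an exception (such as urllib.error.URLError or a timeout) if the
    # endpoint is unreachable or answers with an HTTP error status.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return metric_names(response.read().decode("utf-8"))
```

An exception from scrape() points to a networking or process problem; an empty set suggests Onionprobe is running but not serving metrics.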