Onionprobe ChangeLog¶
v1.3.0 - Unreleased¶
Fixes¶
- Standalone monitoring node:
  - The `start` action in the `onionprobe-monitor` script now pulls and builds images.
  - Failure rate was erroneously reported as 1% when all services were working, when the expected value was 0%. This is now fixed.
Features¶
- Standalone monitoring node:
  - Grafana dashboard got a new time series: number of missing Onion Service descriptors in HSDirs.
v1.2.1 - 2024-11-27¶
Features¶
- The installation page was updated to include a reference to the new Onionprobe Ansible Role.
- Added support for Podman and Podman Compose (tpo/onion-services/onionprobe#97). It can be enabled by setting `CONTAINER_RUNTIME=podman` in the `.env` file. For backwards compatibility, Docker is still the default container runtime.
- New `onionprobe-monitor` script acting as a wrapper for interacting with the container runtime (tpo/onion-services/onionprobe#97). Given that Podman and Docker have a few differences, it made sense to create a thin wrapper around them, to handle things like Podman not honoring some Compose variables in `.env` files.
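The runtime selection described above can be pictured with a small sketch. This is only an illustration of the documented behavior (read `CONTAINER_RUNTIME` from the `.env` file, default to Docker), not the actual `onionprobe-monitor` implementation; the helper names `read_env_file` and `container_runtime` are hypothetical.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def container_runtime(env_file=".env"):
    """Return the configured runtime, falling back to Docker (the default)."""
    runtime = read_env_file(env_file).get("CONTAINER_RUNTIME", "docker")
    if runtime not in ("docker", "podman"):
        raise ValueError(f"unsupported container runtime: {runtime}")
    return runtime
```

With no `.env` file, or with one that does not set the variable, `container_runtime()` returns `docker`, matching the backwards-compatible default.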
Fixes¶
- Updated the sample systemd service unit (tpo/onion-services/onionprobe!72).
- Upgraded Prometheus image to 3.0.0.
- PostgreSQL:
  - Upgraded image to version 17. Please run the needed upgrading steps.
  - Minor fixes at `upgrade-postgresql-database`.
- Updated development procedure.
- Improved verbosity for the Tor initialization log message.
v1.2.0 - 2024-04-24¶
Features¶
- New metrics (tpo/onion-services/onionprobe#78):
  - From the outer descriptor wrapper:
    - `descriptor-lifetime`.
    - `revision-counter`.
  - From the second layer of encryption:
    - `single-onion-service`.
    - `pow-params`.
  - HSDir latency when fetching descriptors.
- Enhanced Grafana Dashboard (tpo/onion-services/onionprobe#80) with the following new visualizations:
  - Overview:
    - Current failure rate of onionsites.
    - Total expiring certificates in the next 7 days.
    - List of certificate expirations up to the next 180 days.
    - List of unreachable instances.
    - Graph with the total unreachable instances.
    - List of invalid HTTPS certificates.
    - List of services with HTTPS errors.
  - Performance:
    - Total of minimum, average and maximum service connection latency.
    - Total of minimum, average and maximum descriptor fetch latency.
    - Chart of minimum, average and maximum service connection latency.
    - Chart of minimum, average and maximum descriptor fetch latency.
    - Rate of services using the single hop mode, relative to the total services monitored.
    - List of slow services.
  - Descriptors:
    - List of services missing a published descriptor.
    - Chart of the minimum, average and maximum descriptor sizes (decrypted outer layer).
    - Chart of the minimum, average and maximum descriptor sizes (decrypted second layer).
  - Introduction points:
    - Chart of minimum, average and maximum number of introduction points per service.
    - List of services and their number of introduction points.
  - HSDir:
    - Total number of HSDirs tested.
    - Chart of minimum, average and maximum HSDir latency for fetching descriptors.
    - List of HSDirs sorted by descriptor fetch latency.
  - Proof of Work (PoW):
    - Ratio of services with PoW enabled, relative to the total services monitored.
    - Total number of services with PoW enabled.
    - Chart of minimum, average and maximum PoW v1 effort seen.
    - List of services with PoW enabled.
    - List of services with PoW enabled with effort greater than zero.
- Improved log message for elapsed time.
- New log messages for:
  - Number of introduction points.
  - HS_DESC events:
    - Descriptor reachability.
    - HSDir used.
- Create a GitLab release at every new tag (experimental) (tpo/onion-services/onionprobe#82).
- Running lintian on CI to check the generated Debian package.
Fixes¶
- Manpage generation is now compatible with the Onion Services Ecosystem Portal (tpo/onion-services/ecosystem#1).
- Use the correct copyright line in source files.
- Support for a wider range of pyca/cryptography versions at `setup.cfg`.
- Display Tor bootstrap messages only for the debug log level.
- Disable stem logging if log level is below debug (tpo/onion-services/onionprobe#63).
- Exit codes now reflect reality (tpo/onion-services/onionprobe#64).
- Calculate the elapsed time for descriptors right after fetching.
- Updated the SecureDrop list.
- Upgraded Grafana image to 10.4.2.
- Upgraded Alertmanager image to 0.27.0.
- Upgraded Prometheus image to 2.51.2.
- Upgraded PostgreSQL image to 16. Please run the needed upgrading steps.
- Upgraded CI and container images to Debian bookworm.
- Upgraded `vendors/onion-mkdocs`.
v1.1.2 - 2023-09-28¶
Fixes¶
- Make the tor process quiet when generating hashed passwords (reported by @anarcat): https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/81
- CI/CD: use rsync to copy slide artifacts, preserving the folder structure.
- Minor documentation improvements.
Features¶
- Debug outer and inner layer descriptor contents.
- Decrease Prometheus certificate expiration alerts to 7 days in advance.
v1.1.1 - 2023-04-04¶
Fixes¶
- Grafana dashboard:
  - Apply workaround for "Invalid dashboard UID in the request error on custom home dashboard": https://github.com/grafana/grafana/issues/54574
- Docker:
  - Stick to specific upstream image versions to avoid unexpected upgrade issues.
  - Change the `onionprobe` image version scheme to match semantic versioning.
- PostgreSQL upgrade script (`upgrade-postgresql-database`):
  - Misc fixes.
v1.1.0 - 2023-04-03¶
Fixes¶
- Stick to a PostgreSQL docker image. See https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/70
- Command-line URL parsing: https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/merge_requests/17
- Display default values for most options on `onionprobe --help`.
Features¶
- Support for the Tor metrics Prometheus exporter via the `MetricsPort` and `MetricsPortPolicy` settings, available respectively as the `metrics_port` and `metrics_port_policy` configuration or command line parameters.
  These settings are disabled by default. If you plan to use this with the standalone monitoring node, you may also want to edit `configs/prometheus/prometheus.yml` and uncomment Tor's Prometheus configuration so this data becomes available at Prometheus, Alertmanager and Grafana.
  WARNING: Before enabling this, it is important to understand that exposing tor metrics publicly is dangerous to the Tor network users. Please take extra precaution and care when opening this port. Set a very strict access policy with `MetricsPortPolicy` and consider using your operating system's firewall features for defense in depth.
  We recommend, for the prometheus format, that the only address that can access this port should be the Prometheus server itself. Remember that the connection is unencrypted (HTTP), hence consider using a tool like stunnel to secure the link from this port to the server.
  Check the standalone monitoring node docs for detailed instructions on how to enable this additional metric collection.
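As a rough illustration of how the two parameters relate to Tor's own options, the sketch below turns `metrics_port` and `metrics_port_policy` values into torrc-style lines. It assumes the values are passed to Tor unchanged, which is an assumption rather than something stated in this changelog; the function name `tor_metrics_settings` and the loopback address and port used in the example are hypothetical.

```python
def tor_metrics_settings(metrics_port=None, metrics_port_policy=None):
    """Build torrc-style lines for the metrics exporter.

    Returns an empty list when nothing is set, mirroring the
    "disabled by default" behavior described above.
    """
    lines = []
    if metrics_port:
        lines.append(f"MetricsPort {metrics_port}")
    if metrics_port_policy:
        lines.append(f"MetricsPortPolicy {metrics_port_policy}")
    return lines


# Example: only the local Prometheus server may scrape the port.
for line in tor_metrics_settings("127.0.0.1:9035", "accept 127.0.0.1"):
    print(line)
```

Restricting the policy to a single address follows the recommendation above that only the Prometheus server should be able to reach this port.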
- TLS certificate verification:
  - Added a global `tls_verify` flag to check certificates during HTTP tests. Set it to `false` to ignore TLS certificate verification. By default all TLS certificates are checked.
  - Added a per-endpoint `tls_verify` flag to check certificates in HTTP tests, overriding the global setting for the endpoint context.
  - Changed the `onion_service_valid_certificate` metric to also inform when a certificate wasn't tested by setting a value of `2` in that case. This isn't a breaking change since TLS certificate verification is enabled by default, so unless verification is disabled the metric will only vary between `0` (invalid cert) and `1` (valid cert).
- TLS and X.509 certificate test:
  - Added a new test to check the conditions of the underlying TLS connection and to get detailed certificate information.
  - This test currently only happens for endpoints with the `https` protocol, and only if the `test_tls_connection` configuration is set to true in the global scope or in the endpoint configuration.
  - Certificates are retrieved and analyzed even if they're not valid, in order to also collect data on self-signed, expired or otherwise invalid certificates.
  - A number of new metrics are included both for the TLS connection and for the server certificate:
    - `onion_service_certificate_not_valid_before_timestamp_seconds`: Register the beginning of the validity period of the certificate in UTC. This does not necessarily mean that the certificate is CA-validated. Value is represented as a POSIX timestamp.
    - `onion_service_certificate_not_valid_after_timestamp_seconds`: Register the end of the validity period of the certificate in UTC. This does not necessarily mean that the certificate is CA-validated. Value is represented as a POSIX timestamp.
    - `onion_service_certificate_expiry_seconds`: Register how many seconds are left before the certificate expires. Negative values indicate how many seconds have passed since the certificate expired.
    - `onion_service_certificate_match_hostname`: Register whether a provided server certificate matches the server hostname in a TLS connection: value is 1 for matched hostname and 0 otherwise. Check is done both on the commonName and subjectAltName fields. A value of 1 does not necessarily mean that the certificate is CA-validated.
    - `onion_service_certificate_info`: Register miscellaneous TLS certificate information for a given Onion Service such as version and fingerprints.
    - `onion_service_tls_security_level`: Tracks the SSL security level in use. Needs Python 3.10+ to work. See SSL_CTX_get_security_level(3) manpage for details: https://www.openssl.org/docs/manmaster/man3/SSL_CTX_get_security_level.html
    - `onion_service_tls_info`: Register miscellaneous TLS information for a given Onion Service such as version and ciphers.
  - Prometheus rules for the standalone monitoring node were updated to include an alert for certificates about to expire (defaults to 30 days in advance).
  - Details at https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/49
- Added the `onion_service_generic_error_total` metric to track probing errors not covered by other metrics.
- Added script to handle PostgreSQL version upgrades at the service container: https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/70
- Using Onion Mkdocs for the documentation, now hosted at https://tpo.pages.torproject.net/onion-services/onionprobe/. See https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/19
- Expected HTTP status codes:
  - Per-endpoint configuration specifying a list of expected HTTP status codes, useful when it's expected that an endpoint returns a status other than 200.
  - Custom metric indicating if the status code is expected or not.
- CI/CD: added jobs to test building Debian and Python packages, as well as configurations and slides.
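The value semantics of the certificate metrics introduced in this release can be summarized in a short sketch. It only restates what the entries above describe (`0`, `1` and `2` for `onion_service_valid_certificate`, and signed seconds for `onion_service_certificate_expiry_seconds`); the function names are hypothetical and this is not Onionprobe's actual code.

```python
import time

INVALID = 0     # certificate was checked and rejected
VALID = 1       # certificate was checked and accepted
NOT_TESTED = 2  # verification disabled via tls_verify


def valid_certificate_value(tls_verify, certificate_ok):
    """Value for onion_service_valid_certificate as described above."""
    if not tls_verify:
        return NOT_TESTED
    return VALID if certificate_ok else INVALID


def certificate_expiry_seconds(not_valid_after, now=None):
    """Seconds left until expiry; negative once the certificate has expired.

    Both arguments are POSIX timestamps.
    """
    if now is None:
        now = time.time()
    return not_valid_after - now


def expires_within(expiry_seconds, days=30):
    """True when the certificate expires in fewer than `days` days."""
    return expiry_seconds < days * 86400
```

The 30-day default in `expires_within` mirrors the default alert window mentioned for the Prometheus rules.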
v1.0.0 - 2022-05-31¶
Breaking changes¶
- Changed Prometheus exporter metric names to adhere to the Best practices and to other recommendations when writing an exporter. Prometheus admins might want to rename their old metrics to the new ones to keep time series continuity, drop the old ones or keep both during a transition phase. The following metrics were renamed:
  - From `onionprobe_wait` to `onionprobe_wait_seconds`.
  - From `onion_service_latency` to `onion_service_latency_seconds`.
  - From `onion_service_descriptor_latency` to `onion_service_descriptor_latency_seconds`.
  - From `onion_service_fetch_error_counter` to `onion_service_fetch_error_total`.
  - From `onion_service_descriptor_fetch_error_counter` to `onion_service_descriptor_fetch_error_total`.
  - From `onion_service_request_exception` to `onion_service_request_exception_total`.
  - From `onion_service_connection_error` to `onion_service_connection_error_total`.
  - From `onion_service_http_error` to `onion_service_http_error_total`.
  - From `onion_service_too_many_redirects` to `onion_service_too_many_redirects_total`.
  - From `onion_service_connection_timeout` to `onion_service_connection_timeout_total`.
  - From `onion_service_read_timeout` to `onion_service_read_timeout_total`.
  - From `onion_service_timeout` to `onion_service_timeout_total`.
  - From `onion_service_certificate_error` to `onion_service_certificate_error_total`.
- Removed the `updated_at` label from all metrics, which was creating a new data series for every measurement on Prometheus.
- Removed the `hsdir` label from the `onion_service_descriptor_reachable` metric, which was creating a new data series for every measurement on Prometheus.
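For admins updating dashboards or alert rules, the renames listed above can be captured as a lookup table. The sketch below is one possible helper, not something shipped with Onionprobe; `translate_query` is a hypothetical name, and it rewrites whole identifiers only, so names that are already new are left untouched.

```python
import re

# Old metric name -> new metric name, as listed in this release.
RENAMED_METRICS = {
    "onionprobe_wait": "onionprobe_wait_seconds",
    "onion_service_latency": "onion_service_latency_seconds",
    "onion_service_descriptor_latency": "onion_service_descriptor_latency_seconds",
    "onion_service_fetch_error_counter": "onion_service_fetch_error_total",
    "onion_service_descriptor_fetch_error_counter": "onion_service_descriptor_fetch_error_total",
    "onion_service_request_exception": "onion_service_request_exception_total",
    "onion_service_connection_error": "onion_service_connection_error_total",
    "onion_service_http_error": "onion_service_http_error_total",
    "onion_service_too_many_redirects": "onion_service_too_many_redirects_total",
    "onion_service_connection_timeout": "onion_service_connection_timeout_total",
    "onion_service_read_timeout": "onion_service_read_timeout_total",
    "onion_service_timeout": "onion_service_timeout_total",
    "onion_service_certificate_error": "onion_service_certificate_error_total",
}

# Matches complete identifiers, so a new name is never rewritten twice.
IDENTIFIER = re.compile(r"[A-Za-z_:][A-Za-z0-9_:]*")


def translate_query(expression):
    """Replace old metric names in a query string with the new ones."""
    return IDENTIFIER.sub(
        lambda match: RENAMED_METRICS.get(match.group(0), match.group(0)),
        expression,
    )
```

Because the pattern matches plain text, a label value that happens to equal an old metric name would also be rewritten, so the result is worth reviewing by hand.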
Features¶
- Monitoring node setup using Docker Compose and Prometheus, Alertmanager and Grafana dashboards served via Onion Services.
- Config generation improvements.
- New metrics:
  - `onion_service_fetch_requests_total`.
  - `onion_service_descriptor_fetch_requests_total`.
  - `onion_service_descriptor`, with Onion Service descriptor information.
  - `onion_service_probe_status`, with timestamp from the last test.
- Default Grafana Dashboard with basic metrics.
v0.3.4 - 2022-05-11¶
Fixes¶
- Onionprobe's exporter port allocation conflict with the push gateway https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/45
v0.3.3 - 2022-05-11¶
Fixes¶
- Stem is unable to find the cryptography module when running from the pip package: https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/43
v0.3.2 - 2022-05-11¶
Main issue: https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/42
Features¶
- Enhanced config generators: switch all three config generators currently supported (Real-World Onion Sites, SecureDrop and TPO) to rely on argparse for command line arguments.
v0.3.1 - 2022-05-10¶
Main issue: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40717
Features¶
- Adds `packages/tpo.py` to generate an Onionprobe config with Tor Project's .onions. Details at https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/merge_requests/4
- Other minor fixes and enhancements.
v0.3.0 - 2022-04-19¶
Main issue: https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/4
Features¶
- Debian package.
- Better logging.
- Additional command line options.
- Handling of SIGTERM and other signals.
Documentation¶
- Manpage.
- Auto-generate command line docs from CLI invocation.
- Auto-generate manpage from `argparse`.
v0.2.2 - 2022-04-06¶
Fixes¶
- Print usage when no arguments are supplied.
v0.2.1 - 2022-04-06¶
Fixes¶
- Python package fixes.
v0.2.0 - 2022-04-06¶
Main issue: https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/3
Enhancements¶
- Python packaging: https://pypi.org/project/onionprobe.
- Support for the `--endpoints` command line argument.
- Display available metrics at command line usage.
- Adds `OnionprobeConfigCompiler` to help compile custom configuration.
v0.1.0 - 2022-03-31¶
Main issue: https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/2
Meta¶
- Move the repository to the Onion Services Gitlab group.
- Docstrings.
- Environment variable controlling the configuration file to use.
Probing¶
- Set timeout at `get_hidden_service_descriptor()`.
- Set timeout at `Requests`.
- Set `CircuitStreamTimeout` in the built-in Tor daemon.
- HTTPS certificate validation check/exception.
- Max retries before throwing an error when getting descriptors. This could help answer the following questions:
- Max retries before throwing an error when querying the endpoint.
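The "max retries before throwing an error" behavior listed above follows a common pattern, sketched here in generic form. The helper `with_retries` and its parameters are hypothetical and only illustrate the idea; they are not taken from Onionprobe's source.

```python
def with_retries(operation, max_retries=3, retry_on=(Exception,)):
    """Call `operation` until it succeeds or `max_retries` attempts fail.

    The last error is chained to the final exception so the original
    cause is not lost.
    """
    last_error = None
    for _attempt in range(max_retries):
        try:
            return operation()
        except retry_on as error:
            last_error = error
    raise RuntimeError(f"giving up after {max_retries} attempts") from last_error
```

The same wrapper could be applied both to descriptor fetching and to endpoint queries, each with its own retry limit.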
Metrics¶
- Status: sleeping, probing, starting or stopping.
- Match found / not found.
- Metric units in the description.
- Number of introduction points.
- Timestamp label.
- Register HSDir used to fetch the descriptor. Check the control-spec for the `HSFETCH` command and the `HS_DESC` event (using SETEVENTS). Relevant issues:
Enhancements¶
- Refactor into smaller modules.
- Better exception handling.
Bonus¶
- Script that compiles configuration from the real-world-onion-sites repository.
- Script that compiles configuration from the SecureDrop API.
v0.0.1 - 2022-03-23¶
Main issue: https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/1
Basic¶
- Take a list of onions to check and make sure that you can always fetch descriptors rather than just using cached descriptors etc.
- Randomisation of timing, so that systemic errors do not go undetected by chance.
- Looping support: goes through the list of onions in a loop, testing one at a time continuously.
- Flush descriptor caches so that testing happens as if from a fresh client.
- Support for HTTP status codes.
- Page load latency.
- Ability to fetch a set of paths from each onion. Customisable by test path: not all our sites have content at the root, but do not bootstrap every time if that can be avoided.
- Need to know about "does the site have useful content?" Regex for content inside the page: allow configuring a regex per path for what should be found in the returned content/headers.
- Documentation.
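Two of the items above, the per-path regex for expected content and the randomised timing, can be illustrated with a minimal sketch. The function names, the example paths and the patterns are hypothetical and chosen only for illustration.

```python
import random
import re


def content_matches(body, pattern):
    """True when the configured regex is found anywhere in the response body."""
    return re.search(pattern, body) is not None


def next_wait(base_seconds, jitter_seconds, rng=random):
    """Pick a randomised delay so probes do not run on a fixed schedule."""
    return base_seconds + rng.uniform(0, jitter_seconds)


# Hypothetical per-path configuration: path -> regex expected in the content.
expected_content = {
    "/": r"Welcome",
    "/status": r"\bOK\b",
}

if __name__ == "__main__":
    print(content_matches("<h1>Welcome</h1>", expected_content["/"]))
    print(next_wait(60, 30))
```

Keeping the pattern per path lets a site without content at the root still be checked at a path that does return something meaningful.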
Meta¶
- Dockerfile (and optionally a Docker Compose).
Prometheus¶
- Exports Prometheus metrics for the connection to the onion service, and extra metrics per path on the status code for each path returned by the server. If using the Prometheus exporter with Python, consider just using Requests and Beautiful Soup to check that the page is returning what one expects.
- Add in additional metrics wherever appropriate.
- To get the timings right, the tool should take care of the test frequency and just expose the metrics rather than having Prometheus scraping individual targets on Prometheus' schedule.
Bonus¶
- Optionally launch its own Tor process like in this example.