Websites
The Websites view lets you monitor the websites visited by system users. Blocking access to sites is configured through the DLP automation feature, while this view offers three key actions for category management, plus a details panel.

Assign website category – the ability to assign a category to a site, similar to processes.
Categorize now (Reclassify) – a button that activates the Machine Learning mechanism to automatically categorize the site.
Report incorrect category (ML) – a button that allows reporting an incorrect site category, which may influence its reassessment by the system.
More details – detailed information about the site, including the title of the open window, the exact URL, and the active/inactive time expressed in seconds.
What is BTC Website Classification
BTC Website Classification is an automatic website classification system based on modern artificial intelligence algorithms. It combines Machine Learning (ML) with Deep Learning (DL) to analyze the actual content of the site (text, HTML structure, context) and assign it to one of the thematic categories. This approach is significantly more effective than traditional methods (URL rules, manual databases), especially on a dynamically changing Internet.
How it works - step by step
1. Retrieving the list of URLs to be categorized – the addresses are submitted to the classifier.
2. Retrieving site content – the page code is retrieved for later analysis.
3. Cleaning the page code of unnecessary information – the page code is cleaned of unnecessary data, such as repeated words and HTML tags.
4. Machine Learning: Identification of keywords using ML – after the code has been cleaned of unnecessary components, the words (keywords) that define the character of the site are extracted.
5. Deep Learning: Website analysis using a neural network – data processing steps that increase the effectiveness of the deep learning model.
6. Machine Learning: Evaluation of keyword prominence (ML) – repeated keywords are assigned to categories based on a dictionary, and the quantity (saturation) of words within particular categories is determined.
7. Deep Learning: Global evaluation of the site in context – during website analysis the entire context of the site is taken into account, which allows more effective analysis of multi-topic sites.
8. Determining the site classification – the site is assigned to the category identified as most likely.
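The keyword-based part of the steps above (cleaning the page code, extracting words, measuring saturation per category, and picking the most likely category) can be sketched as follows. This is a minimal illustration, not the actual BTC implementation: the dictionary `CATEGORY_KEYWORDS`, the function names, and the plain word-count scoring are all assumptions made for the example, and the deep learning stages are not modeled at all.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical category dictionary; the real dictionary is not documented here.
CATEGORY_KEYWORDS = {
    "news": {"news", "report", "headline", "journalist"},
    "gambling": {"casino", "bet", "jackpot", "poker"},
    "education": {"course", "lesson", "university", "student"},
}


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def clean_html(html):
    """Strip tags and return the page text as a list of lowercase words."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z]+", " ".join(parser.chunks).lower())


def category_saturation(words):
    """Count how many words fall into each category of the dictionary."""
    counts = Counter()
    for word in words:
        for category, keywords in CATEGORY_KEYWORDS.items():
            if word in keywords:
                counts[category] += 1
    return counts


def classify(html, default="uncategorized"):
    """Return the category with the highest saturation, or a default."""
    saturation = category_saturation(clean_html(html))
    if not saturation:
        return default
    return saturation.most_common(1)[0][0]
```

A page whose visible text mentions "casino", "jackpot", "bet" and "poker" once each and "news" once would be assigned to the hypothetical `gambling` category, because that category has the highest word count. Words that appear only inside scripts are ignored.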
Security
Language detected: The system checks whether the site's language can be determined correctly.
Yes - the language was detected correctly. This is normal and safe.
No - the language could not be determined. This does not always indicate a threat, but sites with ambiguous content can be suspicious.
SSL certificate: The system verifies whether the site operates over HTTPS and has a valid SSL certificate.
Yes - the connection is encrypted, which increases security.
No - no encryption or the certificate is invalid. This is a warning sign, especially if the site requires login or data entry.
Redirects: The system checks whether the site automatically redirects the user elsewhere.
Yes - redirects detected. They are not necessarily dangerous on their own, but are often used on phishing sites or to hide the real address.
No - no suspicious redirects detected.
Safe structure: The system analyzes the HTML structure and tags to assess whether the site appears legitimate.
Yes - the page code appears correct and does not contain suspicious elements.
No - anomalies detected, atypical scripts or elements that may indicate manipulation.
Safe category: The system checks whether the site belongs to categories considered safe.
Yes - the site falls within neutral or positive categories (e.g., information, services, education).
No - the site has been classified as potentially risky. Categories considered inherently unsafe include, among others, pornography and gambling.
CERT list: The system verifies whether the site appears in the CERT database (https://www.cert.pl).
Yes – the site appears in the CERT database and is considered dangerous.
No – the site is not on the CERT threat list and is therefore treated as safe.
Gambling sites list: The system checks whether the domain is listed in the Ministry of Finance gambling registry (https://hazard.mf.gov.pl).
Yes – the site is listed in the registry as a gambling site operating contrary to the law.
No – the site is not listed in the MF registry.
Malware detected: The system checks whether the site appears in the URLhaus database (https://urlhaus.abuse.ch), which is used to identify malware sites.
Yes – the domain is listed in the URLhaus database and has been flagged as a source of malware.
No – the site does not appear in the database and has not been associated with malware.
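Note that the meaning of "Yes" differs between the checks above: for some it is the reassuring answer (language detected, SSL certificate, safe structure, safe category), while for others it is the warning (redirects, CERT list, gambling sites list, malware detected). The sketch below shows one way to collect the eight answers and list the ones that call for attention. The class name, field names, and function name are hypothetical and chosen only for this illustration; they do not describe the product's data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityReport:
    """Hypothetical container for the eight Yes/No answers (True means Yes)."""
    language_detected: bool
    ssl_certificate: bool
    redirects: bool
    safe_structure: bool
    safe_category: bool
    cert_list: bool
    gambling_list: bool
    malware_detected: bool


# Checks where "Yes" is the reassuring answer.
SAFE_WHEN_YES = ("language_detected", "ssl_certificate",
                 "safe_structure", "safe_category")
# Checks where "Yes" is the warning sign.
SAFE_WHEN_NO = ("redirects", "cert_list", "gambling_list", "malware_detected")


def warning_flags(report):
    """Return the names of the checks whose answer is a warning sign."""
    warnings = [name for name in SAFE_WHEN_YES if not getattr(report, name)]
    warnings += [name for name in SAFE_WHEN_NO if getattr(report, name)]
    return warnings
```

For a site without a valid certificate that also redirects, `warning_flags` would return `["ssl_certificate", "redirects"]`. As the descriptions above point out, a single warning such as a redirect is not proof of a threat on its own.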
What does this mean in practice?
With BTC Website Classification you can:
Detect and block potentially dangerous sites (malware, phishing, gambling sites, etc.).
Monitor which sites users visit. This helps manage productivity.
Automate access policies, e.g., blocking categories unsuitable for the company.
Provide greater security by analyzing actual content rather than only static rules. The system also handles new, previously unknown sites.
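As a rough illustration of a category-based access policy, the sketch below decides whether a site should be blocked based on its assigned category. It is not how the DLP automation feature is configured; the category names in `BLOCKED_CATEGORIES` and the function `is_blocked` are assumptions made for the example.

```python
# Hypothetical company policy: categories that should not be accessible.
BLOCKED_CATEGORIES = frozenset({"gambling", "pornography"})


def is_blocked(category, blocked=BLOCKED_CATEGORIES):
    """Return True if the site's category is on the blocked list."""
    return category.strip().lower() in blocked
```

With this policy, a site classified as `Gambling` would be blocked, while a site classified as `education` would remain accessible.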
Main technical features
Supports over 50 languages, including Polish.
Combines two technologies: ML (for keywords) and DL (for structure and context analysis).
Provides an API for easy integration with external systems.
When classification changes
Each site is reanalyzed monthly, so the classification does not "age" and continues to reflect the current state of the site. This matters most when a site's content or owner changes, which happens often on a dynamic Internet.
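The monthly cycle can be pictured as a simple due-date check. The sketch below assumes that "monthly" is approximated by a fixed 30-day interval and that the time of the last classification is stored per site. Both are assumptions for illustration, and the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "monthly" is approximated as a fixed 30-day interval.
REANALYSIS_INTERVAL = timedelta(days=30)


def is_due_for_reanalysis(last_classified, now=None,
                          interval=REANALYSIS_INTERVAL):
    """Return True if at least one interval has passed since classification."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_classified >= interval
```

A site last classified on 1 January would be due again on 31 January under this assumption, but not on 15 January.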