Learn / Blog / Article

Back to blog

The power and pitfalls of regular expressions

‘Put our customers at the heart of everything’ is Hotjar's foremost core value. In line with this principle, our top priorities are protecting and being consistently available to our users. Regular expressions, or regex, are commonly employed to safeguard web services against harmful inputs. But they can also (somewhat ironically) become a source of vulnerability if not meticulously constructed.

Last updated

18 Jan 2024

Reading time

5 min

Share

This article shares a particular regular expression pattern we identified during code review and sheds light on the importance of maintaining vigilant security measures at every stage.

Summary

Our Engineering team chose a regular expression (regex) approach to safeguard our URL shortening feature for Hotjar Heatmaps. Considering the following types of attacks helped us validate this approach:

  • ReDoS attack: this type of denial-of-service attack exploits a regex feature called backtracking, which could disrupt server processing and cause valid user requests to be unattended. This situation poses a risk of violating service-level agreements.

  • Malicious redirect: when site security is inadequate, attackers could manipulate the URL shortening feature to generate links that misdirect users to third-party websites, with potentially severe consequences for users and the affected business

To fortify our security measures against these threats, our team has implemented three key actions: rigorous code review and continuous learning practices, static application security testing, and a bug bounty program.

A deep dive into the technical landscape

A heatmap in Hotjar is tied to many attributes—like time range, session duration, country, screen resolution, operating system, and number of rage clicks—and they can be defined using filters in URL parameters. 

The back-end of Hotjar makes it easy for anyone to share a heatmap with their colleagues by employing a useful URL shortening feature that encapsulates the parameters and generates a short link.

#The URL shortening feature in Hotjar Heatmaps
The URL shortening feature in Hotjar Heatmaps

For the URL shortening feature to be useful in all environments of the continuous integration, it has to support multiple domains. Among the available solutions, we chose a regular expression approach to validate the input URL.

Let’s look at a couple of hypothetical attacks we considered during the secure code review phase that your security teams might attempt to validate an approach like this.

ReDoS attack

The exploit

In a scenario where regular expressions aren't optimally designed, commands like the one above could be all it takes to bog down server performance, effectively preventing it from processing valid requests from regular users. This scenario is known as ‘denial-of-service’ or DoS.

Why is this a problem?

In the case of a successful denial-of-service attack

  1. Users are prevented from using the service, and rendered incapable of getting value from it

  2. The service provider risks breaking service-level agreements (SLAs) with paying customers

The cause

Regular expression denial of service (ReDoS) exploits a feature of regex called backtracking. It causes significant computational resource drain when the regular expression pattern uses ‘back-references’ and excessive or nested quantifiers (e.g. (a+)+) or alternations with overlapping terms (e.g. (a|aa)*b). Such a regular expression pattern is called evil regex, catastrophic backtracking, or pathological regex.

We identified a regular expression similar to the one below to validate the URL provided to the URL shortening feature’s endpoint:

With the intention for the expression to match multiple domains (insights.hotjar.com, review-31.insights.hotjar.com, review.insights.hotjar.com), an evil regex was implemented and passed for code review.

The fix

It turns out the regular expression pattern could be made much stricter and still pass the existing tests—the second + quantifier wasn’t needed at all. To mitigate the risk of introducing bugs to a regular expression because of its complexity, the allow-list can be constructed using a more robust set of secure regular expressions:

Malicious redirect

The exploit

In scenarios where the software development life-cycle doesn't prioritize meticulous security considerations, especially in the construction of regular expressions, a URL shortening feature could be vulnerable, allowing the creation of deceptive links that misdirect users from legitimate URLs.

Why is this a problem?

When the URL shortening feature is weaponized to generate links to third-party-owned websites

  1. Attackers can talk to users on behalf of the service provider’s organization and present them with content that doesn’t comply with the organization’s just cause and core values, damaging its reputation and endangering sales

  2. Attackers can present users with content that tricks them into using a competitor’s product, decreasing the service provider’s market share and annual recurring revenue (ARR)

  3. Attackers can trick users into sharing sensitive information, from usernames and passwords to social security or payment card numbers, causing customers to file lawsuits, which costs money, damages reputation, and potentially slows down the process of closing deals

The cause

Let’s revisit the broken regular expression from earlier:

In the URL shortening feature, Python's re.match() is supplied with the user-crafted URL, which is then matched against a regular expression that ends with \.hotjar\.com\/?.*. (Note that the forward slash at the end is optional—the ? character instructs to match zero or one instance of the previous character).

This means the regular expression pattern ensures that the domain name of the input URL starts with the matching pattern, but allows it to expand to another one.

The fix

Changing the end of the regular expression pattern from \.com\/?.* to \.com($|\/.*) ensures the URL either ends with .com or continues with a forward slash, meaning no other domain can be crafted.

3 ways we uphold robust security standards at Hotjar

  1. Code review and continuous learning: peer code reviews are integral at Hotjar. By fostering a culture that prioritizes security awareness and continuous learning, we ensure that patterns like the evil regex are spotted and rectified long before they pose any threat.

  2. Static application security testing (SAST): we leverage SAST tools as proactive guards against potential security challenges. These tools identify potential threats, ensuring they never transition into our production environments.

  3. Bug bounty program: our active bug bounty program is a testament to Hotjar's commitment to security. Engaging with external security experts allows us to tap into a diverse pool of expertise and rectify potential vulnerabilities. White-hat hackers are welcome to join this initiative and be rewarded for pointing out weak spots in the walls of our security fortress.

Wrapping up

While regular expressions are immensely powerful, using them demands caution and expertise. Luckily, several tools are effective for debugging and rigorous testing (for example, we use Regex101 or RegExr). Ultimately, the right tools, coupled with a holistic and vigilant approach, ensure that we view potential challenges through a preemptive lens, keeping our code base fortified.

🪲 Help us hunt bugs

Hotjar values the broader technical community insights—collaborate with us! Explore our bug bounty program and partner with Hotjar in our commitment to ensuring a secure and dependable online experience.