
The power and pitfalls of regular expressions

‘Put our customers at the heart of everything’ is Hotjar's foremost core value. In line with this principle, our top priorities are protecting and being consistently available to our users. Regular expressions, or regex, are commonly employed to safeguard web services against harmful inputs. But they can also (somewhat ironically) become a source of vulnerability if not meticulously constructed.

Hotjar's tech blog

Miloslav Pavelka

Last updated: 18 Jan 2024

Reading time: 5 min

This article shares a particular regular expression pattern we identified during code review and sheds light on the importance of maintaining vigilant security measures at every stage.

Summary

Our Engineering team chose a regular expression (regex) approach to safeguard our URL shortening feature for Hotjar Heatmaps. Considering the following types of attacks helped us validate this approach:

ReDoS attack: this type of denial-of-service attack exploits a regex feature called backtracking to tie up server processing, leaving valid user requests unserved and putting service-level agreements at risk.
Malicious redirect: when site security is inadequate, attackers could manipulate the URL shortening feature to generate links that misdirect users to third-party websites, with potentially severe consequences for users and the affected business.

To fortify our security measures against these threats, our team has implemented three key actions: rigorous code review and continuous learning practices, static application security testing, and a bug bounty program.

A deep dive into the technical landscape

A heatmap in Hotjar is tied to many attributes—like time range, session duration, country, screen resolution, operating system, and number of rage clicks—and they can be defined using filters in URL parameters.

The back-end of Hotjar makes it easy for anyone to share a heatmap with their colleagues by employing a useful URL shortening feature that encapsulates the parameters and generates a short link.

The URL shortening feature in Hotjar Heatmaps

For the URL shortening feature to be useful in every environment of our continuous integration pipeline, it has to support multiple domains. Among the available solutions, we chose a regular expression approach to validate the input URL.

Let’s look at a couple of hypothetical attacks we considered during the secure code review phase, and that your security teams might also use to validate an approach like this.

ReDoS attack

The exploit

In a scenario where regular expressions aren't optimally designed, a single request carrying a crafted URL could be all it takes to bog down server performance, effectively preventing the server from processing valid requests from regular users. This scenario is known as ‘denial-of-service’ or DoS.

Why is this a problem?

In the case of a successful denial-of-service attack:

Users are prevented from using the service and can't get value from it
The service provider risks breaking service-level agreements (SLAs) with paying customers

The cause

Regular expression denial of service (ReDoS) exploits a feature of regex engines called backtracking: when a match attempt fails, the engine goes back and tries the other ways the pattern could consume the input. The number of attempts can grow exponentially with the length of the input when the pattern uses excessive or nested quantifiers (e.g. (a+)+) or alternations with overlapping terms (e.g. (a|aa)*b), and ‘back-references’ can make matching more expensive still. Such a pattern is called an evil regex or pathological regex, and the resulting blow-up is known as catastrophic backtracking.
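To make the effect concrete, here is a minimal Python sketch that times the nested-quantifier pattern (a+)+ against inputs that almost match. The helper name time_match and the input sizes are illustrative choices, not part of Hotjar's code base, and absolute timings depend on the machine and the Python version.

```python
import re
import time

# Nested quantifier from the text: a run of "a"s can be split between the
# inner and the outer "+" in many different ways.
EVIL = re.compile(r"(a+)+$")
# Accepts the same strings with a single quantifier.
SAFE = re.compile(r"a+$")


def time_match(pattern: re.Pattern, text: str) -> tuple[bool, float]:
    """Return (matched, seconds spent) for pattern.match(text)."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    for n in (10, 14, 18, 22):
        # The trailing "!" makes the match fail, but only after the engine
        # has tried the possible splits of the preceding "a"s.
        text = "a" * n + "!"
        _, evil_seconds = time_match(EVIL, text)
        _, safe_seconds = time_match(SAFE, text)
        print(f"n={n:2d}  evil={evil_seconds:.4f}s  safe={safe_seconds:.6f}s")
```

On a backtracking engine such as Python's re module, the time for the nested pattern is expected to grow steeply with each extra character, while the single-quantifier pattern stays close to flat.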

We identified a regular expression similar to the one below to validate the URL provided to the URL shortening feature’s endpoint:

The expression was intended to match multiple domains (insights.hotjar.com, review-31.insights.hotjar.com, review.insights.hotjar.com), but the pattern that was implemented and submitted for code review turned out to be an evil regex.

The fix

It turns out the regular expression pattern could be made much stricter and still pass the existing tests—the second + quantifier wasn’t needed at all. To mitigate the risk of introducing bugs to a regular expression because of its complexity, the allow-list can be constructed using a more robust set of secure regular expressions:
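The original allow-list isn't reproduced in this text, so what follows is only a hypothetical sketch of the idea: several small patterns, none of which nests one quantifier inside another, checked with re.fullmatch(). The https:// scheme, the digit limit, the pattern list, and the is_allowed name are all assumptions made for illustration; only the three host names come from the article.

```python
import re

# Hypothetical allow-list: one small pattern per kind of host named in the
# article. No pattern nests a quantifier inside another quantifier.
ALLOWED_URL_PATTERNS = [
    # insights.hotjar.com, optionally followed by a path
    re.compile(r"https://insights\.hotjar\.com(/.*)?"),
    # review.insights.hotjar.com and review-<number>.insights.hotjar.com
    re.compile(r"https://review(-[0-9]{1,6})?\.insights\.hotjar\.com(/.*)?"),
]


def is_allowed(url: str) -> bool:
    """True if the whole URL matches at least one allow-listed pattern."""
    return any(p.fullmatch(url) for p in ALLOWED_URL_PATTERNS)


if __name__ == "__main__":
    for candidate in (
        "https://insights.hotjar.com/heatmaps?x=1",
        "https://review-31.insights.hotjar.com",
        "https://review.insights.hotjar.com/",
        "https://insights.hotjar.com.attacker.example/",
    ):
        print(candidate, is_allowed(candidate))
```

Keeping each pattern short makes it easier to review for nested quantifiers, and fullmatch() requires the entire URL to fit the pattern rather than just its beginning.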

Malicious redirect

The exploit

In scenarios where the software development life-cycle doesn't prioritize meticulous security considerations, especially in the construction of regular expressions, a URL shortening feature could be vulnerable, allowing the creation of deceptive links that misdirect users from legitimate URLs.

Why is this a problem?

When the URL shortening feature is weaponized to generate links to third-party-owned websites:

Attackers can talk to users on behalf of the service provider’s organization and present them with content that doesn’t comply with the organization’s just cause and core values, damaging its reputation and endangering sales
Attackers can present users with content that tricks them into using a competitor’s product, decreasing the service provider’s market share and annual recurring revenue (ARR)
Attackers can trick users into sharing sensitive information, from usernames and passwords to social security or payment card numbers, causing customers to file lawsuits, which costs money, damages reputation, and potentially slows down the process of closing deals

The cause

Let’s revisit the broken regular expression from earlier:

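In the URL shortening feature, Python's re.match() is supplied with the user-crafted URL, which is then matched against a regular expression that ends with \.hotjar\.com\/?.*. (Note that the forward slash at the end is optional: the ? quantifier matches zero or one instance of the preceding character.) re.match() anchors the pattern only at the start of the string, and the trailing .* accepts any remaining characters.

This means the pattern only ensures that the host name of the input URL starts with the expected domain; it doesn't stop the host name from continuing into a different one. For example, a URL whose host is insights.hotjar.com.attacker.example (a hypothetical attacker-controlled domain) would pass validation.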
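The behavior can be reproduced with a short Python sketch. Only the tail \.hotjar\.com\/?.* is taken from the article; the https://insights prefix stands in for the part of the pattern that isn't shown, and attacker.example is a made-up domain.

```python
import re
from urllib.parse import urlsplit

# Tail quoted in the article; the "https://insights" prefix is an assumed
# placeholder for the start of the pattern, which is not shown.
LOOSE = re.compile(r"https://insights\.hotjar\.com\/?.*")

legit = "https://insights.hotjar.com/heatmaps"
crafted = "https://insights.hotjar.com.attacker.example/login"

# re.match() anchors only at the start, the slash is optional, and ".*"
# accepts any remainder, so both URLs pass the check.
print(bool(LOOSE.match(legit)))    # True
print(bool(LOOSE.match(crafted)))  # True

# A URL parser shows where the crafted link really points.
print(urlsplit(crafted).hostname)  # insights.hotjar.com.attacker.example
```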

The fix

Changing the end of the regular expression pattern from \.com\/?.* to \.com($|\/.*) ensures the URL either ends with .com or continues with a forward slash, meaning no other domain can be crafted.
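As a quick check of the stricter ending, the sketch below uses the tail \.com($|\/.*) from the text. The https://insights prefix is an assumed placeholder for the part of the pattern that isn't shown, and is_valid and attacker.example are names made up for the example.

```python
import re

# The "https://insights" prefix is an assumed placeholder; the tail
# "\.com($|\/.*)" is the fixed ending described in the text.
STRICT = re.compile(r"https://insights\.hotjar\.com($|\/.*)")


def is_valid(url: str) -> bool:
    """True if the URL ends right after .com or continues with a slash."""
    return STRICT.match(url) is not None


if __name__ == "__main__":
    print(is_valid("https://insights.hotjar.com"))                         # True
    print(is_valid("https://insights.hotjar.com/heatmaps?x=1"))            # True
    print(is_valid("https://insights.hotjar.com.attacker.example/login"))  # False
    print(is_valid("https://insights.hotjar.com@attacker.example"))        # False
```

One detail to keep in mind: in Python, $ also matches just before a trailing newline, so \Z or re.fullmatch() is a stricter choice if inputs may end with a newline character.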

3 ways we uphold robust security standards at Hotjar

Code review and continuous learning: peer code reviews are integral at Hotjar. By fostering a culture that prioritizes security awareness and continuous learning, we ensure that patterns like the evil regex are spotted and rectified long before they pose any threat.
Static application security testing (SAST): we leverage SAST tools as proactive guards against potential security challenges. These tools identify potential threats, ensuring they never transition into our production environments.
Bug bounty program: our active bug bounty program is a testament to Hotjar's commitment to security. Engaging with external security experts allows us to tap into a diverse pool of expertise and rectify potential vulnerabilities. White-hat hackers are welcome to join this initiative and be rewarded for pointing out weak spots in the walls of our security fortress.

Wrapping up

While regular expressions are immensely powerful, using them demands caution and expertise. Luckily, several tools are effective for debugging and rigorous testing (for example, we use Regex101 and RegExr). Ultimately, the right tools, coupled with a holistic and vigilant approach, help us catch potential problems early and keep our code base secure.

🪲 Help us hunt bugs

Hotjar values the insights of the broader technical community, so collaborate with us! Explore our bug bounty program and partner with Hotjar in our commitment to ensuring a secure and dependable online experience.

