Tanh is a useful activation function when you want bounded, zero-centered outputs, but it comes with a computational cost because it relies on exponentials. HardTanh keeps the same basic idea while replacing the smooth curve with a simple clipped line, which makes it much cheaper to evaluate.
That trade-off is why HardTanh still shows up in practical systems. It is not the default activation for modern deep networks, but it remains useful when efficiency, bounded outputs, and hardware-friendliness matter more than smoothness.
The hyperbolic tangent maps any real number to the range (-1, 1):
$$ \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} $$
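As a quick sanity check, the exponential definition can be compared against Python's built-in `math.tanh` (a throwaway sketch; `tanh_from_exp` is just an illustrative name):

```python
import math

def tanh_from_exp(x: float) -> float:
    # tanh(x) = (e^x - e^-x) / (e^x + e^-x)
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert abs(tanh_from_exp(x) - math.tanh(x)) < 1e-12
```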
That gives tanh three useful properties:

- Outputs are bounded in (-1, 1), so activations cannot blow up.
- Outputs are zero-centered, which helps keep signals balanced between layers.
- The function is smooth and differentiable everywhere.
Those properties made tanh popular, especially in older neural network architectures. The downside is that tanh is more expensive to compute than a simple clamp-based operation.

In many practical settings, that smoothness is not essential. Sometimes all you really need is a bounded, zero-centered activation that is cheap to compute. That is the gap HardTanh fills.
A piecewise-linear approximation replaces a curve with a small number of straight-line segments. Instead of using one complicated formula everywhere, it uses a simple rule in each region.

HardTanh approximates tanh with three regions:

- For x < -1, the output is clamped to -1.
- For -1 ≤ x ≤ 1, the output is simply x (the identity).
- For x > 1, the output is clamped to 1.
So HardTanh keeps the bounded, zero-centered behavior of tanh, but replaces the smooth curve with something much simpler.
HardTanh is defined as:
$$ \operatorname{HardTanh}(x) = \begin{cases} -1, & x < -1 \\ x, & -1 \le x \le 1 \\ 1, & x > 1 \end{cases} $$
An equivalent implementation-friendly form is:
$$ \operatorname{HardTanh}(x) = \min(1, \max(-1, x)) $$
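The equivalence of the two forms can be verified with a small pure-Python sketch (the function names here are illustrative, not part of any library):

```python
def hardtanh_piecewise(x: float) -> float:
    # Three-region definition of HardTanh
    if x < -1.0:
        return -1.0
    if x > 1.0:
        return 1.0
    return x

def hardtanh_clamp(x: float) -> float:
    # Implementation-friendly min/max form
    return min(1.0, max(-1.0, x))

# The two forms agree on every input, including the boundaries
for x in (-3.0, -1.0, -0.2, 0.0, 0.7, 1.0, 5.0):
    assert hardtanh_piecewise(x) == hardtanh_clamp(x)
```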

The intuition is simple: HardTanh is just the identity function clipped to the interval [-1, 1].
Inside the central region, the function behaves like the identity, so the gradient is constant there. Outside that region, the activation saturates.
The derivative is:
$$ \operatorname{HardTanh}'(x) = \begin{cases} 0, & x < -1 \\ 1, & -1 < x < 1 \\ 0, & x > 1 \end{cases} $$
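This zero/one gradient pattern can be checked numerically with a central finite difference, away from the non-differentiable points at ±1 (a pure-Python sketch; `hardtanh_scalar` and `numeric_grad` are throwaway helpers):

```python
def hardtanh_scalar(x: float) -> float:
    # min/max form of HardTanh
    return min(1.0, max(-1.0, x))

def numeric_grad(f, x: float, h: float = 1e-6) -> float:
    # Central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

# Saturated regions: derivative 0; central region: derivative 1
for x, expected in ((-2.0, 0.0), (0.3, 1.0), (2.0, 0.0)):
    assert abs(numeric_grad(hardtanh_scalar, x) - expected) < 1e-4
```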
A simple PyTorch implementation is given below:
```python
import torch


def hardtanh(
    x: torch.Tensor,
    min_val: float = -1.0,
    max_val: float = 1.0,
) -> torch.Tensor:
    """
    Compute the HardTanh activation function.

    Args:
        x: Input tensor.
        min_val: Minimum output value.
        max_val: Maximum output value.

    Returns:
        Tensor clipped to [min_val, max_val].
    """
    return x.clamp(min=min_val, max=max_val)


def main():
    x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0], requires_grad=True)
    y = hardtanh(x)
    print("Output:", y)
    # tensor([-1.0000, -0.5000, 0.0000, 0.5000, 1.0000])
    y.sum().backward()
    print("Gradient:", x.grad)
    # tensor([0., 1., 1., 1., 0.])


if __name__ == "__main__":
    main()
```
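PyTorch also ships this activation out of the box. A short sketch showing that the built-in `torch.nn.functional.hardtanh` (whose defaults are `min_val=-1.0`, `max_val=1.0`) matches the hand-written clamp:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
manual = x.clamp(min=-1.0, max=1.0)
builtin = F.hardtanh(x)  # defaults: min_val=-1.0, max_val=1.0
assert torch.equal(manual, builtin)
```

In a model definition, the module form `torch.nn.Hardtanh(min_val, max_val)` serves the same purpose.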
| Property | HardTanh | tanh | Sigmoid | ReLU |
|---|---|---|---|---|
| Cheap computation | Yes | No | No | Yes |
| Bounded output | Yes, [-1, 1] | Yes, [-1, 1] | Yes, [0, 1] | No |
| Symmetric around zero | Yes | Yes | No | No |
| Linear gradient region | Yes | Limited | Limited | Yes |
| Hardware friendly | Yes | Less so | Less so | Yes |
| Smooth everywhere | No | Yes | Yes | No |
HardTanh sits somewhere between tanh and ReLU. It keeps the bounded, zero-centered behavior of tanh while being much cheaper to compute, and it is more controlled than ReLU because the output cannot grow without bound. Compared with sigmoid, it is also zero-centered and has a broader linear region. The trade-off is that HardTanh is not smooth, and once the activation saturates outside [-1, 1], the gradient becomes zero.
HardTanh is not a general-purpose winner, but it is still a useful tool in the right setting. If we need bounded outputs and cheap computation, it is a practical alternative to tanh. If smoothness matters more, tanh or other modern activations are usually a better fit.