What are Bezier curves and why are they important in web scraping?
What is a Bezier curve?
In computer graphics, the lines that connect point A to point B can be either straight or curved. Straight lines are easy to implement in software. Curves are easy for humans to draw but much harder for computers.
In 1962, Pierre Bezier, a French engineer working for Renault, published his studies on using mathematical functions to draw curves well suited to design work.
Bezier curves are parametric curves. You define a set of control points that determine the curve’s shape and curvature, and the points in between are computed by interpolation.
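In a parametric curve, each value of a parameter t between 0 and 1 gives one point of the curve. One standard way to evaluate such a curve is De Casteljau’s algorithm, which repeatedly interpolates linearly between consecutive control points until a single point remains. The sketch below shows this technique for any number of control points. The function name `de_casteljau` is only illustrative. This is not necessarily the method the article uses; the article applies the explicit cubic formula later on.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bezier curve of any degree at parameter t in [0, 1]."""
    points = [tuple(p) for p in control_points]
    # Each pass replaces n points with n - 1 points, each one interpolated
    # at fraction t between two consecutive points of the previous pass.
    while len(points) > 1:
        points = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    # The single remaining point lies on the curve.
    return points[0]
```

With two control points this reduces to a straight segment. With three or four points you get quadratic and cubic curves.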
A much more detailed explanation can be found at this link, where you can dive deep into all the mathematical aspects.
Why are Bezier curves interesting for web scraping?
As mentioned above, Bezier curves produce smooth paths from point A to point B, and this is useful for mouse movement. Playwright’s native move function goes from A to B in a straight line. Following a Bezier trajectory instead makes the movement look more human-like. This matters for web scraping when we face anti-bot solutions that track user behavior to detect anomalies: a more human-like mouse movement should trigger fewer red flags.
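Once a trajectory has been computed, replaying it means calling the move function once per point. The sketch below assumes Playwright’s synchronous API, where `page.mouse.move(x, y)` moves the pointer to the given coordinates. `move_along_path` and the `Mouse` protocol are hypothetical helpers. They are typed loosely so the logic can be exercised without a browser.

```python
from typing import Iterable, Protocol, Sequence


class Mouse(Protocol):
    """Anything with a move(x, y) method, e.g. Playwright's sync page.mouse."""

    def move(self, x: float, y: float) -> None: ...


def move_along_path(mouse: Mouse, path: Iterable[Sequence[float]]) -> None:
    """Replay a precomputed trajectory one point at a time.

    Each move() call is one short straight step, so the pointer follows
    the polyline through `path` instead of jumping directly to the target.
    """
    for x, y in path:
        mouse.move(x, y)
```

With Playwright this would be called as `move_along_path(page.mouse, points)`, where `points` holds the curve coordinates. Timing and small random delays between steps are left out of this sketch.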
Warning: this section contains some math!
As noted before, to implement a Bezier curve we need:
A set of control points. The curve passes through the first and the last, while the points in between pull the curve toward them and set its shape.
The ratio R, which sets the density of the interpolation. It ranges from 0 to 1 and is the step between consecutive values of the parameter t. A ratio of 0.1 gives t = 0, 0.1, 0.2, ..., 1: 10 equal steps, with 9 points between the start and the end of the curve. With a ratio of 0.5, only one point (t = 0.5) lies in between. The values are evenly spaced in t, which does not necessarily mean evenly spaced along the curve.
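The sketch below generates the t values for a given ratio. It assumes the ratio divides 1 into a whole number of steps, as 0.1 and 0.5 do. The helper name `t_values_from_ratio` is hypothetical.

```python
import numpy as np


def t_values_from_ratio(ratio: float) -> np.ndarray:
    """Evenly spaced parameter values 0, ratio, 2 * ratio, ..., 1.

    Assumes 1 / ratio is (close to) a whole number, e.g. 0.1 or 0.5.
    """
    steps = round(1 / ratio)
    # linspace includes both endpoints, so `steps` intervals need steps + 1 values.
    return np.linspace(0, 1, steps + 1)
```

A ratio of 0.5 returns the three values 0, 0.5 and 1. A ratio of 0.1 returns 11 values, 9 of them strictly between the endpoints.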
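Among the formulas for the various types of Bezier curves (linear, quadratic, cubic, and so on), we’ll see how to implement the cubic one. It requires 4 control points $P_0, P_1, P_2, P_3$:

$$B(t) = (1-t)^3 P_0 + 3(1-t)^2 t \, P_1 + 3(1-t) t^2 \, P_2 + t^3 P_3, \qquad 0 \le t \le 1$$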
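Here is a sketch of the cubic formula for a single coordinate (X or Y), written in Python. The name `cubic_bezier` is an assumption, not taken from the article. The function only uses arithmetic operators, so it works with a plain float `t` or with a NumPy array of t values.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Cubic Bezier formula for one coordinate of the four control points."""
    u = 1 - t  # the (1 - t) factor that appears in every term
    return u**3 * p0 + 3 * u**2 * t * p1 + 3 * u * t**2 * p2 + t**3 * p3
```

At t = 0 it returns `p0` and at t = 1 it returns `p3`, so the curve starts and ends exactly at the first and last control points.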
Setting the control points and ratio
I’ve chosen the following four control points. The first and the last are the start and the end of the curve, so the result will be shaped roughly like a semi-circle. Given the coordinates of the second and third points, it will likely be more angled in the first half and smoother in the second.
control_points = [[200, 200], [230, 400], [280, 300], [300, 200]]
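Why do the second and third points shape the two halves? Call the four control points P0, P1, P2, P3. For a cubic Bezier curve, the direction in which the curve leaves the start depends only on P1, and the direction in which it reaches the end depends only on P2. In formulas, B'(0) = 3(P1 - P0) and B'(1) = 3(P3 - P2). The sketch below checks this property for the chosen points.

```python
import numpy as np

control_points = np.array([[200, 200], [230, 400], [280, 300], [300, 200]], dtype=float)
p0, p1, p2, p3 = control_points

# Tangent vectors at the endpoints of a cubic Bezier curve:
# B'(0) = 3 (P1 - P0) and B'(1) = 3 (P3 - P2).
start_tangent = 3 * (p1 - p0)
end_tangent = 3 * (p3 - p2)

print("start:", p0, "tangent:", start_tangent)
print("end:  ", p3, "tangent:", end_tangent)
```

The start tangent is (90, 600) and the end tangent is (60, -300). The start tangent is both steeper and longer, which fits a first half that changes y more sharply than the second.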
Since I’d like to see the curve drawn on screen, I want plenty of points on it.
Using the following NumPy command, we generate 101 evenly spaced values from 0 to 1, both included. That is 100 steps of 0.01:
t = np.linspace(0, 1, 101)
Consecutive values are therefore 0.01 apart, giving an array like this:
[0, 0.01, 0.02, 0.03, ..., 1]
Calculating the intermediate points
Now we use the cubic formula to calculate the coordinates of every point on the curve.
For the X coordinates, we plug in the X values of the 4 control points, while t takes each value from the array calculated before.
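So, plugging the 4 control points set up before into the cubic formula, with t = 0.01 for the first point after the start, the X coordinate is

$$x(0.01) = (1-0.01)^3 \cdot 200 + 3(1-0.01)^2 (0.01) \cdot 230 + 3(1-0.01)(0.01)^2 \cdot 280 + (0.01)^3 \cdot 300 \approx 200.906$$

The same goes for Y, using the Y values of the control points:

$$y(0.01) = (1-0.01)^3 \cdot 200 + 3(1-0.01)^2 (0.01) \cdot 400 + 3(1-0.01)(0.01)^2 \cdot 300 + (0.01)^3 \cdot 200 \approx 205.910$$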
Repeating this for every value of t gives us the list of points on the curve.
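Instead of looping over t in Python, NumPy broadcasting can compute the whole curve at once. The sketch below assumes the control points above and t = np.linspace(0, 1, 101), i.e. 100 steps of 0.01. The name `cubic_bezier_curve` is hypothetical.

```python
import numpy as np

control_points = np.array([[200, 200], [230, 400], [280, 300], [300, 200]], dtype=float)
t = np.linspace(0, 1, 101)  # 101 values -> steps of 0.01


def cubic_bezier_curve(points: np.ndarray, t_values: np.ndarray) -> np.ndarray:
    """Return an array of shape (len(t_values), 2) with the (x, y) curve points."""
    tt = t_values[:, np.newaxis]  # column vector, broadcasts against (x, y) rows
    u = 1 - tt
    p0, p1, p2, p3 = points  # each control point is a length-2 row
    return u**3 * p0 + 3 * u**2 * tt * p1 + 3 * u * tt**2 * p2 + tt**3 * p3


curve = cubic_bezier_curve(control_points, t)
print(curve[:3])
```

Each row of `curve` is one (x, y) point. The first and last rows coincide with the start and end control points. The rows are ready to be plotted or replayed as mouse positions.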
The full article is available only to paying users of the newsletter.
Subscribe to read this and the other paid articles from The Lab.
Liked the article? Subscribe for free to The Web Scraping Club to receive a new article in your inbox twice a week.