## Calculus: Hyperbolic Trigonometry, II

Now on to some calculus involving hyperbolic trigonometry!  Today, we’ll look at trigonometric substitutions involving hyperbolic functions.  Let’s begin with the integral

$\displaystyle\int\sqrt{1+x^2}\,dx.$

The usual technique involving circular trigonometric functions is to put $x=\tan(\theta),$ so that $dx=\sec^2(\theta)\,d\theta,$ and the integral transforms to

$\displaystyle\int\sec^3(\theta)\,d\theta.$

In general, we note that when taking square roots, a negative sign is sometimes needed if the limits of the integral demand it.

This integral requires integration by parts, and ultimately evaluating the integral

$\displaystyle\int\sec(\theta)\,d\theta.$

And how is this done?  I shudder when calculus textbooks write

$\displaystyle\int \sec(\theta)\cdot\dfrac{\sec(\theta)+\tan(\theta)}{\sec(\theta)+\tan(\theta)}\,d\theta=\ldots$

How does one motivate that “trick” to aspiring calculus students?  Of course the textbooks never do.

Now let’s see how to approach the original integral using a hyperbolic substitution.  We substitute $x=\sinh(u),$ so that $dx=\cosh(u)\,du$ and $\sqrt{1+x^2}=\cosh(u).$  Note well that taking the positive square root is always correct, since $\cosh(u)$ is always positive!

This results in the integral

$\displaystyle\int\cosh^2(u)\,du=\displaystyle\int\dfrac{1+\cosh(2u)}2\,du,$

which is quite simple to evaluate:

$\dfrac12u+\dfrac14\sinh(2u)+C.$

Now $u=\hbox{arcsinh}(x),$ and

$\sinh(2u)=2\sinh(u)\cosh(u)=2x\sqrt{1+x^2}.$

Recall from last week that we derived an explicit formula for $\hbox{arcsinh}(x),$ and so our integral finally becomes

$\dfrac12\left(\ln(x+\sqrt{1+x^2})+x\sqrt{1+x^2}\right)+C.$
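If you’d like a quick sanity check, a few lines of Python (a numerical sketch I’m adding here, not part of the derivation) confirm that differentiating this expression recovers the integrand:

```python
import math

def F(x):
    # Antiderivative found above: (1/2)(ln(x + sqrt(1 + x^2)) + x*sqrt(1 + x^2))
    return 0.5 * (math.log(x + math.sqrt(1 + x * x)) + x * math.sqrt(1 + x * x))

def deriv(f, x, h=1e-6):
    # Central-difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

# F'(x) should agree with the integrand sqrt(1 + x^2) at the sample points
max_err = max(abs(deriv(F, x) - math.sqrt(1 + x * x)) for x in [-2.0, 0.0, 0.5, 3.0])
```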

You likely noticed that using a hyperbolic substitution is no more complicated than using the circular substitution $x=\sin(\theta).$  What this means is — no need to ever integrate

$\displaystyle\int\tan^m(\theta)\sec^n(\theta)\,d\theta$

again!  Frankly, I no longer teach integrals involving $\tan(\theta)$ and $\sec(\theta)$ that require integration by parts.  Simply put, it is not a good use of time.  I think it is far better to introduce students to hyperbolic trigonometric substitution.

Now let’s take a look at the integral

$\displaystyle\int\sqrt{x^2-1}\,dx.$

The usual technique?  Substitute $x=\sec(\theta),$ and transform the integral into

$\displaystyle\int\tan^2(\theta)\sec(\theta)\,d\theta.$

Sigh.  Those irksome tangents and secants.  A messy integration by parts again.

But not so using $x=\cosh(u).$  We get $dx=\sinh(u)\,du$ and $\sqrt{x^2-1}=\sinh(u)$ (here, a negative square root may be necessary).

We rewrite as

$\displaystyle\int\sinh^2(u)\,du=\displaystyle\int\dfrac{\cosh(2u)-1}2\,du.$

This results in

$\dfrac14\sinh(2u)-\dfrac u2+C=\dfrac12(\sinh(u)\cosh(u)-u)+C.$

All we need now is a formula for $\hbox{arccosh}(x),$ which may be found using the same technique we used last week for $\hbox{arcsinh}(x):$

$\hbox{arccosh}(x)=\ln(x+\sqrt{x^2-1}).$

Thus, our integral evaluates to

$\dfrac12(x\sqrt{x^2-1}-\ln(x+\sqrt{x^2-1}))+C.$
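Again, a quick finite-difference check (an illustrative sketch, taking $x>1$ so the square root is real) confirms the result:

```python
import math

def G(x):
    # Antiderivative found above, valid for x > 1
    return 0.5 * (x * math.sqrt(x * x - 1) - math.log(x + math.sqrt(x * x - 1)))

def deriv(f, x, h=1e-6):
    # Central-difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

# G'(x) should agree with the integrand sqrt(x^2 - 1) at the sample points
max_err = max(abs(deriv(G, x) - math.sqrt(x * x - 1)) for x in [1.5, 2.0, 5.0])
```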

We remark that the integral

$\displaystyle\int\sqrt{1-x^2}\,dx$

is easily evaluated using the substitution $x=\sin(\theta).$  Thus, integrals of the forms $\sqrt{1+x^2},$ $\sqrt{x^2-1},$ and $\sqrt{1-x^2}$ may be computed by using the substitutions $x=\sinh(u),$ $x=\cosh(u),$ and $x=\sin(\theta),$ respectively.  It bears repeating:  no more integrals involving powers of tangents and secants!

One of the neatest applications of hyperbolic trigonometric substitution is using it to find

$\displaystyle\int\sec(\theta)\,d\theta$

without resorting to a completely unmotivated trick.  Yes, I saved the best for last….

So how do we proceed?  Let’s think by analogy.  Why did the substitution $x=\sinh(u)$ work above?  For the same reason $x=\tan(\theta)$ works: we can simplify $\sqrt{1+x^2}$ using one of the following two identities:

$1+\tan^2(\theta)=\sec^2(\theta)\ \hbox{ or }\ 1+\sinh^2(u)=\cosh^2(u).$

So $\sinh(u)$ is playing the role of $\tan(\theta),$ and $\cosh(u)$ is playing the role of $\sec(\theta).$  What does that suggest?  Try using the substitution $\sec(\theta)=\cosh(u)$!

No, it’s not the first thing you’d think of, but it makes sense.  Comparing the use of circular and hyperbolic trigonometric substitutions, the analogy is fairly straightforward, in my opinion.  There’s much more motivation here than in calculus textbooks.

So with $\sec(\theta)=\cosh(u),$ we have

$\sec(\theta)\tan(\theta)\,d\theta=\sinh(u)\,du.$

But notice that $\tan(\theta)=\sinh(u)$ — just look at the above identities and compare. We remark that if $\theta$ is restricted to the interval $(-\pi/2,\pi/2),$ then $\sec(\theta)$ takes every value in $[1,\infty)$ exactly as $\cosh(u)$ does, so the substitution $\sec(\theta)=\cosh(u)$ gives a bijection between the graphs of $\sec(\theta)$ and $\cosh(u),$ and between the graphs of $\tan(\theta)$ and $\sinh(u).$ In this case, the signs are always correct — $\tan(\theta)$ and $\sinh(u)$ always have the same sign.

So this means that

$\sec(\theta)\,d\theta=du.$

What could be simpler?

Thus, our integral becomes

$\displaystyle\int\,du=u+C.$

But

$u=\hbox{arccosh}(\sec(\theta))=\ln(\sec(\theta)+\tan(\theta)).$

Thus,

$\displaystyle\int \sec(\theta)\,d\theta=\ln(\sec(\theta)+\tan(\theta))+C.$

Voila!

We note that if $\theta$ is restricted to the interval $(-\pi/2,\pi/2)$ as discussed above,  then we always have $\sec(\theta)+\tan(\theta)>0,$ so there is no need to put the argument of the logarithm in absolute values.
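Both the antiderivative and the identity $u=\hbox{arccosh}(\sec(\theta))=\ln(\sec(\theta)+\tan(\theta))$ are easy to check numerically.  Here is a sketch, restricting to $\theta\in(0,\pi/2)$ so that the principal branch of $\hbox{arccosh}$ applies:

```python
import math

def antideriv(t):
    # The claimed antiderivative of sec: ln(sec t + tan t)
    return math.log(1 / math.cos(t) + math.tan(t))

def deriv(f, x, h=1e-6):
    # Central-difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

pts = [0.1, 0.5, 1.0, 1.3]
# Its derivative should be sec(t)...
max_err = max(abs(deriv(antideriv, t) - 1 / math.cos(t)) for t in pts)
# ...and u = arccosh(sec t) should equal ln(sec t + tan t) on (0, pi/2)
id_err = max(abs(math.acosh(1 / math.cos(t)) - antideriv(t)) for t in pts)
```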

Well, I’ve done my best to convince you of the wonder of hyperbolic trigonometric substitutions!  If integrating $\sec(\theta)$ didn’t do it, well, that’s the best I’ve got.

The next installment of hyperbolic trigonometry?  The Gudermannian function!  What’s that, you ask?  You’ll have to wait until next time — or I suppose you can just google it….

## Calculus: Hyperbolic Trigonometry, I

I love hyperbolic trigonometry.  I always include it when I teach calculus, as I think it is important for students to see.  Why?

1.  Many applications in the sciences use hyperbolic trigonometry; for example, the use of Laplace transforms in solving differential equations, various applications in physics, modeling population growth (the logistic model is a hyperbolic tangent curve);
2. Hyperbolic trigonometric substitutions are, in many instances, easier than circular trigonometric substitutions, especially when a substitution involving $\tan(x)$ or $\sec(x)$ is involved;
3. Students get to see another form of trigonometry, and compare the new form with the old;
4. Hyperbolic trigonometry is fun.

OK, maybe that last reason is a bit of hyperbole (though not for me).

Not everyone thinks this way.  I once had a colleague who told me she did not teach hyperbolic trigonometry because it wasn’t on the AP exam.  What do you say to someone who says that?  I dunno….

In any case, I want to introduce the subject here for you, and show you some interesting aspects of hyperbolic trigonometry.  I’m going to stray from my habit of not discussing things you can find anywhere online, since in order to get to the better stuff, you need to know the basics.  I’ll move fairly quickly through the introductory concepts, though.

The hyperbolic cosine and sine are defined by

$\cosh(x)=\dfrac{e^x+e^{-x}}2,\quad\sinh(x)=\dfrac{e^x-e^{-x}}2,\quad x\in\mathbb{R}.$

I will admit that when I introduce this definition, I don’t have an accessible, simple motivation for doing so.  I usually say we’ll learn a lot more as we work with these definitions, so if anyone has a good idea in this regard, I’d be interested to hear it.

The graphs of these curves are shown below.

The graph of $\cosh(x)$ is shown in blue, and the graph of $\sinh(x)$ is shown in red.  The dashed orange graph is $y=e^{x}/2,$ which is easily seen to be asymptotic to both graphs.

Parallels to the circular trigonometric functions are already apparent:  $y=\cosh(x)$ is an even function, just like $y=\cos(x).$  Similarly, $\sinh(x)$ is odd, just like $\sin(x).$

Another parallel which is only slightly less apparent is the fundamental relationship

$\cosh^2(x)-\sinh^2(x)=1.$

Thus, $(\cosh(x),\sinh(x))$ lies on a unit hyperbola, much like $(\cos(x),\sin(x))$ lies on a unit circle.

While there isn’t a simple parallel with circular trigonometry, there is an interesting way to characterize $\cosh(x)$ and $\sinh(x).$  Recall that given any function $f(x),$ we may define

$E(x)=\dfrac{f(x)+f(-x)}2,\quad O(x)=\dfrac{f(x)-f(-x)}2$

to be the even and odd parts of $f(x),$ respectively.  So we might simply say that $\cosh(x)$ and $\sinh(x)$ are the even and odd parts of $e^x,$ respectively.
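This characterization is easy to check numerically; here is a short sketch comparing the even and odd parts of $e^x$ with the built-in hyperbolic functions:

```python
import math

def even_part(f, x):
    # E(x) = (f(x) + f(-x)) / 2
    return (f(x) + f(-x)) / 2

def odd_part(f, x):
    # O(x) = (f(x) - f(-x)) / 2
    return (f(x) - f(-x)) / 2

pts = [-3.0, -0.5, 0.0, 1.0, 2.5]
# The even and odd parts of e^x should be cosh and sinh, respectively
cosh_err = max(abs(even_part(math.exp, x) - math.cosh(x)) for x in pts)
sinh_err = max(abs(odd_part(math.exp, x) - math.sinh(x)) for x in pts)
```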

There are also many properties of the hyperbolic trigonometric functions which are reminiscent of their circular counterparts.  For example, we have

$\sinh(2x)=2\sinh(x)\cosh(x)$

and

$\sinh(x+y)=\sinh(x)\cosh(y)+\sinh(y)\cosh(x).$

Neither of these is especially difficult to prove using the definitions.  It turns out that while there are many similarities, there are subtle differences.  For example,

$\cosh(x+y)=\cosh(x)\cosh(y)+\sinh(x)\sinh(y).$

That is, while some circular trigonometric formulas become hyperbolic just by changing $\cos(x)$ to $\cosh(x)$ and $\sin(x)$ to $\sinh(x),$ sometimes changes of sign are necessary.

These changes of sign from circular formulas are typical when working with hyperbolic trigonometry.  One particularly interesting place the change of sign arises is when considering differential equations, although given that I’m bringing hyperbolic trigonometry into a calculus class, I don’t emphasize this relationship.  But recall that $\cos(x)$ is the unique solution to the differential equation

$y''+y=0,\quad y(0)=1,\quad y'(0)=0.$

Similarly, we see that $\cosh(x)$ is the unique solution to the differential equation

$y''-y=0,\quad y(0)=1,\quad y'(0)=0.$

Again, the parallel is striking, and the difference subtle.

Of course it is straightforward to see from the definitions that $(\cosh(x))'=\sinh(x)$ and $(\sinh(x))'=\cosh(x).$  Gone are the days of remembering signs when differentiating and integrating trigonometric functions!  This is one feature of hyperbolic trigonometric functions which students always appreciate….

Another nice feature is how well-behaved the hyperbolic tangent is (as opposed to needing to consider vertical asymptotes in the case of $\tan(x)$).  Below is the graph of $y=\tanh(x)=\sinh(x)/\cosh(x).$

The horizontal asymptotes are easily calculated from the definitions.  This looks suspiciously like the curves obtained when modeling logistic growth in populations; that is, finding solutions to

$\dfrac{dP}{dt}=kP(C-P).$

In fact, these logistic curves are hyperbolic tangents, which we will address in more detail in a later post.

One of the most interesting things about hyperbolic trigonometric functions is that their inverses have closed formulas — in striking contrast to their circular counterparts.  I usually have students work this out, either in class or as homework; the derivation is quite nice, so I’ll outline it here.

So let’s consider solving the equation $x=\sinh(y)$ for $y.$  Begin with the definition:

$x=\dfrac{e^y-e^{-y}}2.$

The critical observation is that this is actually a quadratic in $e^y:$

$(e^y)^2-2xe^y-1=0.$

All that is necessary is to solve this quadratic equation to yield

$e^y=x\pm\sqrt{1+x^2},$

and note that $x-\sqrt{1+x^2}$ is always negative, so that we must choose the positive sign.  Thus,

$y=\hbox{arcsinh}(x)=\ln(x+\sqrt{1+x^2}).$
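Python’s built-in `math.asinh` makes this closed form easy to test numerically (a quick sketch, not a proof):

```python
import math

# The closed form ln(x + sqrt(1 + x^2)) should match the built-in asinh
max_err = max(
    abs(math.asinh(x) - math.log(x + math.sqrt(1 + x * x)))
    for x in [-10.0, -1.0, 0.0, 0.5, 100.0]
)
```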

And this is just the beginning!  At this stage, I also offer more thought-provoking questions like, “Which is larger, $\cosh(\ln(42))$ or $\ln(\cosh(42))?$”  These get students working with the definitions and thinking about asymptotic behavior.

Next week, I’ll go into more depth about the calculus of hyperbolic trigonometric functions.  Stay tuned!

## Calculus: The Geometry of Polynomials, II

The original post on The Geometry of Polynomials generated rather more interest than usual.  One reader, William Meisel, commented that he wondered if something similar worked for curves like the Folium of Descartes, given by the equation

$x^3+y^3=3xy,$

and whose graph looks like:

I replied that yes, I had success, and what I found out would make a nice follow-up post rather than just a reply to his comment.  So let’s go!

Just a brief refresher:  if, for example, we wanted to describe the behavior of $y=2(x-4)(x-1)^2$ where it crosses the x-axis at $x=1,$ we simply retain the $(x-1)^2$ term and substitute the root $x=1$ into the other terms, getting

$y=2(1-4)(x-1)^2=-6(x-1)^2$

as the best-fitting parabola at $x=1.$

In other words,

$\displaystyle\lim_{x\to1}\dfrac y{(x-1)^2}=-6.$

For examples like the polynomial above, this limit is always trivial, and is essentially a simple substitution.

What happens when we try to evaluate a similar limit with the Folium of Descartes?  It seems that a good approximation to this curve at $x=0$ (the U-shaped piece, since the sideways U-shaped piece involves writing $x$ as a function of $y$) is $y=x^2/3,$ as shown below.

To see this, we need to find

$\displaystyle\lim_{x\to0}\dfrac y{x^2}.$

After a little trial and error, I found it was simplest to use the substitution $z=y/x^2,$ and so rewrite the equation for the Folium of Descartes by using the substitution $y=x^2z,$ which results in

$1+x^3z^3=3z.$

Now it is easy to see that as $x\to0,$ we have $z\to1/3,$ giving us a good quadratic approximation at the origin.
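To see this numerically, we can solve for the branch of the Folium through the origin with Newton’s method (an illustrative sketch, seeded at the conjectured parabola) and watch $z=y/x^2$ approach $1/3$:

```python
def folium_branch(x):
    # Solve y^3 - 3*x*y + x^3 = 0 for the branch through the origin,
    # using Newton's method seeded at the conjectured parabola y = x^2/3
    y = x * x / 3
    for _ in range(50):
        f = y**3 - 3 * x * y + x**3
        fp = 3 * y * y - 3 * x  # derivative with respect to y; nonzero near the origin
        y -= f / fp
    return y

# z = y/x^2 should tend to 1/3 as x -> 0
zs = [folium_branch(x) / (x * x) for x in [0.1, 0.01, 0.001]]
```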

Success!  So I thought I’d try some more examples, and see how they worked out.  I first just changed the exponent of $x,$ looking at the curve

$x^n+y^3=3xy,$

shown below when $n=6.$

What would be a best approximation near the origin?  You can almost eyeball a fifth-degree approximation here, but let’s assume we don’t know the appropriate power and make the substitution $y=x^kz,$ with $k$ yet to be determined. This results in

$x^{3k-n}z^3+1=3zx^{k+1-n}.$

Now observe that when $k=n-1,$ we have

$x^{2n-3}z^3+1=3z,$

so that $\displaystyle\lim_{x\to0}z=1/3.$ Thus, in our case with $n=6,$ we see that $y=x^5/3$ is a good approximation to the curve near the origin.  The graph below shows just how good an approximation it is.

OK, I thought to myself, maybe I just got lucky.  Maybe introduce a change which will really alter the nature of the curve, such as

$x^3+y^3=3xy+1,$

whose graph is shown below.

Here, the curve passes through the x-axis at $x=1,$ with what appears to be a linear pass-through.  This suggests, given our previous work, the substitution $y=(x-1)z,$ which results in

$x^3+(x-1)^3z^3=3x(x-1)z+1.$

We don’t have much luck with $\displaystyle\lim_{x\to1}z$ here.  But if we move the $1$ to the other side and factor, we get

$(x-1)(x^2+x+1)+(x-1)^3z^3=3x(x-1)z.$

Nice!  Just divide through by $x-1$ to obtain

$x^2+x+1+(x-1)^2z^3=3xz.$

Now a simple calculation reveals that $\displaystyle\lim_{x\to1}z=1.$ And sure enough, the line $y=x-1$ does the trick:
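A similar Newton’s method sketch (again seeded at the conjectured approximation) confirms that $z=y/(x-1)$ tends to $1$ along this curve:

```python
def branch(x):
    # Solve x^3 + y^3 - 3*x*y - 1 = 0 for y near 0,
    # using Newton's method seeded at the conjectured line y = x - 1
    y = x - 1
    for _ in range(50):
        f = x**3 + y**3 - 3 * x * y - 1
        fp = 3 * y * y - 3 * x  # derivative with respect to y; nonzero near x = 1
        y -= f / fp
    return y

# z = y/(x - 1) should tend to 1 as x -> 1
zs = [branch(1 + d) / d for d in [1e-2, 1e-3, 1e-4]]
```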

Then I decided to change the exponent again by considering

$x^n+y^3=3xy+1.$

Here is the graph of the curve when $n=6:$

It seems we have two roots this time, with linear pass-throughs.  Let’s try the same idea again, making the substitution $y=(x-1)z,$ moving the $1$ over, factoring, and dividing through by $x-1.$  This results in

$x^{n-1}+x^{n-2}+\cdots+1+(x-1)^2z^3=3xz.$

It is not difficult to calculate that $\displaystyle\lim_{x\to1}z=n/3.$

Now things become a bit more interesting when $n$ is even, since there is always a root at $x=-1$ in this case.  Here, we make the substitution $y=(x+1)z,$ move the $1$ over, and divide by $x+1,$ resulting in

$\dfrac{x^n-1}{x+1}+(x+1)^2z^3=3xz.$

But since $n$ is even, then $x^2-1$ is a factor of $x^n-1,$ so we have

$(x-1)(x^{n-2}+x^{n-4}+\cdots+x^2+1)+(x+1)^2z^3=3xz.$

Substituting $x=-1$ in this equation gives

$-2\left(\dfrac n2\right)=3(-1)z,$

which immediately gives  $\displaystyle\lim_{x\to-1}z=n/3$ as well!  This is a curious coincidence, for which I have no nice geometrical explanation.  The case when $n=6$ is illustrated below.

This is where I stopped — but I was truly surprised that everything I tried actually worked.  I did a cursory online search for Taylor series of implicitly defined functions, but this seems to be much less popular than series for $y=f(x).$

Anyone more familiar with this topic care to chime in?  I really enjoyed this brief exploration, and I’m grateful that William Meisel asked his question about the Folium of Descartes.  These are certainly instances of a larger phenomenon, but I feel the statement and proof of any theorem will be somewhat more complicated than the analogous results for explicitly defined functions.

And if you find some neat examples, post a comment!  I’d enjoy writing another follow-up post if there is continued interest in this topic.

## Calculus: Linear Approximations, II

As I mentioned last week, I am a fan of emphasizing the idea of a derivative as a linear approximation.  I ended that discussion by using this method to find the derivative of $\tan(x).$   Today, we’ll look at some more examples, and then derive the product, quotient and chain rules.

Differentiating $\sec(x)$ is particularly nice using this method.  We first approximate

$\sec(x+h)=\dfrac1{\cos(x+h)}\approx\dfrac1{\cos(x)-h\sin(x)}.$

Then we factor out a $\cos(x)$ from the denominator, giving

$\sec(x+h)\approx\dfrac1{\cos(x)(1-h\tan(x))}.$

As we did at the end of last week’s post, we can make $h$ as small as we like, and so approximate by considering $1/(1-h\tan(x))$ as the sum of an infinite series:

$\dfrac1{1-h\tan(x)}\approx1+h\tan(x).$

Finally, we have

$\sec(x+h)\approx\dfrac{1+h\tan(x)}{\cos(x)}=\sec(x)+h\sec(x)\tan(x),$

which gives the derivative of $\sec(x)$ as $\sec(x)\tan(x).$
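A finite-difference check of this derivative (a quick numerical sketch) is easy to write:

```python
import math

def sec(x):
    return 1 / math.cos(x)

def deriv(f, x, h=1e-6):
    # Central-difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

# sec'(x) should equal sec(x)*tan(x) at the sample points
max_err = max(abs(deriv(sec, x) - sec(x) * math.tan(x)) for x in [-1.0, 0.0, 0.3, 1.2])
```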

We’ll look at one more example involving approximating with geometric series before moving on to the product, quotient, and chain rules.  Consider differentiating $x^{-n}.$ We first factor the denominator:

$\dfrac1{(x+h)^n}=\dfrac1{x^n(1+h/x)^n}.$

Now approximate

$\dfrac1{1+h/x}\approx1-\dfrac hx,$

so that, to first order,

$\dfrac1{(1+h/x)^n}\approx \left(1-\dfrac hx\right)^{\!\!n}\approx 1-\dfrac{nh}x.$

This finally results in

$\dfrac1{(x+h)^n}\approx \dfrac1{x^n}\left(1-\dfrac{nh}x\right)=\dfrac1{x^n}+h\dfrac{-n}{x^{n+1}},$

giving us the correct derivative.

Now let’s move on to the product rule:

$(fg)'(x)=f(x)g'(x)+f'(x)g(x).$

Here, and for the rest of this discussion, we assume that all functions have the necessary differentiability.

We want to approximate $f(x+h)g(x+h),$ so we replace each factor with its linear approximation:

$f(x+h)g(x+h)\approx (f(x)+hf'(x))(g(x)+hg'(x)).$

Now expand and keep only the first-order terms:

$f(x+h)g(x+h)\approx f(x)g(x)+h(f(x)g'(x)+f'(x)g(x)).$

And there’s the product rule — just read off the coefficient of $h.$
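One quick way to see that discarding the quadratic term is harmless: the error in the linear approximation of a product shrinks like $h^2.$  Here is a numerical sketch with the (arbitrarily chosen) functions $f=\sin$ and $g=\exp$:

```python
import math

x = 1.0  # arbitrary sample point

def F(t):
    # the product f(t)*g(t) with f = sin, g = exp
    return math.sin(t) * math.exp(t)

def F_linear(h):
    # linear approximation F(x) + h*(f(x)g'(x) + f'(x)g(x))
    return F(x) + h * (math.sin(x) * math.exp(x) + math.cos(x) * math.exp(x))

# The discarded terms are quadratic in h, so (error)/h^2 stays bounded
# (near |F''(x)|/2) as h shrinks
ratios = [abs(F(x + h) - F_linear(h)) / (h * h) for h in [1e-2, 1e-3, 1e-4]]
```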

There is a compelling reason to use this method.  The traditional proof begins by evaluating

$\displaystyle\lim_{h\to0}\dfrac{f(x+h)g(x+h)-f(x)g(x)}h.$

The next step?  Just add and subtract $f(x)g(x+h)$ (or perhaps $f(x+h)g(x)$).  I have found that there is just no way to convincingly motivate this step.  Yes, those of us who have seen it crop up in various forms know to try such tricks, but the typical first-time student of calculus is mystified by that mysterious step.  Using linear approximations, there is absolutely no mystery at all.

The quotient rule is next:

$\left(\dfrac fg\right)^{\!\!\!'}\!(x)=\dfrac{g(x)f'(x)-f(x)g'(x)}{g(x)^2}.$

First approximate

$\dfrac{f(x+h)}{g(x+h)}\approx\dfrac{f(x)+hf'(x)}{g(x)+hg'(x)}.$

Now since $h$ is small, we approximate

$\dfrac1{g(x)+hg'(x)}\approx\dfrac1{g(x)}\left(1-h\dfrac{g'(x)}{g(x)}\right),$

so that

$\dfrac{f(x+h)}{g(x+h)}\approx(f(x)+hf'(x))\cdot\dfrac1{g(x)}\left(1-h\dfrac{g'(x)}{g(x)}\right).$

Multiplying out and keeping just the first-order terms results in

$\dfrac{f(x+h)}{g(x+h)}\approx \dfrac{f(x)}{g(x)}+h\dfrac{g(x)f'(x)-f(x)g'(x)}{g(x)^2}.$

Voila!  The quotient rule.  The usual proofs involve either (1) using the product rule with $f(x)$ and $1/g(x),$ which requires the chain rule to differentiate $1/g(x);$ or (2) the mysterious “adding and subtracting the same expression” in the numerator.  Using linear approximations avoids both.

The chain rule is almost ridiculously easy to prove using linear approximations.  Begin by approximating

$f(g(x+h))\approx f(g(x)+hg'(x)).$

Note that we’re replacing the argument to a function with its linear approximation, but since we assume that $f$ is differentiable, it is also continuous, so this poses no real problem.  Yes, perhaps there is a little hand-waving here, but in my opinion, no rigor is really lost.

Since $g$ is differentiable, $g'(x)$ exists, and so we can make $hg'(x)$ as small as we like; the “$hg'(x)$” term acts like the “$h$” term in our linear approximation.  Additionally, the “$g(x)$” term acts like the “$x$” term, resulting in

$f(g(x+h))\approx f(g(x))+hg'(x)f'(g(x)).$

Reading off the coefficient of $h$ gives the chain rule:

$(f\circ g)'(x)=f'(g(x))g'(x).$

So I’ve said my piece.  By this time, you’re either convinced that using linear approximations is a good idea, or you’re not.  But I think these methods reflect more accurately the intuition behind the calculations — and reflect what mathematicians do in practice.

In addition, using linear approximations involves more than just mechanically applying formulas.  If all you ever do is apply the product, quotient, and chain rules, it’s just mechanics.  Using linear approximations requires a bit more understanding of what’s really going on underneath the hood, as it were.

If you find more neat examples of differentiation using this method, please comment!  I know I’d be interested, and I’m sure others would as well.

In my next installment (or two or three) in this calculus series, I’ll talk about one of my favorite topics — hyperbolic trigonometry.

## Calculus: Linear Approximations, I

Last week’s post on the Geometry of Polynomials generated a lot of interest from folks who are interested in or teach calculus.  So I thought I’d start a thread about other ideas related to teaching calculus.

This idea is certainly not new.  But I think it is sorely underexploited in the calculus classroom.  I like it because it reinforces the idea of derivative as linear approximation.

The main idea is to rewrite

$\displaystyle\lim_{h\to 0}\dfrac{f(x+h)-f(x)}h=f'(x)$

as

$f(x+h)\approx f(x)+hf'(x),$

with the note that this approximation is valid when $h\approx0.$  Writing the limit in this way, we see that $f(x+h),$ as a function of $h,$ is approximately linear in $h$ precisely when the limit in the definition exists — that is, when there is a good linear approximation to $f$ at $x.$

Moreover, in this sense, if

$f(x+h)\approx f(x)+hg(x),$

then it must be the case that $f'(x)=g(x).$  This is not difficult to prove.

Let’s look at a simple example, like finding the derivative of $f(x)=x^2.$  It’s easy to see that

$f(x+h)=(x+h)^2=x^2+h(2x)+h^2.$

So it’s easy to read off the derivative: ignore higher-order terms in $h,$ and then look at the coefficient of $h$ as a function of $x.$

Note that this is perfectly rigorous.  It should be clear that ignoring higher-order terms in $h$ is fine since when taking the limit as in the definition, only one $h$ divides out, meaning those terms contribute $0$ to the limit.  So the coefficient of $h$ will be the only term to survive the limit process.

Also note that this is nothing more than a rearrangement of the algebra necessary to compute the derivative using the usual definition.  I just find it is more intuitive, and less cumbersome notationally.  But every step taken can be justified rigorously.

Moreover, this method is the one commonly used in more advanced mathematics, where  functions take vectors as input.  So if

$f({\bf v})={\bf v}\cdot{\bf v},$

we compute

$f({\bf u}+h{\bf v})={\bf u}\cdot{\bf u}+2h{\bf u}\cdot{\bf v}+h^2{\bf v}\cdot{\bf v},$

so that the directional derivative is

$\nabla_{\bf v}f({\bf u})=2{\bf u}\cdot{\bf v}.$
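Even without developing the general theory, this particular computation is easy to verify numerically (a quick sketch with an arbitrary choice of vectors):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def f(v):
    # f(v) = v . v
    return dot(v, v)

# arbitrary sample vectors (any choice works)
u = [1.0, -2.0, 3.0]
v = [0.5, 4.0, -1.0]
h = 1e-7

# (f(u + h*v) - f(u))/h should approach the directional derivative 2 u.v
numeric = (f([a + h * b for a, b in zip(u, v)]) - f(u)) / h
exact = 2 * dot(u, v)
```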

I don’t want to go into more details here, since such calculations don’t occur in beginning calculus courses.  I just want to point out that this way of computing derivatives is in fact a natural one, but one which you don’t usually encounter until graduate-level courses.

Let’s take a look at another example:  the derivative of $f(x)=\sin(x),$ and see how it looks using this rewrite.  We first write

$\sin(x+h)=\sin(x)\cos(h)+\cos(x)\sin(h).$

Now replace all functions of $h$ with their linear approximations.  Since $\cos(h)\approx1$ and $\sin(h)\approx h$ near $h=0,$ we have

$\sin(x+h)\approx\sin(x)+h\cos(x).$

This immediately gives that $\cos(x)$ is the derivative of $\sin(x).$

Now the approximation $\cos(h)\approx1$ is easy to justify geometrically by looking at the graph of $\cos(x).$  But how do we justify the approximation $\sin(h)\approx h$?

Of course there is no getting around this.  The limit

$\displaystyle\lim_{h\to0}\dfrac{\sin(h)}h$

is the one difficult calculation in computing the derivative of $\sin(x).$  So then you’ve got to provide your favorite proof of this limit, and then move on.  But this approximation helps to illustrate the essential point:  the differentiability of $\sin(x)$ at $x=0$ does, in a real sense, imply the differentiability of $\sin(x)$ everywhere else.

So computing derivatives in this way doesn’t save any of the hard work, but I think it makes the work a bit more transparent.  And as we continually replace functions of $h$ with their linear approximations, this aspect of the derivative is regularly being emphasized.

How would we use this technique to differentiate $f(x)=\sqrt x$?  We need

$\sqrt{x+h}\approx\sqrt x+hf'(x),$

and so

$x+h\approx \left(\sqrt x+hf'(x)\right)^2\approx x+2h\sqrt xf'(x).$

Since the coefficient of $h$ on the left is $1,$ so must be the coefficient on the right, so that

$2\sqrt xf'(x)=1,$ and so $f'(x)=\dfrac1{2\sqrt x}.$

As a last example for this week, consider taking the derivative of $f(x)=\tan(x).$  Then we have

$\tan(x+h)=\dfrac{\tan(x)+\tan(h)}{1-\tan(x)\tan(h)}.$

Now since $\sin(h)\approx h$ and $\cos(h)\approx 1,$ we have $\tan(h)\approx h,$ and so we can replace to get

$\tan(x+h)\approx\dfrac{\tan(x)+h}{1-h\tan(x)}.$

Now what do we do?  Since we’re considering $h$ near $0,$ then $h\tan(x)$ is small (as small as we like), and so we can consider

$\dfrac1{1-h\tan(x)}$

as the sum of the infinite geometric series

$\dfrac1{1-h\tan(x)}=1+h\tan(x)+h^2\tan^2(x)+\cdots$

Replacing this sum with its linear approximation, we get

$\tan(x+h)\approx(\tan(x)+h)(1+h\tan(x)),$

and so

$\tan(x+h)\approx\tan(x)+h(1+\tan^2(x)).$

This gives the derivative of $\tan(x)$ to be

$1+\tan^2(x)=\sec^2(x).$

Neat!
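Here is a quick numerical check (an illustrative sketch) that $1+\tan^2(x)=\sec^2(x)$ really is the derivative of $\tan(x)$:

```python
import math

def deriv(f, x, h=1e-6):
    # Central-difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

# tan'(x) should equal 1 + tan^2(x) at the sample points
pts = [-1.0, 0.0, 0.7, 1.2]
max_err = max(abs(deriv(math.tan, x) - (1 + math.tan(x) ** 2)) for x in pts)
```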

Now this method takes a bit more work than just using the quotient rule (as usually done).  But using the quotient rule is a purely mechanical process; this way, we are constantly thinking, “How do I replace this expression with a good linear approximation?”  Perhaps more is learned this way?

There are more interesting examples using this geometric series idea.  We’ll look at a few more next time, and then use this idea to prove the product, quotient, and chain rules.  Until then!

## The Geometry of Polynomials

I recently needed to make a short demo lecture, and I thought I’d share it with you.  I’m sure I’m not the first one to notice this, but I hadn’t seen it before and I thought it was an interesting way to look at the behavior of polynomials where they cross the x-axis.

The idea is to give a geometrical meaning to an algebraic procedure:  factoring polynomials.  What is the geometry of the different factors of a polynomial?

Let’s look at an example in some detail:  $f(x)=2(x-4)(x-1)^2.$

Now let’s start looking at the behavior near the roots of this polynomial.

Near $x=1,$ the graph of the cubic looks like a parabola — and that may not be so surprising given that the factor $(x-1)$ occurs quadratically.

And near $x=4,$ the graph passes through the x-axis like a line — and we see a linear factor of $(x-4)$ in our polynomial.

But which parabola, and which line?  It’s actually pretty easy to figure out.  Here is an annotated slide which illustrates the idea.

All you need to do is set aside the quadratic factor of $(x-1)^2,$ and substitute the root, $x=1,$ in the remaining terms of the polynomial, then simplify.  In this example, we see that the cubic behaves like the parabola $y=-6(x-1)^2$ near the root $x=1.$ Note the scales on the axes; if they were the same, the parabola would have appeared much narrower.

We perform a similar calculation at the root $x=4.$

Just isolate the linear factor $(x-4),$ substitute $x=4$ in the remaining terms of the polynomial, and then simplify.  Thus, the line $y=18(x-4)$ best describes the behavior of the graph of the polynomial as it passes through the x-axis.  Again, note the scale on the axes.
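Both local approximations are easy to verify numerically; the ratios below (a quick sketch) tend to the predicted coefficients $-6$ and $18$:

```python
def f(x):
    return 2 * (x - 4) * (x - 1) ** 2

# Near x = 1 the cubic should behave like -6(x-1)^2, so f(x)/(x-1)^2 -> -6;
# near x = 4 it should behave like 18(x-4), so f(x)/(x-4) -> 18
near1 = [f(1 + d) / (d * d) for d in [1e-2, 1e-3, 1e-4]]
near4 = [f(4 + d) / d for d in [1e-2, 1e-3, 1e-4]]
```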

We can actually use this idea to help us sketch graphs of polynomials when they’re in factored form.  Consider the polynomial $f(x)=x(x+1)^2(x-2)^3.$  Begin by sketching the three approximations near the roots of the polynomial.  This slide also shows the calculation for the cubic approximation.

Now you can begin sketching the graph, starting from the left, being careful to closely follow the parabola as you bounce off the x-axis at $x=-1.$

Continue, following the red line as you pass through the origin, and then the cubic as you pass through $x=2.$  Of course you’d need to plot a few points to know just where to start and end; this just shows how you would use the approximations near the roots to help you sketch a graph of a polynomial.

Why does this work?  It is not difficult to see, but here we need a little calculus.  Let’s look, in general, at the behavior of $f(x)=p(x)(x-a)^n$ near the root $x=a.$  Given what we’ve just been observing, we’d guess that the best approximation near $x=a$ would just be $y=p(a)(x-a)^n.$

Just what does “best approximation” mean?  One way to think about approximating, calculuswise, is matching derivatives — just think of Maclaurin or Taylor series.  My claim is that the first $n$ derivatives of $f(x)=p(x)(x-a)^n$ and $y=p(a)(x-a)^n$ match at $x=a.$

First, observe that the first $n-1$ derivatives of both of these functions at $x=a$ must be 0.  This is because $(x-a)$ will always be a factor — since at most $n-1$ derivatives are taken, there is no way for the $(x-a)^n$ term to completely “disappear.”

But what happens when the $n$th derivative is taken?  Clearly, the $n$th derivative of $p(a)(x-a)^n$ at $x=a$ is just $n!p(a).$  What about the $n$th derivative of $f(x)=p(x)(x-a)^n$?

Thinking about the product rule in general, we see that the form of the $n$th derivative must be $f^{(n)}(x)=n!p(x)+ (x-a)(\text{terms involving derivatives of } p(x)).$ When a derivative of $p(x)$ is taken, that means one factor of $(x-a)$ survives.

So when we take $f^{(n)}(a),$ we also get $n!p(a).$  This makes the $n$th derivatives match as well.  And since the first $n$ derivatives of $p(x)(x-a)^n$ and $p(a)(x-a)^n$ match, we see that $p(a)(x-a)^n$ is the best $n$th degree approximation near the root $x=a.$
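One consequence of this matching is that $\displaystyle\lim_{x\to a}\dfrac{f(x)}{(x-a)^n}=p(a).$  Here is a quick numerical sketch using the earlier example $f(x)=x(x+1)^2(x-2)^3$ at the root $x=2,$ where $p(x)=x(x+1)^2$ and $p(2)=18$:

```python
def f(x):
    return x * (x + 1) ** 2 * (x - 2) ** 3

# Near x = 2, f should behave like p(2)(x-2)^3 with p(2) = 18,
# so f(2 + d)/d^3 should approach 18 as d -> 0
ratios = [f(2 + d) / d**3 for d in [1e-2, 1e-3, 1e-4]]
```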

I might call this observation the geometry of polynomials. Well, perhaps not the entire geometry of polynomials….  But I find that any time algebra can be illustrated graphically, students’ understanding gets just a little deeper.

Those who have been reading my blog for a while will be unsurprised at my geometrical approach to algebra (or my geometrical approach to anything, for that matter).  Of course a lot of algebra was invented just to describe geometry — take the Cartesian coordinate plane, for instance.  So it’s time for algebra to reclaim its geometrical heritage.  I shall continue to be part of this important endeavor, for however long it takes….

## The Problem with Calculus Textbooks

Simply put, most calculus textbooks are written in the wrong order.

Unfortunately, this includes the most popular textbooks used in colleges and universities today.

This problem has a long history, and will not be quickly solved for a variety of reasons. I think the solution lies ultimately with high-quality, open-source e-modules (that is, stand-alone tutorials on all calculus-related topics), but that discussion is for another time. Today, I want to address a more pressing issue: since many of us (including myself) must teach from such textbooks for now, at least until the publishing revolution arrives, how might we provide students a more engaging, productive calculus experience?

To be specific, I’ll describe some strategies I’ve used in calculus over the past several years. Once you get the idea, you’ll be able to look through your syllabus and find ways to make similar adaptations. There are so many different versions of calculus taught that there is no “one size fits all” solution. So here goes.

1. I now teach differentiation before limits. The reason is that very little intuition about limits is needed to differentiate quadratics, for example — but the idea of limits is naturally introduced in terms of slopes of secant lines. Once students have the general idea, I give them a list of the usual functions to differentiate. Now they generate the limits we need to study — the complete opposite of introducing various limits out of context that “they will need later.”

Students routinely ask, “When am I ever going to use this?” At one time, I dismissed the question as irrelevant — surely students should know that the learning process is not one of immediate gratification. But when I really understood what they were asking — “How do I make sense of what you’re telling me when I have nothing to relate it to except the promise of some unknown future problem?” — I started to rethink how I presented concepts in calculus.

I also didn’t want to write my own calculus textbook from scratch — so I looked for ways to use the resources I already had. Simply doing the introductory section on differentiation before the chapter on limits takes no additional time in the classroom, and not much preparation on the part of the teacher. This point is crucial for the typical teacher — time is precious. What I’m advocating is just a reshuffling of the topics we (have to) teach anyway.
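The secant-slope idea in point 1 is easy to make concrete. A minimal sketch (my own example, using $f(x)=x^2$ at $x=3$) shows the slopes settling toward the derivative as the second point slides in:

```python
def secant_slope(f, x, h):
    # slope of the secant line through (x, f(x)) and (x + h, f(x + h))
    return (f(x + h) - f(x)) / h

f = lambda x: x * x
slopes = [secant_slope(f, 3.0, h) for h in (1.0, 0.1, 0.01, 0.001)]
print(slopes)   # slopes approach f'(3) = 6 as h shrinks
```

For a quadratic, students can even verify by hand that the slope is exactly $6+h,$ so the limiting value $6$ requires almost no limit machinery at all.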

2. I no longer teach the chapter on techniques of integration as a “chapter.” In the typical textbook, nothing in this chapter is sufficiently motivated. So here’s what I do.

I teach the section on integration by parts when I discuss volumes. Finding volumes using cylindrical shells naturally gives rise to using integration by parts, so why wait? Incidentally, I also bring center of mass and Pappus’ theorem into play, as they also fit naturally here. The one-variable formulation of the center of mass gives rise to squares of functions, so I introduce integrating powers of trigonometric functions here. (Though I omit topics such as using integration by parts to integrate unfriendly powers of tangent and secant — I do not feel this is necessary given that any mathematician I know would jump to Mathematica or similar software to evaluate such integrals.)
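As a concrete instance of shells meeting integration by parts (my own example, not from any particular text): rotating the region under $y=\sin x,$ $0\le x\le\pi,$ about the $y$-axis gives $V=2\pi\int_0^\pi x\sin x\,dx.$ Integration by parts yields $\int x\sin x\,dx=\sin x-x\cos x,$ which on $[0,\pi]$ equals $\pi,$ so $V=2\pi^2.$ A quick numerical check with the midpoint rule:

```python
import math

def shell_volume(f, a, b, n=100000):
    # cylindrical shells: V = 2*pi * integral of x*f(x) dx, midpoint rule
    h = (b - a) / n
    return 2 * math.pi * h * sum((a + (i + 0.5) * h) * f(a + (i + 0.5) * h) for i in range(n))

V = shell_volume(math.sin, 0.0, math.pi)
# by parts: integral of x sin x dx = sin x - x cos x, giving pi on [0, pi], so V = 2*pi^2
print(V, 2 * math.pi ** 2)
```

Students can see the numerical estimate and the by-parts answer agree before they have done a single abstract “techniques” exercise.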

I teach trigonometric substitution (hyperbolic as well — but that’s for another blog post) when I cover arc length and surface area — again, since integrals involving square roots arise naturally here.
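For instance (my own illustration), the parabola $y=x^2/2$ has $dy/dx=x,$ so its arc length integrand is exactly $\sqrt{1+x^2}$ — the integral that the substitution $x=\sinh(u)$ handles so cleanly, with antiderivative $\frac12\left(\ln(x+\sqrt{1+x^2})+x\sqrt{1+x^2}\right).$ A numerical check on $[0,1]$:

```python
import math

def arc_length(dfdx, a, b, n=100000):
    # L = integral of sqrt(1 + (dy/dx)^2) dx, midpoint rule
    h = (b - a) / n
    return h * sum(math.sqrt(1 + dfdx(a + (i + 0.5) * h) ** 2) for i in range(n))

# y = x^2/2 has dy/dx = x, so the integrand is sqrt(1 + x^2)
L = arc_length(lambda x: x, 0.0, 1.0)
closed_form = 0.5 * (math.log(1 + math.sqrt(2)) + math.sqrt(2))
print(L, closed_form)   # the two values agree
```

The point for students: the substitution is not an isolated trick, but the tool that finishes a problem they already care about.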

Partial fractions can be introduced either when covering telescoping series or when solving the logistic equation. (A colleague recommended doing series in the middle of the course rather than at the end, where it would naturally have fallen given the order of chapters in our text, since she found that students’ minds were fresher then — so I introduced partial fractions when doing telescoping series. I found this rearrangement to be a good suggestion, by the way. Thanks, Cornelia!)
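The classic telescoping example motivating partial fractions is $\sum 1/(n(n+1))$: the decomposition $\frac1{n(n+1)}=\frac1n-\frac1{n+1}$ collapses the partial sums to $1-\frac1{N+1}.$ A short sketch with exact rational arithmetic makes the collapse visible:

```python
from fractions import Fraction

# partial fractions: 1/(n(n+1)) = 1/n - 1/(n+1), so the partial sums telescope
def partial_sum(N):
    return sum(Fraction(1, n * (n + 1)) for n in range(1, N + 1))

for N in (10, 100, 1000):
    print(N, partial_sum(N))   # each equals N/(N+1) = 1 - 1/(N+1), so the series sums to 1
```

Here partial fractions arrive with an immediate payoff — a series we can actually sum — rather than as one more unmotivated technique.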

3. I no longer begin Taylor series by introducing sequences and series in the conventional way. First, I motivate the idea by considering limits like

$\displaystyle\lim_{x\to0}\dfrac{\sin x-x}{x^3}=-\dfrac16.$

This essentially means that near 0, we can approximate $\sin(x)$ by the cubic polynomial

$\sin(x)\approx x-\dfrac{x^3}6.$

In other words, the limits we often encounter while studying L’Hopital’s rule provide a good motivation for polynomial approximations. Once the idea is introduced, higher-order — eventually “infinite-order” — approximations can be brought in. Some algorithms approximate transcendental functions with polynomials — this provides food for thought as well. Natural questions arise: How far do we need to go to get a given desired accuracy? Will the process always work?

I won’t say more about this approach here, since I’ve written up a complete set of Taylor series notes.  They were written for an Honors-level class, so some sections won’t be appropriate for a typical calculus course. They were also intended for use in an inquiry-based learning environment, and so are not in the usual “text, examples, exercise” order. But I hope they at least convey an approach to the subject, which I have adapted to a more traditional university setting as well. For the interested instructor, I also have compiled a complete Solutions Manual.

I think this is enough to give you the idea of my approach to using a traditional textbook. Every calculus teacher has their own way of thinking about the subject — as it should be. There is no reason to think that every teacher should teach calculus in the same way — but there is every reason to think that calculus teachers should be contemplating how to make this beautiful subject more accessible to their students.