The proof is indeed simple, not any harder than the proof that assumes that the sparse set is a subset of a set in P (see Exercise 3.12 in my computational complexity book).
The first idea is to scan the downwards self-reducibility tree of SAT and extend at most polynomially many viable vertices at each level of the tree, where this polynomial is determined by the polynomials of the sparseness bound and the stretch of the Karp-reduction (of SAT to the sparse set). The crucial idea is the following: Rather than testing the the residual formulae at the current level by applying the Karp-reduction to each of them, one applies the Karp-reduction to the disjunction of the first formula (i.e., $\phi_1$) and each of the remaining formulae (i.e, the $\phi_i$'s for $i>1$.
The point is that if all reduction values are distinct, then one of these values is not in the sparse set, and it follows that the first formula is not in SAT, and so the first formula can be discarded. Otherewise, if the reduction maps $\phi_1\vee\phi_i$ and $\phi_1\vee\phi_i$ to the same value, then we can discard (say) $\phi_i$, since either $\phi_1$ is in SAT (and remains viable) or $\phi_1$ is not in SAT and so $\phi_i$ and $\phi_j$ have the same SAT-value (and so it suffices to keep either of them).
Mahaney's Theorem states that, assuming P neq NP, no NP-hard set can have a polynomially bounded number of yes-instances at each input length. We give an exposition of a very simple unpublished proof of Manindra Agrawal whose ideas appear in Agrawal-Arvind (Geometric sets of low information content, Theoret. Comp. Sci., 1996). This proof is so simple that it can easily be taught to undergraduates or a general graduate CS audience - not just theorists! - in about 10 minutes, which the author has done successfully several times. We also include applications of Mahaney's Theorem to fundamental questions that bright undergraduates would ask which could be used to fill the remaining hour of a lecture, as well as an application (due to Ikenmeyer, Mulmuley, and Walter, arXiv:1507.02955) to the representation theory of the symmetric group and the Geometric Complexity Theory Program. To this author, the fact that sparsity results on NP-complete sets have an application to classical questions in representation theory says that they are not only a gem of classical theoretical computer science, but indeed a gem of mathematics.
See ECCC TR16-162.