The question is how large should be the support size for such a $P'$, and in the case of additive error this is closely related to the VC-dimension of $F$, denoted $d(F)$; size $s=\tildeO(1/\epsilon^2)\cdot d(F)$ suffices (for additive approximation of $\epsilon$). In fact, $P'$ may be uniform over a set of size $s$, and a sample of $s$ elements selected according to $P$ will do (with high probability over the selection of this sample).
The current work identifies a parameter, denoted $r(F)$, which is closely related to the support size that is required for a multiplicative error approximation (by factor of $1\pm\epsilon$): size $\tildeO(1/\epsilon^2)\cdot r(F)^2$ suffices, whereas $r(F)$ is a definite lower bound. (While $r(F) \geq d(F)$, an example is shown where $d(F)=1$ and $r(F)=|U|$.)
It seems that the *exact* dependence of the sample size (i.e., support size) on the proximity parameter (i.e., $\epsilon$) is not known also in the case of additive error, but in the current work also the relation to the "size" parameter (i.e., $r(F)$) is not known.
It is well known that if a family $ F$ of sets over a universe $U$ has a $VC$-dimension $d$, then for every probability distribution $P$ over $U$ there exists a sample of $U$ of size $d \cdot {\rm poly}(1/\epsilon)$ that faithfully represents $P$ on $F$ up to an $\epsilon$ additive error.
We ask whether there exists a property of $F$ analogous to $VC$-dimension which would ensure the existence of a small sample faithfully representing $P$ on $F$ up to $(1+ \epsilon)$ {\bf multiplicative} error. The answer turns out to be positive, and the key parameter is the {\bf triangular rank} of $F$. In particular, we show that if the triangular rank of $F$ is $r$, then there exists a sample of size $\min\{ r \log|F|, r^2\} \cdot {\rm poly}(1/\epsilon)$ that faithfully represents $P$ on $F$ up to $(1+ \epsilon)$ multiplicative factor.
To demonstrate the usefulness of such approximations, we consider the well-studied problem of dimension-reduction for $L_1$ metrics, and show that any $L_1$ metric on $n$ points can be $ (1+ \epsilon)$ approximated by a sum of $n \cdot {\rm poly}(1/\epsilon)$ cut-metrics.