Being such a trivial problem for a RAM model, one may miss that there is something to say here about Boolean circuits. Indeed, this problem seems to capture the fundamental difference between free dynamic access (indirect addressing) and a static access.
It is a pity the authors don't say a bit more about the prior upper bounds. Even the linear upper bound for the case of $q=1$ requires some thinking. To select the $i^\th$ bit of $x$, write $i=(b,j)$, where $b$ is the most significant bit of $i$, and $x=(x',x'')$, where $|x'|=|x''|$, and combine the solutions to $(j,x')$ and $(j,x'')$ using the bit $b$.
As for an almost-linear upper bound for general $q$, one may reduce to the compaction problem reviewed in Sec 1.1. Here the input has the form $x=(x_1,....,x_n)$ and $b_1,...,b_n$ and one has to outputs the $x_i$'s for which $b_i=1$. In the case of distinct $i_j$'s for the selection problem, the reduction is based on sorting the sequence $(i_1,1),...,(i_q,1),(1,0),....,(n,0)$, and using the result to produce the marking $b=(b_1,....,b_n)$. Replacing $x$ by $x'=((1,x_1),...,(n,x_n))$ and appying compaction to $x'$ and $b$, allows for producing the desired output.
However, the current work aims at an upper bound that has the form of the lower bound (i.e., $n+q\poly(\log n))$), which is truly linear as long as $q\leq n/\poly(\log n)$. It achieves this goal by using strong routing networks; specifically, superconcentrators. Issues to be addressed include the generation of the routing instructions (for the swirchges in the routing network), and handling of non-distinct elements.
We study the circuit complexity of the multiselection problem: given an input string $x \in \{0,1\}^n$ along with indices $i_1,\dots,i_q \in [n]$, output $(x_{i_1},\dots,x_{i_q})$. A trivial lower bound for the circuit size is the input length $n + q \cdot \log(n)$, but the straightforward construction has size $\Theta(q \cdot n)$.
Our main result is an $O(n+q \cdot \log^3(n))$-size and $O(\log(n+q))$ depth circuit for multiselection. In particular, for any $q \leq n/\log^3(n)$ the circuit has linear size and logarithmic depth. Prior to our work no linear-size circuit for multiselection was known for any $q=\omega(1)$ and regardless of depth.
See ECCC TR23-113.