Solving the gene duplication problem is a classical approach for species tree inference from gene trees that are confounded by gene duplications. This problem takes a collection of gene trees and seeks a species tree that implies the minimum number of gene duplications. Wilkinson et al. posed the conjecture that this problem satisfies the desirable Pareto property in 2007. That is, for every instance of the problem, each cluster that is present in all of the input gene trees of this instance, called a consensus cluster, will also be found in every possible solution to this instance. We show that this conjecture does not generally hold. However, we prove that for every instance of the gene duplication problem there is always at least one solution that contains the consensus clusters of the input gene trees. Based on the construction of our proof, we introduce an efficient algorithm that transforms a given species tree into one that implies equal or less duplications, and, in addition, contains all of the consensus clusters. Finally, we demonstrate the performance of our algorithm using simulation studies.
展开▼