When do GFlowNets learn the right distribution?
Diego Mesquita (FGV EMAp)
Generative Flow Networks (GFlowNets) are an emerging class of sampling methods for distributions over discrete and compositional objects, e.g., graphs. In spite of their
remarkable success in problems such as drug discovery and phylogenetic inference, the question of when and whether GFlowNets learn to sample from the target distribution
remains underexplored. To tackle this issue, we first assess the extent to which a violation of detailed balance in the underlying flow network might hamper the correctness of the GFlowNet’s sampling distribution. In particular, we demonstrate that the impact of an imbalanced edge on the model’s accuracy depends on the total amount of flow passing through that edge and, as a consequence, is unevenly distributed across the network. We also argue that, depending on the parameterization, imbalance may be inevitable.
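For concreteness, recall the detailed balance condition (standard in the GFlowNet literature; the gloss below is ours): for every edge $s \to s'$ of the flow network,

$$
F(s)\, P_F(s' \mid s) \;=\; F(s')\, P_B(s \mid s'),
$$

where $F$ is the state flow and $P_F$, $P_B$ are the forward and backward policies. When the condition holds on every edge, the sampler draws terminal objects $x$ with probability proportional to the reward $R(x)$; an imbalanced edge is one on which the two sides disagree.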
In this regard, we consider the problem of sampling from distributions over graphs with GFlowNets parameterized by graph neural networks (GNNs) and show that the representational limits of GNNs delineate which distributions these GFlowNets can approximate.
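To illustrate the kind of representational limit at play (our illustration, not material from the abstract): standard message-passing GNNs are at most as powerful as the 1-Weisfeiler-Leman (1-WL) test, so two graphs that 1-WL cannot distinguish must receive identical probabilities from a GFlowNet whose policy is such a GNN. A minimal sketch with the classic pair, a 6-cycle versus two disjoint triangles:

```python
# Sketch (ours): 1-WL colour refinement gives the 6-cycle and two disjoint
# triangles identical colour histograms; a message-passing GNN policy is
# bounded by 1-WL, so a GFlowNet built on one must treat them identically.
from collections import Counter

def wl_histogram(adj, rounds=3):
    """Run 1-WL colour refinement and return the final colour histogram."""
    colors = {v: 0 for v in adj}  # uniform initial colours
    for _ in range(rounds):
        # Each vertex's signature: its colour plus the multiset of neighbour colours.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        relabel = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: relabel[signatures[v]] for v in adj}
    return Counter(colors.values())

# 6-cycle: 0-1-2-3-4-5-0
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
# two disjoint triangles: 0-1-2 and 3-4-5
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}

print(wl_histogram(cycle6) == wl_histogram(triangles))  # True: 1-WL cannot tell them apart
```

Since the GNN’s representations coincide on these two states, any target that assigns them different masses lies outside what such a GFlowNet can express.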
Lastly, we address these limitations by proposing a theoretically sound and computationally tractable metric for assessing GFlowNets, showing experimentally that it is a better proxy for correctness than popular evaluation protocols.
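The abstract leaves the metric unspecified; purely as a sketch of what a tractable correctness proxy can look like (our assumption, not necessarily the talk’s proposal), one can renormalize the reward over a small enumerable set of terminal objects and compare it, e.g. in total variation, with the sampler’s empirical distribution restricted to that set:

```python
# Hedged sketch (illustrative; not necessarily the metric from the talk).
# `samples`, `reward`, and `support` are hypothetical names: `samples` is a list
# of terminal objects drawn from the trained GFlowNet, `reward` maps objects to
# their (unnormalized) reward, and `support` is a small enumerable set on which
# both distributions are renormalized before comparison.
from collections import Counter

def restricted_tv(samples, reward, support):
    """Total variation distance between sampler and target, renormalized on `support`."""
    emp = Counter(x for x in samples if x in support)
    n = sum(emp.values())
    if n == 0:
        raise ValueError("no samples landed in `support`")
    z = sum(reward[x] for x in support)
    return 0.5 * sum(abs(emp[x] / n - reward[x] / z) for x in support)
```

A small distance across many such restricted supports is evidence, though not proof, that the sampler matches the target distribution.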