A tentative first step was taken by Lajaaiti et al. (2023), who combined a graph-based representation of phylogenetic trees with Graph Neural Networks (GNNs). This combination, however, performed poorly because of the “over-smoothing” and “hop neighbourhood” problems.
But see Leroy et al. (2025; https://doi.org/10.1101/2025.08.14.670341), who address this issue using an improved pooling operator. As they discuss in their preprint, the performance they achieve (exceeding that of the MLE) likely does not represent a ceiling for this approach, as the architecture is quite simple. More sophisticated graph-based architectures, including graph transformers (which combat over-smoothing and can more readily account for both local and global patterns), will likely improve performance further.
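For readers less familiar with over-smoothing, the following is a minimal toy sketch of the effect, not a reproduction of the architecture of either Lajaaiti et al. or Leroy et al. It assumes a hypothetical 7-node binary tree, random node features, and plain mean aggregation over each node's neighbourhood (with a self-loop), without learned weights or nonlinearities. All names in it are illustrative.

```python
import numpy as np


def mean_aggregation_matrix(edges, n_nodes):
    """Row-normalised adjacency with self-loops: each node averages
    its own features with those of its direct neighbours."""
    A = np.eye(n_nodes)
    for u, v in edges:
        A[u, v] = 1.0
        A[v, u] = 1.0
    return A / A.sum(axis=1, keepdims=True)


def feature_spread(X):
    """Frobenius norm of the deviation of node features from their mean.
    A value near zero means the nodes have become indistinguishable."""
    return float(np.linalg.norm(X - X.mean(axis=0, keepdims=True)))


# Hypothetical toy tree: node 0 is the root, nodes 3-6 are the tips.
edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
n_nodes = 7
P = mean_aggregation_matrix(edges, n_nodes)

rng = np.random.default_rng(0)
H = rng.normal(size=(n_nodes, 4))  # random 4-dimensional node features

spreads = []
for layer in range(65):
    spreads.append(feature_spread(H))
    H = P @ H  # one round of message passing (mean aggregation only)

for layer in (0, 1, 2, 4, 8, 16, 32, 64):
    print(f"after {layer:2d} layers: spread = {spreads[layer]:.6f}")
```

In this sketch the spread shrinks towards zero as layers are stacked, so node representations collapse to a common value. Conversely, a shallow stack only lets each node see neighbours within a few hops. This tension is one reason architectures with global attention, such as graph transformers, are an appealing next step.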