

The major principles that define the architecture of antibody repertoires have remained largely unknown. Here, we establish a high-performance computing platform to construct large-scale networks from comprehensive human and murine antibody repertoire sequencing datasets (>100,000 unique sequences). Leveraging a network-based statistical framework, we identify three fundamental principles of antibody repertoire architecture: reproducibility, robustness and redundancy. Antibody repertoire networks are highly reproducible across individuals despite high antibody sequence dissimilarity. The architecture of antibody repertoires is robust to the removal of up to 50–90% of randomly selected clones, but fragile to the removal of public clones shared among individuals. Finally, repertoire architecture is intrinsically redundant. Our analysis provides guidelines for the large-scale network analysis of immune repertoires and may be used in the future to define disease-associated and synthetic repertoires.

The high diversity of antibody repertoires, which are defined by the collection of an individual’s B-cell receptor (BCR) and antibody sequences, plays a major role in providing broad and protective humoral immunity. The source of antibody diversity has long been identified as the somatic recombination of V-, (D- in the heavy chains) and J-genes 1. Additions and deletions of nucleotides at the junctions of the gene segments further increase diversity 2, 3. Antibody identity (clonality) and antigen specificity are primarily encoded in the highly diverse junctional site of recombination in the variable heavy chain, called the complementarity determining region 3 (CDR3) 4. Thus, the similarity landscape of CDR3 amino acid (a.a.) sequences constitutes the clonal architecture of an antibody repertoire; this architecture reflects the breadth of antigen-binding and therefore correlates with humoral immune protection and function. Understanding sequence-related properties of antibodies is thus valuable for the development of novel therapeutics and vaccines 5, 6. However, due to limitations in technological sequencing depth and algorithmic advances, the fundamental construction principles of antibody repertoire architecture have remained largely unknown, thereby hindering a more profound systems understanding of humoral immunity.

Recently, selected aspects of network analysis have been employed to investigate antibody repertoire architecture in health and disease. Network analysis captures antibody repertoire architecture by representing the similarity landscape of antibody sequences as nodes (antibody clonal sequence) that are connected if sufficiently similar 7, 8, 9, 10, 11, 12 (Fig. Sequence-based networks were first used to show immune responses defined by similarity between clones, a proxy for clonal expansion 8. Network connectivity was later also used to discriminate between diverse repertoires of healthy individuals and clonally expanded repertoires from individuals with diseases such as chronic lymphocytic leukemia 7 and HIV-1 infection 10. Thus far, network analysis has mostly been utilized for visualization of network clusters 7, 8, 9, 10, 11, 12. Network visualization limits the informative graphical display of a network to a few hundred antibody clones (100% a.a. identity sequences), thereby preventing the quantitative description of immune repertoire architecture. Indeed, it has been shown that the natural antibody repertoire exceeds the informative visualization threshold (hundreds of clonal nodes) by at least three orders of magnitude 13, a limit that previous research did not explore given the lower biological coverage. Currently, computational methods for constructing large-scale networks with more than 10³ nodes are not typically accessible in systems biology 14. Furthermore, as of yet, only networks expressing clonal similarity relations of one nucleotide (nt) or one amino acid (a.a.) between sequences have been investigated 7, 8, 9, 10, 11, 12, which, considering recently discovered biases in VDJ recombination and SHM targeting 15, 16, 17, 18, 19, 20, 21, may not be sufficient for a comprehensive immunological appreciation of repertoire architecture.

A platform for large-scale networks of antibody repertoires

The landscape of antibody clonal similarities is vast and complex: for example, on the a.a.


The architecture of mouse and human antibody repertoires is defined by the sequence similarity networks of the clones that compose them.
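
To make the notion of a sequence similarity network concrete, the following minimal Python sketch builds one from a handful of toy CDR3 a.a. sequences. It rests on several assumptions that are not taken from the text: similarity is measured as Levenshtein (edit) distance, two clones are connected when that distance is at most one a.a., and the sequences in `toy_cdr3` are invented for illustration. The function names (`levenshtein`, `build_network`, `largest_component_size`) are hypothetical. The sketch compares all pairs of clones and is therefore only practical for small inputs; it is not a description of the high-performance platform used for repertoires of >100,000 unique sequences.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1]


def build_network(clones, max_dist=1):
    """Return an adjacency dict: unique clone -> set of similar clones.

    Assumption: two clones are linked if their edit distance is <= max_dist.
    """
    unique = sorted(set(clones))
    network = {clone: set() for clone in unique}
    for a, b in combinations(unique, 2):
        # A length difference above max_dist already rules out an edge.
        if abs(len(a) - len(b)) <= max_dist and levenshtein(a, b) <= max_dist:
            network[a].add(b)
            network[b].add(a)
    return network


def largest_component_size(network):
    """Number of nodes in the largest connected component (depth-first search)."""
    seen = set()
    best = 0
    for start in network:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in network[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


# Hypothetical toy CDR3 a.a. sequences, invented for illustration only.
toy_cdr3 = ["CARDYW", "CARDFW", "CARGFW", "CTRDYW", "CAKEEW", "CARDYYW"]

network = build_network(toy_cdr3)
degrees = {clone: len(neighbours) for clone, neighbours in network.items()}
```

In this toy example, `degrees` gives the number of similar clones per node and `largest_component_size(network)` gives the size of the largest connected cluster. Quantities of this kind can be computed without drawing the network at all, so they remain usable at sizes where visualization stops being informative.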
