Spatial multiomics map of trophoblast development in early pregnancy

Human samples

Placental and decidual samples used for the in vivo and in vitro profiling were obtained from elective terminations from: The MRC and Wellcome-funded Human Developmental Biology Resource (HDBR, https://www.hdbr.org), with appropriate maternal written consent and approval from the Fulham Research Ethics Committee (REC reference 18/LO/0822) and Newcastle and North Tyneside 1 Research Ethics Committee (REC reference 18/NE/0290). The HDBR is regulated by the UK Human Tissue Authority (HTA; https://www.hta.gov.uk) and operates in accordance with the relevant HTA Codes of Practice.Addenbooke’s Hospital (Cambridge) under ethical approval from the Cambridge Local Research Ethics Committee (04/Q0108/23), which is incorporated into the overarching ethics permission given to the Centre for Trophoblast Research biobank for the Biology of the Human Uterus in Pregnancy and Disease Tissue Bank at the University of Cambridge under ethical approval from the East of England-Cambridge Central Research Ethics Committee (17/EE/0151) and from the London-Hampstead Research Ethics Committee (20/LO/0115).

Placental–decidual blocks (P13, P14 and P34) were collected prior to 1 September 2006 and consent for research use was not obtained. These samples are considered ‘Existing Holdings’ under the Human Tissue Act and as such were able to be used in this project. All the other tissue samples used for this study were obtained with written informed consent from all participants in accordance with the guidelines in The Declaration of Helsinki 2000.

All samples profiled were histologically normal.

TSC lines BTS5 and BTS11 derived from human blastocysts by H. Okae and colleagues⁵ were used in this study. Informed consent was obtained from all donors prior to the establishment of the cell line and the study was approved by the Ethics Committee of Tohoku University School of Medicine (Research license 2016-1-371), associated hospitals, the Japan Society of Obstetrics and Gynecology and the Ministry of Education, Culture, Sports, Science and Technology (Japan). This work was internally approved by HuMFre-20-0005 at the Wellcome Sanger Institute and the lines were covered by a Conditions of Use agreement with the Tohoku University School of Medicine (internal reference CG175).

Tissue cryopreservation

Fresh tissue samples of human implantation sites were embedded in cold OCT medium and flash-frozen using a dry ice-isopentane slurry as described⁴⁶.

Quality of archival frozen tissue samples was assessed by extraction of RNA from cryosections using the QIAGEN RNeasy Mini Kit, according to the manufacturer’s instructions including on-column DNase I digestion. RNA quality was assayed using the Agilent RNA 6000 Nano Kit. All samples processed for Visium and single-nuclei had RIN values greater than 8.7.

Single-nuclei extraction

Single-nuclei suspensions were isolated from frozen tissue sections when performing multiomic snRNA-seq, scATAC-seq and snRNA-seq, following the manufacturer’s instructions. For each OCT-embedded sample, 400 μm of tissue was prepared as 50 μm cryosections, which were paused in a tube on dry ice until subsequent processing. Nuclei were released via Dounce homogenization as described⁴⁷.

Single-cell isolation from tissue

We used the previous protocol optimized for the decidual–placental interface¹³. In short, decidual tissues were enzymatically digested in 15 ml 0.4 mg ml⁻¹ collagenase V (Sigma, C9263) solution in RPMI 1640 medium (Thermo Fisher Scientific, 21875-034)/10{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} FCS (Biosfera, FB-1001) at 37 °C for 45 min. The supernatant was diluted with medium and passed through a 100-μm cell sieve (Corning, 431752) and then a 40-μm cell sieve (Corning, 431750). The flow-through was centrifuged and resuspended in 5 ml of red blood cell lysis buffer (Invitrogen, 00-4300) for 10 min. Placental villi were scraped from the chorionic membrane using a scalpel and the stripped membrane was discarded. The resultant villous tissue was enzymatically digested in 70 ml 0.2{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} trypsin 250 (Pan Biotech P10-025100P)/0.02{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} EDTA (Sigma E9884) in PBS with stirring at 37 °C for 9 min. The disaggregated cell suspension was diluted with medium and passed through a 100-μm cell sieve (Corning, 431752). The undigested gelatinous tissue remnant was retrieved from the gauze and further digested with 10–15 ml collagenase V at 1.0 mg ml⁻¹ (Sigma C9263) in Ham’s F12 medium/10{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} FBS with gentle shaking at 37 °C for 10 min. The disaggregated cell suspension was diluted with medium and passed through a 100 μm cell sieve (Corning, 431752). Cells obtained from both enzyme digests were pooled together and passed through a 100 μm cell sieve (Corning, 431752) and washed in Ham’s F12. The flow-through was centrifuged and resuspended in 5 ml of red blood cell lysis buffer (Invitrogen, 00-4300) for 10 min.

Trophoblast in vitro cultures

Trophoblast stem cell (TSC) lines BTS5 and BTS11 derived by Okae and colleagues were grown as described previously⁵. In brief, TSC self-renewing medium (TSCM) components were substituted with local suppliers with the exception for 30{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} w/v BSA from WAKO Japan and CHIR99021 concentration was increased to 6 µM which maintained the undifferentiated morphology as well as preserving its EVT invasive morphology. TSCs were grown on 5 µg ml⁻¹ Collagen IV (Corning) coated wells and early passaged cells between passages 24 and 26 were used for differentiation and analysis. For 2D differentiation into EVT identity, cells were seeded at a density of 1.3 × 10⁵ per cm² (corresponding to 125,000 cells plated on a well of a 6-well plate) in EVTM1 detailed below supplemented with ice-cold 2{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} Matrigel GFR (Corning) before seeding on 1 µg ml⁻¹ Collagen IV (Corning) coated wells (D0). Three days later (D3), medium was changed to EVTM2 supplemented with ice-cold 0.5{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} Matrigel GFR. Three days later (D6), the medium was changed to EVT medium 3 supplemented with ice-cold 0.5{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} Matrigel GFR. Cells were treated with TrypLE for downstream analysis 48 h later (D8). For CXCL16 induction experiments, a final concentration of 100 ng ml⁻¹ CXCL16 (RnD 976-CX-025 with carrier, dissolved in 0.1{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c}BSA(WAKO)/PBS) were supplemented to EVTM2 or EVTM3 and analysed 48 h later. The induction was controlled by supplementing an equal volume of 0.1{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} BSA/PBS.

In total, six trophoblast organoids were grown and differentiated into EVT as previously described^3,48. To differentiate trophoblast organoids into EVT, organoids were cultured with TOM for ~3–4 days and transferred into EVTM1 (+NRG1) for ~4–7 days. Once trophoblasts initiate their commitment into EVT (spike emergence), EVTM2 (−NRG1) is added for 4 days. Donors were differentiated and collected in batches of three that were multiplexed on the same 10x Genomics reaction. Samples for donors 1, 2 and 3 were collected at 3 h, 24 h and 48 h after the addition of EVTM2, while samples for donors 4, 5 and 6 were collected at 48 h before, and then 0 h, 48 h and 96 h after, addition of EVTM2. Organoids grown in TOM were also collected as a control at 96h.

Media compositions have been described previously^3,5,48 and are shown here. TSCM: DMEM/F12 with Glutamax (Gibco) supplemented with 0.2{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} v/v FBS (Gibco), 0.3{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} wt/vol BSA (WAKO), 1{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} ITS-X (Gibco), 2.5 µg ml⁻¹ l-ascorbic acid-2-phosphate (Sigma), 50 ng ml⁻¹ EGF (Peprotech AF-100-15), 6 µM CHIR99021 (Tocris 4423), 0.5 µM A83-01 (Tocris 2939), 1 µM SB43154 (Tocris 1614), 0.8 mM VPA (Sigma, dissolved in H2O) and 5 µM Y-27632 (Millipore 688000). TOM: Advanced DMEM/F12, N2 supplement (at manufacturer’s recommended concentration), B27 supplement minus vitamin A (at manufacturer’s recommended concentration), Primocin 100 μg ml⁻¹, N-Acetyl-l-cysteine 1.25 mM, l-glutamine 2 mM, recombinant human EGF 50 ng ml⁻¹, CHIR99021 1.5 µM, recombinant human R-spondin-1 80 ng ml⁻¹, recombinant human FGF-2 100 ng ml⁻¹, recombinant human HGF 50 ng ml⁻¹, A83-01 500 nM, prostaglandin E2 2.5 µM, Y-27632 5 µM. EVTM1: Advanced DMEM/F12 (or DMEM/F12 for TSC-EVTM 2D), l-glutamine 2 mM, 2-mercaptoethanol 0.1 mM, penicillin/streptomycin solution 0.5{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} (vol/vol), BSA 0.3{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} (wt/vol, WAKO), ITS-X supplement 1{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} (vol/vol), NRG1 (Cell Signaling 5218SC) 100 ng ml⁻¹, A83-01 7.5 µM, knockout serum replacement 4{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} (vol/vol). EVTM2, Advanced DMEM/F12 (or DMEM/F12 for TSC-EVTM 2D), l-glutamine 2 mM, 2-mercaptoethanol 0.1 mM, penicillin/streptomycin solution 0.5{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} (vol/vol), BSA 0.3{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} (wt/vol, WAKO), ITS-X supplement 1{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} (vol/vol), A83-01 7.5 µM, Knockout serum replacement 4{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} (vol/vol) (this is the same as EVTM1 without NRG1). This medium can be stored at 4 °C for up to 1 week. EVTM3, DMEM/F12 (for TSC-EVTM 2D), l-glutamine 2 mM, 2-mercaptoethanol 0.1 mM, penicillin/streptomycin solution 0.5{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} (vol/vol), BSA 0.3{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} (wt/vol, WAKO), ITS-X supplement 1{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} (vol/vol), A83-01 7.5 µM (this is the same as EVTM1 without NRG1 or knockout serum replacement). This can be stored at 4 °C for up to 1 week.

H&E staining and imaging

Fresh frozen sections were removed from −80 °C storage and air dried before being fixed in 10{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} neutral buffered formalin for 5 min. After rinsing with deionised water, slides were stained in Mayer’s haematoxylin solution for 90 s. Slides were completely rinsed in 4–5 washes of deionised water, which also served to blue the haematoxylin. Aqueous eosin (1{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c}) was manually applied onto sections with a pipette and rinsed with deionised water after 1–3 s. Slides were dehydrated through an ethanol series (70{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c}, 70{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c}, 100{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c}, 100{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c}) and cleared twice in 100{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} xylene. Slides were coverslipped and allowed to air dry before being imaged on a Hamamatsu Nanozoomer 2.0HT digital slide scanner.

Multiplexed smFISH and high-resolution imaging

Large tissue section staining and fluorescent imaging were conducted largely as described previously⁴⁹. Sections were cut from fresh frozen samples embedded in OCT at a thickness of 10–16 μm using a cryostat, placed onto SuperFrost Plus slides (VWR) and stored at −80 °C until stained. Tissue sections were processed using a Leica BOND RX to automate staining with the RNAscope Multiplex Fluorescent Reagent Kit v2 Assay (Advanced Cell Diagnostics, Bio-Techne), according to the manufacturers’ instructions. Probes are listed in Supplementary Table 8. Prior to staining, fresh frozen sections were post-fixed in 4{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} paraformaldehyde in PBS for 6–8 h, then dehydrated through a series of 50{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c}, 70{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c}, 100{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c}, and 100{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} ethanol, for 5 min each. Following manual pre-treatment, automated processing included heat-induced epitope retrieval at 95 °C for 15 min in buffer ER2 and digestion with Protease III for 15 min prior to probe hybridisation. Tyramide signal amplification with Opal 520, Opal 570, and Opal 650 (Akoya Biosciences) and TSA-biotin (TSA Plus Biotin Kit, Perkin Elmer) and streptavidin-conjugated Atto 425 (Sigma Aldrich) was used to develop RNAscope probe channels.

Stained sections were imaged with a Perkin Elmer Opera Phenix Plus High-Content Screening System, in confocal mode with 2 μm z-step size, using a 40× (NA 1.1, 0.149 μm/pixel) water-immersion objective. Channels: DAPI (excitation 375 nm, emission 435–480 nm), Atto 425 (excitation 425 nm, emission 463–501 nm), Opal 520 (excitation 488 nm, emission 500–550 nm), Opal 570 (excitation 561 nm, emission 570–630 nm), Opal 650 (excitation 640 nm, emission 650–760 nm).

Image stitching

Confocal image stacks were stitched as two-dimensional maximum intensity projections using proprietary Acapella scripts provided by Perkin Elmer.

10x Genomics Chromium GEX library preparation and sequencing

For the scRNA-seq experiments, cells were loaded according to the manufacturer’s protocol for the Chromium Single Cell 3′ Kit v3.0, v3.1 and 5’ v1.0 (10X Genomics). Library preparation was carried out according to the manufacturer’s protocol to attain between 2,000 and 10,000 cells per reaction. Libraries were sequenced, aiming at a minimum coverage of 20,000 raw reads per cell, on the Illumina HiSeq 4000 or Novaseq 6000 systems using the following sequencing format: (A) read 1: 26 cycles; i7 index: 8 cycles, i5 index: 0 cycles; read 2: 98 cycles; (B) read 1: 28 cycles; i7 index: 8 cycles, i5 index: 0 cycles; read 2: 91 cycles; (C) read 1: 28 cycles; i7 index: 10 cycles; i5 index: 10 cycles; read 2: 90 cycles (v3.1 dual).

For the multimodal snRNA-seq and scATAC-seq experiments, cells were loaded according to the manufacturer’s protocol for the Chromium Single Cell Multiome ATAC + Gene Expression v1.0 to attain between 2,000 and 10,000 cells per well. Library preparation was carried out according to the manufacturer’s protocol. Libraries for scATAC-seq were sequenced on Illumina NovaSeq 6000, aiming at a minimum coverage of 10,000 fragments per cell, with the following sequencing format; read 1: 50 cycles; i7 index: 8 cycles, i5 index: 16 cycles; read 2: 50 cycles.

10x Genomics Visium library preparation and sequencing

Ten-micrometre cryosections were cut and placed on Visium slides, then processed according to the manufacturer’s instructions. In brief, sections were fixed with cold methanol, H&E stained and imaged on a Hamamatsu NanoZoomer S60 before permeabilization, reverse transcription and cDNA synthesis using a template-switching protocol. Second-strand cDNA was liberated from the slide and single-indexed libraries were prepared using a 10x Genomics PCR-based protocol. Libraries were sequenced (1 per lane on a HiSeq 4000), aiming for 300M raw reads per sample, with the following sequencing format; read 1: 28 cycles, i7 index: 8 cycles, i5 index: 0 cycles and read 2: 91 cycles.

Alignment and quantification of scRNA-seq and snRNA-seq data

For each sequenced single-cell and single-nucleus RNA-seq library, we performed read alignment to the 10X Genomics’ GRCh38 3.0.0 human reference genome, mRNA version for scRNA-seq samples and pre-mRNA version for snRNA-seq samples, latter created following instructions from 10X Genomics: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references#premrna. Quantification and initial quality control were performed using the Cell Ranger Software (version 3.0.2; 10X Genomics) using default parameters. Cell Ranger filtered count matrices were used for downstream analysis.

Alignment and quantification of multiome data

For each sequenced snRNA-seq and ATAC–seq (multiome) library, we performed read alignment to custom made genome consisting of 10X Genomics’ GRCh38 3.0.0 pre-mRNA human reference genome and 10X Genomics Cell Ranger-Arc 1.0.1 ATAC genome, created following instructions from 10X Genomics: https://support.10xgenomics.com/single-cell-multiome-atac-gex/software/pipelines/latest/advanced/references. Quantification and initial quality control were performed using the Cell Ranger-Arc Software (version 1.0.1; 10X Genomics) using default parameters. Cell Ranger-Arc filtered count matrices were used for downstream analysis.

Downstream scRNA-seq and snRNA-seq analysis

Detection of doublets by gene expression

We used Scrublet for cell doublet calling on a per-library basis. We used a two-step diffusion doublet identification followed by Bonferroni FDR correction and a significance threshold of 0.01, as described in⁵⁰. Predicted doublets were not excluded from the initial analysis, but used afterwards to flag clusters with high doublet scores.

Detection of doublets by genotype

Souporcell⁵¹ was used to deconvolute (1) maternal and fetal origin of cells and nuclei in our scRNA-seq and snRNA-seq samples (including multiome snRNA-seq); (2) assignment of cells to individuals in pooled samples (namely, samples Pla_HDBR8768477, Pla_HDBR8715512 and Pla_HDBR8715514); and (3) organoids from multiple individuals. In some samples deconvolution into maternal or fetal origin by genotype was not possible which is probably owing to the highly skewed ratio of genotypes (either extremely high (>0.95) or extremely low (<0.05) ratio of maternal to fetal droplets). In those cases, maternal–fetal origin of the cells was identified using known markers from ref. ¹³.

Souporcell (version 2.4.0) was installed as per instructions in https://github.com/wheaton5/souporcell and used in the following way:

path_to/singularity exec ./souporcell.sif souporcell_pipeline.py -i ./cellranger_path/possorted_genome_bam.bam -b ./cellranger_path/filtered_feature_bc_matrix/barcodes.tsv -f ./genome_path/genome.fa -t 8 -o souporcell_result -k 2 –skip_remap True –common_variants ./filtered_2p_1kgenomes_GRCh38.vcf

Where k = 2 corresponds to the number of individuals to be deconvoluted (in our case either mother and fetus or pooled individuals H7 and H9 in samples Pla_HDBR8768477, Pla_HDBR8715512 and Pla_HDBR8715514. The accuracy of deconvolution was evaluated in downstream analysis once cluster identity was clear from either gene expression or predictions of logistic regression. In samples where deconvolution worked successfully, inter-individual doublets were further excluded from downstream analysis.

Filtering genes high in ambient RNA signal

To assess which genes in the scRNA-seq and snRNA-seq data were high in ambient RNA (soup) signal (further referred to as noisy genes), the following approach was undertaken separately for all the scRNA-seq and snRNA-seq samples: (1) Read in all the raw and filtered count matrices for each sample produced by Cell Ranger Software. (2) Discard droplets with < 5 unique moleular identifiers (UMIs) (likely to be fake droplets from sequencing errors). (3) Only keep data from samples which we further consider as noisy (where ‘Fraction reads in cells’ reported by Cell Ranger is less than 70{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} (guided by 10X Genomics’ recommendations: https://assets.ctfassets.net/an68im79xiti/163qWiQBTVi2YLbskJphQX/e90bb82151b1cdab6d7e9b6c845e6130/CG000329_TechnicalNote_InterpretingCellRangerWebSummaryFiles_RevA.pdf). (4) Take the droplets that are in raw but are not in filtered matrices considering them as empty droplets. (5) Concatenate all raw objects with empty droplets into 1 joint raw object and do the same for filtered. (6) For all genes calculate soup probability as defined with the following equation: (P=E_g^rmempty,rmdroplets/(E_g^rmempty,rmdroplets+E_g^rmcells/rmnuclei)), where (E_g^rmempty;rmdroplets) is the total sum of expression (number of UMI counts) of gene g in empty droplets, and (E_g^rmcells/rmnuclei) is the total sum of expression counts of gene g in droplets that are considered as cells/nuclei by Cell Ranger. (7) For all genes calculate number of cells/nuclei where the gene is detected at >0 expression level (UMI counts). (8) Label genes as noisy if their soup probability exceeds 50{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} quantile of soup probability distribution – done separately for cells and for nuclei.

This approach was used to estimate noisy genes in (1) donor P13 samples and (2) all donors’ samples. Donor P13 noisy genes were excluded during mapping onto space (Visium, see ‘Location of cell types in Visium data’), whereas all donors’ noisy genes (labelled using nuclei-only derived threshold in step 8 to not over-filter genes based on the higher quality portion of the data which in our case in scRNA-seq) were excluded during all donors analysis of the whole atlas of all the cell states at the maternal–fetal interface.

Quality filters, alignment of data across different batches and clustering

We integrated the filtered count matrices from Cell Ranger and analysed them with scanpy (version 1.7.1), with the pipeline following their recommended standard practises. In brief, we excluded genes expressed by less than three cells, excluded cells expressing fewer than 200 genes, and cells with more than 20{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} mitochondrial content. After converting the expression space to log(CPM/100 + 1), the object was transposed to gene space to identify cell cycling genes in a data-driven manner, as described in^50,52. After performing principal component analysis (PCA), neighbour identification and Louvain clustering, the members of the gene cluster including known cycling genes (CDK1, MKI67, CCNB2 and PCNA) were flagged as the data-derived cell cycling genes, and discarded in each downstream analysis where applicable.

Next, to have an estimate of the optimal number of latent variables to be used later in the single-cell variational inference (scVI) workflow for dimensionality reduction and batch correction, we identified highly variable genes, scaled the data and calculated PCA to observe the variance ratio plot and decide on an elbow point which defined values of n_latent parameter which were then used to correct for batch effect by 10X library batch (‘sample’) with scVI. Number of layers in scVI models was tuned manually to allow for better integration. The resulting latent representation of the data was used for calculating neighbourhood graph, UMAP and further Louvain clustering. For trophoblast organoid scRNA-seq and snRNA-seq, data were integrated with Harmony by donor using theta = 0 parameter.

Analysis was done separately for (a) donor P13 trophoblast compartment and (b) all donors’ data (all cell states). In both analyses (a) and (b) trophoblast data was analysed separately with consecutive rounds of re-analysis upon exclusion of clusters of noisy nature (exhibiting gene expression characteristic of more than 1 distinct population). In addition, in all donors’ analysis fibroblast (maternal and fetal separately) and maternal NK, T, myeloid, epithelial, endothelial and perivascular compartments were reanalysed separately using the approach described in the previous paragraph to achieve fine grain annotation.

Differential gene expression analysis

Differential gene expression analysis was performed with limma (limma version 3.46.0, edgeR version 3.32.1) with “cell_or_nucleus” covariate (scRNA-seq or snRNA-seq (including multiome snRNA-seq) origin of each droplet) backwards along the trajectory that was derived using stOrder approach, namely for the following 6 comparisons: VCT-CCC vs VCT (VCT and VCT-p cell states together); EVT-1 vs VCT-CCC; EVT-2 vs EVT-1; iEVT vs EVT-2; GC vs iEVT; eEVT vs EVT-2. Only significant DEGs were considered for downstream analysis, namely those with FDR (bonferroni) < 0.05).

Alignment, quantification and quality control of multiome ATAC data

We processed scATAC-seq libraries coming from multiome samples (read filtering, alignment, barcode counting, and cell calling) with 10X Genomics Cell Ranger-Arc (version 1.0.1) using the pre-built 10X GRCh38 genome (version corresponding to Cellranger-arc 1.0.1) as reference. We called the peaks using an in-house implementation of the approach described in Cusanovich et al. ⁵³ (available at https://github.com/cellgeni/cellatac, revision 21-099). In short, the genome was broken into 5-kb windows and then each cell barcode was scored for insertions in each window, generating a binary matrix of windows by cells. Matrices from all samples were concatenated into a unified matrix, which was filtered to retain only the top 200,000 most commonly used windows per sample. Using Signac (https://satijalab.org/signac/ version 0.2.5), the binary matrix was normalized with term frequency-inverse document frequency (TF-IDF) followed by a dimensionality reduction step using Singular Value Decomposition (SVD). The first latent semantic indexing (LSI) component was ignored as it usually correlates with sequencing depth (technical variation) rather than a biological variation⁵³. The 2–30 top remaining components were used to perform graph-based Louvain clustering. Next, peaks were called separately on each cluster using macs2⁵⁴. Finally, peaks from all clusters were merged into a master peak set (that is, peaks overlapping in at least one base pair were aggregated) and used to generate a binary peak-by-cell matrix, indicating any reads occurring in each peak for each cell.

This analysis was done separately for (1) all multiome data at first and (2) trophoblast-only subset of the multiome data. In the latter analysis we used annotation labels from the RNA counterpart of the multiome samples to perform peak calling.

Alignment, quantification and quality control of Visium data

For each 10X Genomics Visium sample, we used Space Ranger Software Suite (version 1.1.0) to align to the GRCh38 human reference pre-mRNA genome (official Cell Ranger reference, version 3.0.0) and quantify gene counts. Spots were automatically aligned to the paired H&E images by Space Ranger software. All spots under tissue detected by Space Ranger were included in downstream analysis.

Downstream analysis of 10X Genomics Visium data

Location of cell types in Visium data

To locate the cell states in the Visium transcriptomics slides, we used the cell2location tool v0.06-alpha⁵⁵. As reference, we used snRNA-seq data of donor P13. We used general cell state annotations from the joint all donors’ analysis (corresponding to donor P13 data), with the exception of the trophoblast lineage. Trophoblast annotations were taken from donor P13-only analysis of the trophoblast compartment. Using information about which genes are noisy (high in ambient RNA signal) in donor P13 snRNA-seq data (details in ‘Filtering genes high in ambient RNA signal’), we excluded those from the reference and Visium objects prior to cell2location model training which significantly improved the results of mapping (namely, eliminated off-target mapping of cell states—that is, made results of mapping more specific to the correct anatomical regions). Following the tutorial at https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_tutorial.html#Cell2location:-spatial-mapping, we trained cell2location model with default parameters using 10X library as a batch covariate in the step of estimation of reference cell-type signatures. Results were visualized with scanpy (version 1.7.1). Plots represent estimated abundance of cell types (cell densities) in Visium spots.

Subsetting Visium data into anatomical regions with SpatialDE2

We used SpatialDE2⁵⁶ tissue segmentation algorithm to assign Visium spots to three anatomical regions: (1) placenta; (2) decidua and villi tips; and (3) myometrium. We used mRNA abundances from the deconvolution results obtained with cell2location¹⁷ in SpatialDE2 tissue segmentation. Assignment of obtained Visium spot clusters to regions was done upon visual inspection. Locations of certain fibroblast cell states indicative of the specific anatomical region (uterine smooth muscle cells, uSMC and dS cell states) were also used to guide this assignment. In addition, low-quality spots were discarded on the basis of not being under tissue and having low count and gene coverage (visual inspection).

For more details, please refer to the following notebook: https://github.com/ventolab/MFI/blob/main/2_inv_troph_trajectory_and_TFs/2-1_stOrder_inv_troph/S1_regions_analysis_for_SpCov_model_and_later_for_CellPhone.ipynb

Downstream snATAC-seq analysis

Quality filters

To obtain a set of high-quality peaks for downstream analysis, we filtered out peaks that (1) were included in the ENCODE blacklist, (2) have a width outside the 210–1,500 bp range, and (3) were accessible in less than 5{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} of cells from a cellatac cluster. Low-quality cells were also removed by setting to 4 the minimum threshold for log1p-transformed total counts per cell.

Alignment of data across different batches and clustering

We adopted the cisTopic approach^57,58 for the core of our downstream analysis. cisTopic employs latent Dirichlet allocation (LDA) to estimate the probability of a region belonging to a regulatory topic (region–topic distribution) and the contribution of a topic within each cell (topic–cell distribution). The topic–cell matrix was used for constructing the neighbourhood graph, computing UMAP projections and clustering with the Louvain algorithm. After this was done for all cell states, clusters corresponding to trophoblast cell states (based on the unbiased clustering done here and annotation labels coming from the RNA counterpart of this multiome data) were further subsetted and reanalysed following the same pipeline.

Gene activity scores

Next, we generated a denoised accessibility matrix (predictive distribution) by multiplying the topic–cell and region–topic distribution and used it to calculate gene activity scores. To be able to integrate them with scRNA-seq and snRNA-seq data, gene activity scores were rounded and multiplied by a factor of 10⁷, as described⁵⁸.

Cell-type annotation of invading trophoblast

Final labels of invading trophoblast in snATAC-seq data were directly transferred from RNA counterpart of the multiome data.

Join inference of trophoblast invasion from gene expression and spatial data

StOrder is a computational framework for joint inference of cellular differentiation trajectories from gene expression data and information about location of cell states in physical space (further referred to as spatial data).

It consists of three principal steps:

1.

Calculate pairwise cell state connectivity from gene expression data (here we use snRNA-seq data).
2.

Calculate pairwise cell state proximity in physical space from spatial data (here we use Visium spatial transcriptomics data) using a new spatial covariance model.
3.

Combine connectivity matrices from steps 1 and 2 in a weighted expression to reconstruct the putative tree structure of the differentiation trajectory.

First, StOrder relies on a gene expression-based connectivity matrix (generated in our case by PAGA⁵⁹) that establishes potential connections between cell state clusters defined by single-cell or single-nucleus transcriptomics datasets. The values in this matrix can be interpreted as pairwise similarity scores for cell states in gene expression space. In our case we used snRNA-seq data from P13 as it contains all trophoblast subsets.

Second, StOrder generates a spatial covariance matrix that reflects pairwise proximity of cell states that co-exist in space and smoothly transition from one state to another while physically migrating in space. To do so, StOrder takes as an input the deconvolution results (derived in our case with cell2location¹⁷) of Visium spatial transcriptomics data. Here, we used all spatial transcriptomics data profiled (donors P13, P14 and Hrv43). Then, it fits a Gaussian process model that derives pairwise spatial covariance scores for all the cell state pairs with the following model:

$$rmvec(bfY_i,bfY_j) sim mathscrN,left(0,,left(beginarrayccsigma _1^(1) & sigma _2^(1)\ sigma _2^(1) & sigma _3^(1)endarrayright)otimes K(bfX,l)+left(beginarrayccsigma _1^(2) & 0\ 0 & sigma _2^(2)endarrayright)otimes bfIright)$$

where ⊗ is the Kronecker product and the combined vector of cell densities (Y_i,k, Y_j,k) of cell states i and j is modelled by a multivariate Gaussian distribution whose covariance decomposes into a spatial and a noise term. The spatial term

$$left(beginarrayccsigma _1^(1) & sigma _2^(1)\ sigma _2^(1) & sigma _3^(1)endarrayright)otimes Kleft(bfX,lright)$$

is defined by a between-cell-state covariance matrix

$$left(beginarrayccsigma _1^(1) & sigma _2^(1)\ sigma _2^(1) & sigma _3^(1)endarrayright)$$

and a spatial covariance matrix defined using the squared exponential kernel:

$$K(bfX,l)_mn=exp left(-fracparallel x_m-x_nparallel ^22l^2right)$$

x_m and x_n are spatial coordinates of spots m and n and l is the length scale of the smooth Gaussian process function in space that is being fit to cell densities.

The noise term

$$left(beginarrayccsigma _1^(2) & 0\ 0 & sigma _2^(2)endarrayright)otimes bfI$$

represents sources of variation other than spatial covariance of cell state densities.

The between-cell-state covariance matrix is constrained to be symmetric positive definite by defining

$$left(beginarrayccsigma _1^(1) & sigma _2^(1)\ sigma _2^(1) & sigma _3^(1)endarrayright)=,left(beginarraycca_1 & 0\ a_2 & a_3endarrayright)left(beginarraycca_1 & 0\ a_2 & a_3endarrayright)^rmT$$

The free parameters a₁, a₂, a₃, σ₁⁽²⁾, σ₂⁽²⁾ and l are estimated using maximum likelihood and automatic differentiation in Tensorflow^60,61 using the BFGS algorithm. To improve convergence, we initialize l to the distance between centres of neighboring Visium spots.

This model allows us to infer which cell states are proximal in physical space and are likely to be migrating in the process of gradual differentiation in space.

For the spatial covariance model within StOrder workflow we only used a subset of our Visium data that corresponded to (1) decidua_and_villi_tips and (2) myometrium—because only these regions contained invading trophoblast cell states. For more details please see ‘Subsetting Visium data into anatomical regions with SpatialDE2’ in ‘Downstream analysis of 10x Genomics Visium data’ above. This helps to focus on the regions of the tissue that are relevant for the process of interest and is recommended to do in general if there are parts of the Visium data that do not contain cell states relevant to the process of interest.

Third, StOrder reconstructs connections between cell states by taking into account both the connectivity matrix (step 1) from single-cell transcriptomics data and the spatial covariance matrix (step 2) from the spatial data in the following way:

$$beta (alpha P+(1-alpha )S)+(1-beta )Podot S$$

where P is the PAGA connectivity matrix, S is the spatial correlation matrix, α weights the contributions of P and S in the additive term, β weights the contributions of the additive and multiplicative terms, and ⊙ is the element-wise product. It then reconstructs the putative trajectory tree using the built-in PAGA functions.

The combined connectivity matrix based on both gene expression and spatial data with a range of weight parameters revealed the fully resolved invasion trajectory tree of the EVT with the correct topology (all connected cell state components, one branching point, no cycles, start at VCT-CCC population and two end points: eEVT and GC populations). The choice of ω parameter (contribution/weight of gene expression vs spatial part in the final matrix) in this last step depends on the goal of using this approach. In our case, we assumed: (1) the origin of EVT (VCT-CCC); (2) the end points of EVT (eEVT and GC); (3) the determination of a single branching point; and (4) the absence of cyclic trajectory. We therefore produced trajectory trees for 10,201 of (α,β) value pairs (from 0 to 1 with 0.01 increment step each) representative of different tree topologies corresponding to different ratios of gene expression vs spatial contribution. Out of the 10,201 tree structures we inspected, 3,574 trees represented the topology with the assumptions described above. These trajectories consistently assigned EVT-2 as the putative branching point. Tree structures with mainly gene expression-based connectivity values did not yield a branching point population we were looking for. Tree structures with mainly spatial based connectivities hindered the link between iEVT and GC populations, likely due to the large length scale of this invasion in space.

Limitations

Our approach assumes the gradual nature of gene expression changes accompanied by gradual migration of cells in space while they differentiate. Thus, it may not yield meaningful results in scenarios where this underlying assumption is violated. In addition, it is recommended that the user estimates the spatial scale at which the process of interest is taking place—whether in current Visium resolution the differentiation and migration is happening over the course of only a few spots or many more—this will change the initial values of l parameter and help the model fit the data better.

Combined RNA and ATAC analysis using MEFISTO

Preprocessing of multiome data and training of the MEFISTO model

Gene expression (snRNA-seq) counts of the multiome data for donor P13 were normalized by total counts (scanpy.pp.normalize_per_cell(rna, counts_per_cell_after=1e4)) and log-transformed (pp.log1p(rna)). Highly variable gene features were then calculated (sc.pp.highly_variable_genes(rna, min_mean=0.0125, max_mean=3, min_disp=0.5)) and the subsetted object’s expression was scaled (sc.pp.scale(rna, max_value=10)).

Chromatin accessibility (scATAC-seq) counts of the multiome data for donor P13 were preprocessed using TF-IDF normalization (muon.atac.pp.tfidf(atac[key], scale_factor=1e4)). To select biologically meaningful highly variable peak features, ATAC counts were aggregated into pseodubulks by cell states and averaged, then variance of accessibility was calculated across these pseudobulks, and informative peak features were selected based on this measure (top 75th percentile (10,640) of peaks selected in total) as the peaks with highest variance. Finally, these data were scaled (sc.pp.scale(atac, max_value=10)).

Using the preprocessed RNA and ATAC data we used a pseudotime-aware dimensionality reduction method MEFISTO³⁰ to extract major sources of variation from the RNA and ATAC data jointly and identify coordinated patterns along the invasion trajectory. As a proxy for the trophoblast invasion trajectory in the MEFISTO model we used 2-dimensional pseudotime coordinates based on a UMAP of the RNA data by calculating PCA (sc.tl.pca(rna, n_comps=8)), neighborhood graph (sc.pp.neighbors(rna)) and UMAP embedding (sc.tl.umap(rna)).

The MEFISTO model was trained using the following command within MUON (version 0.1.2) package interface:

muon.tl.mofa(mdata, outfile=’’,

use_obs = “union”,

smooth_covariate=[“UMAP1”, “UMAP2”],

use_float32=True)

We further excluded factor 5 from downstream analysis as a technical artefact due to its significant and high correlation (Spearman rank-order correlation coefficient 0.94 (over all cell states), P < 10⁻³⁰⁸, two-sided test) with the n_peaks_by_counts (number of ATAC peaks with at least 1 count in a nucleus) in ATAC view in all cell states (Supplementary Fig. 4k) and lack of smoothness along pseudotime (Supplementary Fig. 4j).

Defining groups of ATAC peak features

To further interpret ATAC features, we annotated them based on their genomic location using GenomicRanges package (version 1.42.0). In parallel, we used epigenetic data from⁶² to mark peak features in close proximity to trophoblast-specific enhancer features. To do so, we used peak files corresponding to H3K4me1, H3K27ac and H3K27me3 histone modifications marks for second trimester trophoblast samples (obtained from authors of aforementioned study upon request) to infer regions of the genome corresponding to active (H3K27ac + H3K27me3), primed (only H3K4me1) or repressed (H3K4me1 + H3K27me3) enhancers. This was done using bedtools (version 2.30.0) in the following way:

(1)

bedtools subtract -a H3K4me1_file.bed -b H3K27ac_file.bed > interm_file.bed bedtools subtract -a interm_file.bed -b H3K27me3_file.bed > primed_enhancers.bed To produce primed enhancers file
(2)

bedtools intersect -a H3K4me1_file.bed -b H3K27ac_file.bed > active_enhancers.bed To produce active enhancers file
(3)

bedtools intersect -a H3K4me1_file.bed -b H3K27me3_file.bed > repressed_enhancers.bed To produce repressed enhancers file

The enhancer files produced were then overlapped with peaks in ATAC analysis (bedtools intersect -a atac_peaks_file.bed -b enhancer_file.bed -wa) and any peaks having a >1-bp overlap with an enhancer feature were considered to be proximal to those features (done separately for active, primed and repressed enhancers).

Enrichment analysis of features in the MEFISTO model

Gene set enrichment analysis for gene features was performed based on the C5 category and the Biological Process subcategory from the MSigDB database (https://www.gsea-msigdb.org/gsea/msigdb) using GSEA functionality implemented in MOFA2 (run_enrichment command, MOFA2 version 1.3.5). This was done separately for negative and positive weights of each factor.

Peak group enrichment for peak features was performed using the same run_enrichment command in MOFA2 on peak groups defined as described above (Defining groups of ATAC peak features).

Transcription factor analysis using the MEFISTO model

To extract information about transcription factor binding motif enrichment in ATAC features of MEFISTO factors, we first performed enrichment analysis of peaks using GSEA functionality implemented in MOFA2 (run_enrichment command, MOFA2 version 1.3.5) on the peak-motif matrix produced by Signac package (version 1.5.0). Then, to identify which MEFISTO factors contribute the most to each transition of cell states along the invading trophoblast trajectory (inferred with StOrder), we trained logistic regression classifiers for each transition along the trajectory (overall for 6 transitions: VCT→VCT-CCC, VCT-CCC→EVT-1, EVT-1→EVT-2, EVT-2→iEVT, iEVT→GC, EVT-2→eEVT) on the matrix of factor values. For each transition the factor with the highest absolute coefficient separating the two cell states was selected, accounting for the sign of contribution in the logistic regression (positive or negative). If the top factor is contributing to a transition with a positive coefficient, transcription factor binding motifs coming from MEFISTO enrichment analysis of this factor’s top positive values are further considered in general transcription factor analysis as transcription factors upregulated upon this transition, whereas transcription factor binding motifs coming from MEFISTO enrichment analysis of this factor’s top negative values are further considered in general transcription factor analysis as transcription factors downregulated upon this transition. All of these transcription factor motifs are marked as having evidence from the MEFISTO factor relevant for this transition. Reverse procedure is applied in case if the top factor is contributing to a transition with a negative coefficient in the corresponding logistic regression model.

For more details please see the following notebook: https://github.com/ventolab/MFI/blob/main/2_inv_troph_trajectory_and_TFs/2-5_MEFISTO_analysis_inv_troph/S3_DEG_comparison_to_MEFISTO_factor_translation.ipynb

Trophoblast trajectory inference analysis

To derive trophoblast pseudotime based on transcriptomic similarity, we used Slingshot v1.8.0. With Slingshot we fitted a cluster-based minimum spanning tree (MST) over the two-dimensional UMAP of P13 trophoblasts, and inferred the global lineage topology to assign cell states to lineages. Only donor P13 cells in the G1 phase of the cell cycle were included. To balance trophoblast state contributions, we downsampled each trophoblast state to account for up to 100 cells per state. VCT was assigned as the initial cell state (start.clus), while eEVT, SCT and GC were assigned as terminal states (end.clus). Slingshot fits simultaneous principle curves to smooth the MST and assigns a weight for each trophoblast cell in each lineage. Slingshot outputs lineage-specific pseudotimes and weights of assignment for each cell.

We next fitted a tradeSeq (v1.4.0) gene expression model (negative binomial generalized additive model) using the trajectory pseudotime and the weights computed with Slingshot (with nknots = 6). Next, we tested whether the gene expression is significantly changing along trophoblast pseudotime. For such a purpose, we used the statistical test implemented in the associationTest function, which tests the null hypothesis that all smoother coefficients are equal to each other. Genes with a P < 10⁻⁶ and mean logFC > 0.5 were selected as the main drivers of the trophoblast trajectory.

Cell label transferring on trophoblast organoids

To transfer cell labels from donor P13 snRNA-seq in vivo trophoblast to the scRNA-seq TSC and PTO we trained two independent logistic regression models. The P13 dataset was downsampled to 500 cells per trophoblast state, except for GC and eEVT, which were discarded from the training due to their scarcely abundance. The common highly variable genes (1,695 genes for PTO and 1,565 for TSC), of the 4,000 selected per dataset, between the in vivo and each individual organoid dataset were selected as features for model training. The in vivo dataset was split into 80/20 training and test set, hyperparameters were explored employing a threefold cross-validation and scored employing the mean Matthews correlation coefficient of each fold. Top-ranked models were selected and assessed on the test set, with no significant differences found between them. Finally, the best model for each organoid dataset was employed to transfer cell labels from donor P13.

Cell–cell communication analysis with CellPhoneDB

To retrieve interactions between invading trophoblast and other cell populations identified in our samples, we used the CellPhoneDB degs_analysis method^13,63 (https://github.com/ventolab/CellphoneDB) described in ref. ³³. In short, we retrieved the interacting pairs of ligands and receptors meeting the following requirements: (1) all the protein members were expressed in at least 10{33c86113bcc32821f63c6372852a0f501e07fff55ce3ce61b15b246c5f8c531c} of the cell type under consideration; and (2) at least one of the protein members in the ligand or the receptor was a DEG in an invading trophoblast subset (according to our analysis of differential expression, for details please see ‘Differential gene expression analysis’), with an adjusted P-value below 0.05 and logFC > 0.1. We further selected which cell states are spatially co-located in each microenvironment via visual inspection of cell2location deconvolution results for our Visium data. The analysis was done on an updated version of CellPhoneDB-database (v4.1) which includes novel intercellular interactions from refs. ⁶⁴^,⁶⁵. Only bona fide manually curated interactions were considered in the analysis.

Transcription factor analysis

To prioritize the transcription factors relevant for each invading trophoblast cell state or microenvironment, we integrate four types of measurements: (1) expression levels of the transcription factor and (2) the activity status of the transcription factor measured from (2a) the expression levels of their targets (described in ‘Transcription factor activities derived from scRNA-seq and snRNA-seq’) and/or (2b) the chromatin accessibility of their binding motifs (described in ‘Transcription factor motif activity analysis from scATAC–seq’) and/or (2c) evidence of the chromatin accessibility of their binding motifs in relevant factors from multimodal RNA-ATAC analysis (with MEFISTO). Plots in main figures include transcription factor meeting the following criteria: (1) transcription factor was differentially expressed, with adjusted P-value < 0.05) and/or (2) transcription factor was differentially active, with log₂ fold change greater than 0.25 and adjusted P-value < 0.05 in at least one of the transcription factor activity measurements (2a or 2b).

Transcription factor differential expression from scRNA-seq and snRNA-seq

We compute differential expression using the procedure described in ‘Differential gene expression analysis’ and further subset resulting gene targets to transcription factors only based on the list of transcription factors provided by DoRothEA.

Transcription factor activities derived from scRNA-seq and snRNA-seq

We estimated protein-level activity for human transcription factor as a proxy of the combined expression levels of their targets. Target genes were retrieved from Dorothea⁶⁶, an orthogonal collection of transcription factor targets compiled from a range of different sources. Next, we estimated transcription factor activities for each cell using Viper⁶⁷, a GSEA-like approach, as implemented in the Dorothea R package and tutorial⁶⁸ for the genes differentially expressed along the invading trophoblast trajectory (see ‘Differential gene expression analysis’).

Transcription factor motif activity analysis from scATAC–seq

Transcription factor motif activities were computed using chromVar⁶⁹ v. 1.12.2 with positional weight matrices from JASPAR2018⁷⁰, HOCOMOCOv10⁷¹, SwissRegulon⁷², HOMER⁷³. chromVar returns a matrix with binding activity estimates of each transcription factor in each cell, which we used to test for differential transcription factor binding activity between trophoblast cell states with FindMarkers function in Seurat (default parameters) in the same way as described in ‘Differential gene expression analysis’ (backwards along invading trophoblast trajectory).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.