X-factor-ART

SCRIPTS

Scripts for reproducing the analyses and figures.

XchromeAnalysis.R

Script to run the main analyses in the MoBa cohort.

Bootstrap_Consistency_findings.R

Runs bootstrapping to check how consistently the findings remain significant.
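
As a rough illustration of the idea only (not the code in Bootstrap_Consistency_findings.R), the sketch below resamples simulated data with replacement and reports the proportion of resamples in which an association stays significant. The data, the linear model, and the names `bootstrap_consistency`, `exposure`, and `methyl` are all assumptions made for this example.

```r
# Hypothetical sketch: simulated data, NOT the MoBa data or the repository's script.
set.seed(1)
n <- 200
exposure <- rbinom(n, 1, 0.5)          # assumed binary exposure
methyl   <- 1.0 * exposure + rnorm(n)  # assumed outcome with a true effect of 1

# Proportion of bootstrap resamples in which the exposure term is significant.
bootstrap_consistency <- function(y, x, n_boot = 500, alpha = 0.05) {
  p_values <- vapply(seq_len(n_boot), function(i) {
    idx <- sample.int(length(y), replace = TRUE)  # resample rows with replacement
    fit <- lm(y[idx] ~ x[idx])
    summary(fit)$coefficients[2, "Pr(>|t|)"]      # p-value of the slope
  }, numeric(1))
  mean(p_values < alpha)
}

prop_signif <- bootstrap_consistency(methyl, exposure)
print(prop_signif)
```

A proportion close to 1 would suggest that the finding does not hinge on a few individual samples; the actual script may use a different model and a different summary.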

check_dmr.Rmd

Checking the DMRff results, creating Figures 7 & 8 in the paper and Figures S4 & S6 in the Supplementary, as well as the table with all the significant DMRs.

check_results.Rmd

Checking results produced in XchromeAnalysis.R, creating Figures 2 & 3 as well as Supplementary Figure S3 and the table with all the significant results.

create_coMET-like_figure.R

Creating a pair of figures for each significant CpG (Figures 4–6).

gg_qqplot.R

Script for calculating the data for QQ plots and plotting them. Adapted from: https://slowkow.com/notes/ggplot2-qqplot/

Creates Supplementary Figure S2.
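
For orientation, the data behind a QQ plot of p-values can be computed with base R alone. The sketch below is a minimal example on simulated p-values, not the contents of gg_qqplot.R; the function name `qq_data` and the column names are assumptions made for this example.

```r
# Hypothetical sketch: observed vs expected -log10(p) with a pointwise band.
qq_data <- function(p_values, ci = 0.95) {
  n <- length(p_values)
  data.frame(
    observed = -log10(sort(p_values)),  # smallest p-value first
    expected = -log10(ppoints(n)),      # expected quantiles under a uniform null
    # The i-th smallest of n uniform p-values follows Beta(i, n - i + 1).
    band_upper = -log10(qbeta((1 - ci) / 2, shape1 = 1:n, shape2 = n:1)),
    band_lower = -log10(qbeta((1 + ci) / 2, shape1 = 1:n, shape2 = n:1))
  )
}

set.seed(1)
p_values <- runif(1000)   # simulated null p-values
df <- qq_data(p_values)

# Base-graphics version of the plot; points near the diagonal indicate no inflation.
plot(df$expected, df$observed,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
abline(0, 1)
lines(df$expected, df$band_upper, lty = 2)
lines(df$expected, df$band_lower, lty = 2)
```

The same data frame could be passed to ggplot2 instead of base graphics.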

grabGenes.R

Script for extracting human genes and transcripts within a certain region. Uses the following packages:

NOTE: all these packages are part of Bioconductor.

grabRegulRegions.R

Script for extracting regulatory region annotation within a certain region. Uses the following packages:

NOTE: all these packages are part of Bioconductor.

extract_data_from_yin.sh

Extracting data from the PDF file in the Supplementary Materials of the publication:

Yin, Y. et al. Impact of cytosine methylation on DNA binding specificities of human transcription factors. Science, Vol 356 (6337), 2017

This script extracts the classification of transcription factors (TFs) according to their binding to methylated and unmethylated DNA sequences. The data are saved as a tab-delimited text file, DATA/extracted_lines_clean.txt, which is checked and cleaned in create_TF_methyl_binding_dataset.R and analyzed in check_TFs_binding_signif_CpGs.R.

The output of running the extract_data_from_yin.sh script is in extract_data_from_yin.out.

create_TF_methyl_binding_dataset.R

Reading in the extracted lines from DATA/extracted_lines_clean.txt and cleaning the dataset. The final dataset is saved in DATA/extracted_lines_clean_all.dat.

NOTE: The script includes detailed information about the data!

The output of running the script is in create_TF_methyl_binding_dataset.html.

check_TFs_binding_signif_CpGs.R

Checking the classification of the TFs (transcription factors) that were found to possibly bind to the significant CpGs, based on:

  1. data from the JASPAR 2022 database visualized in the Ensembl browser (GRCh37),
  2. a manual search in the MeDReader database, and
  3. data extracted from Yin, Y. et al., Science, 2017, as described above.

The information collected in points 1 and 2 was manually entered into the text file DATA/TFs_binding_to_CpGs_JASPAR_ensembl.dat.

The output of running the .R script is in check_TFs_binding_signif_CpGs.html.