KEIO can be installed from GitHub:
# Create a conda environment and install pybedtools and vsearch
conda create -n keio python=3.7 pybedtools=0.8.2 vsearch=2.18.0
conda activate keio
git clone https://github.com/ravinpoudel/KEIO.git
cd KEIO
pip install .
# check if the installation works
keio -h
The following software is required:
VSEARCH
pybedtools
NMSLib
Biopython
Pandas
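A quick way to confirm that the software listed above is available is sketched below. The function names `missing_commands` and `check_python_deps` are hypothetical helpers written for this example and are not part of KEIO. The Python module names (`pybedtools`, `nmslib`, `Bio`, `pandas`) are the usual import names for these packages.

```shell
# Print every named command that is not found on PATH.
# Returns non-zero if at least one command is missing.
missing_commands() (
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing command: $cmd"
            status=1
        fi
    done
    exit "$status"
)

# Try to import the Python packages listed above.
# Fails with Python's own ImportError message if one is absent.
check_python_deps() {
    python -c "import pybedtools, nmslib, Bio, pandas"
}
```

Inside the activated `keio` environment, this could be run as `missing_commands keio vsearch && check_python_deps`.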
KEIO: A python software to process illumina reads for keio-collection type project.
optional arguments:
-h, --help show this help message and exit
--fastq FASTQ [FASTQ ...], -f FASTQ [FASTQ ...]
input fastq file
--upstreamFasta UPSTREAMFASTA, -uf UPSTREAMFASTA
A upstreamFasta file
--downstreamrcFasta DOWNSTREAMRCFASTA, -drcf DOWNSTREAMRCFASTA
A downstreamFasta file
--threads THREADS The number of cpu threads to use
--tempdir TEMPDIR The temp file directory
--keeptemp Should intermediate files be kept?
keio -f test/test_data/sample.fq.gz \
-uf test/test_data/upstream.fasta \
-drcf test/test_data/downstream_rc.fasta \
--threads 8
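To process several FASTQ files, the same command can be wrapped in a loop. The sketch below assumes the inputs end in `.fq.gz` and sit in one directory. `run_keio_batch` and the `KEIO_BIN` override are hypothetical names introduced here so that the loop can be dry-run; only the `-f`, `-uf`, `-drcf`, and `--threads` options shown in the help text are used.

```shell
# Run keio once per *.fq.gz file in a directory.
# Arguments: FASTQ directory, upstream FASTA, downstream (rc) FASTA, threads (default 8).
# KEIO_BIN (hypothetical) lets you substitute another command for a dry run.
run_keio_batch() (
    fastq_dir=$1
    upstream=$2
    downstream_rc=$3
    threads=${4:-8}
    for fq in "$fastq_dir"/*.fq.gz; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$fq" ] || continue
        echo "Running keio on: $fq" >&2
        "${KEIO_BIN:-keio}" -f "$fq" -uf "$upstream" -drcf "$downstream_rc" --threads "$threads"
    done
)
```

For the bundled test data, the call would be `run_keio_batch test/test_data test/test_data/upstream.fasta test/test_data/downstream_rc.fasta 8`.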
Once you have the output from keio, process it with the R script.
Inputs needed by the R script:
E.g. DSS3BlPl.txt
Inplate_tag_mapping.html
mkdir results
Rscript script/keio.R sample.csv
The script above processes one keio output file at a time. If you have multiple input files to map, you can write a bash script or loop to run keio. keio can be run locally, as it should not take long. The R script might take a long time, depending on the total number of matches, so running it as a SLURM array job is helpful. The SLURM script follows.
script/run.sh
#!/bin/bash
#SBATCH --account="gbru_fy21_tomato_ralstonia"
#SBATCH --job-name=keio
#SBATCH --output=keio_%A_%a.log
#SBATCH --time=14-00:00:00
#SBATCH --array=1-58
#SBATCH -p atlas
#SBATCH -N 1
#SBATCH -n 48
#SBATCH --mem 250G
date;hostname;pwd
# load the required programs
# hpg1-compute
#module load conda
#source activate keio
RUN=${SLURM_ARRAY_TASK_ID}
echo "My Slurm RUN_ID: '${RUN}'"
echo "My TMPDIR IS: " $TMPDIR
# here a folder called keio_output contains all the csv's after running keio(python script)
infile=$(ls keio_output/*.csv | sed -n "${RUN}p")
echo "$infile"
time Rscript keio.R "$infile"
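The array job maps each task ID to one CSV file by its position in the sorted file list. A minimal sketch of that selection step, written without parsing `ls` output, is shown below. `pick_input` is a hypothetical helper, not part of the repository, and it assumes the upper bound of `--array` equals the number of CSV files in `keio_output`.

```shell
# Print the N-th CSV file (1-based, in glob sort order) from a directory.
# Returns non-zero if there are no CSV files or the index is out of range.
pick_input() (
    dir=$1
    idx=$2
    # Load the matching files into the positional parameters.
    set -- "$dir"/*.csv
    [ -e "$1" ] || exit 1
    [ "$idx" -ge 1 ] && [ "$idx" -le "$#" ] || exit 1
    # Drop the first idx-1 entries so the wanted file becomes $1.
    shift $((idx - 1))
    printf '%s\n' "$1"
)
```

In the SLURM script, the selection line could then read `infile=$(pick_input keio_output "$SLURM_ARRAY_TASK_ID")`. A task whose ID exceeds the number of files gets an empty `infile` and a non-zero status instead of silently running on nothing.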
The keio.R script saves its results in a folder called "results". Once every FASTQ file has been processed and the summarized information retrieved, copy the whole "results" folder to a local machine (laptop) and process it in R using the script/process.R script. This script reads all the output from the "results" folder and also needs the following files:
Eg. DSS3BlPl.txt
NOTE: These files are specific to the DSS3 genome, and the script has to be adapted to the genome being analyzed.
script/process.R