greendb_query tool usage
greendb_query assists in querying the GREEN-DB database. Given a list of region IDs, a list of variant IDs, or a table of variants and their relevant regions, the tool generates a set of tables containing detailed information on the regions of interest, their overlap with additional supporting regions (TFBS, DNase HS peaks, UCNE, dbSuper), gene-region connections, tissues of activity and associated phenotypes.
greendb_query [-h] (-v VARIDS | -r REGIDS | -t TABLE) -o OUTPREFIX -g
{GRCh37,GRCh38} --db GREENDB [--logfile LOGFILE]
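As a rough illustration of how the usage line maps to a concrete call, the Python sketch below assembles the argument list for a single run without executing it. Only the flags shown in the usage line are used. The helper name build_greendb_query_cmd, the output prefix results/query and the database file name GREEN-DB.db are placeholders, not names defined by the tool.

```python
import shlex


def build_greendb_query_cmd(input_flag, input_value, outprefix, genome, db, logfile=None):
    """Assemble the argument list for one greendb_query call (nothing is executed)."""
    # Exactly one of the three input types can be given per call.
    if input_flag not in ("-v", "-r", "-t"):
        raise ValueError("input_flag must be one of -v, -r, -t")
    if genome not in ("GRCh37", "GRCh38"):
        raise ValueError("genome must be GRCh37 or GRCh38")
    cmd = ["greendb_query", input_flag, input_value,
           "-o", outprefix, "-g", genome, "--db", db]
    # --logfile is optional in the usage line.
    if logfile is not None:
        cmd += ["--logfile", logfile]
    return cmd


# Placeholder values: region IDs, output prefix and database path are made up.
cmd = build_greendb_query_cmd("-r", "ID1,ID2", "results/query", "GRCh38", "GREEN-DB.db")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run once the tool is installed and the database path points to a real file.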
Possible inputs
The tool can query GREEN-DB using 3 different types of input. Only one type of input can be specified.
1. List of regions (-r)
If you are simply interested in detailed information on a list of regions, you can use the -r input.
This argument accepts a comma-separated list of regions (like ID1,ID2) or a text file with one region ID per line.
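The two accepted forms (inline comma-separated list, or a text file with one ID per line) can be prepared with a small helper like the one sketched here. The function name id_list_argument is hypothetical, and the choice between the two forms is left to the caller.

```python
def id_list_argument(ids, list_path=None):
    """Return the value to pass to -r for a collection of IDs.

    With no list_path the IDs are joined with commas; otherwise they are
    written one per line to list_path and that path is returned.
    """
    # Drop empty entries and surrounding whitespace.
    ids = [i.strip() for i in ids if i.strip()]
    if not ids:
        raise ValueError("at least one ID is required")
    if list_path is None:
        return ",".join(ids)
    with open(list_path, "w") as fh:
        fh.write("\n".join(ids) + "\n")
    return list_path


print(id_list_argument(["ID1", "ID2"]))  # ID1,ID2
```

A file is usually more practical for long lists, since a very long comma-separated argument can be awkward to handle on the command line.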
2. List of variants (-v)
If you have a small list of variants for which you want to extract overlapping regulatory regions, you can
input them as a comma-separated list of variant IDs (like var1,var2) or a text file with one variant ID per line.
A variant ID has the format chrom_pos_ref_alt.
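The chrom_pos_ref_alt format can be built and split back with a couple of lines of Python. This is a minimal sketch: the function names are hypothetical, the example coordinates are made up, and whether the chromosome name should carry a chr prefix is not stated here, so it should match the naming used by your data and database.

```python
def make_variant_id(chrom, pos, ref, alt):
    """Build a variant ID in the chrom_pos_ref_alt format."""
    return f"{chrom}_{pos}_{ref}_{alt}"


def parse_variant_id(var_id):
    """Split a chrom_pos_ref_alt ID back into its four fields.

    Splitting from the right keeps contig names that themselves
    contain underscores intact.
    """
    chrom, pos, ref, alt = var_id.rsplit("_", 3)
    return chrom, int(pos), ref, alt


var_id = make_variant_id("chr1", 12345, "A", "G")
print(var_id)  # chr1_12345_A_G
```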
3. Variant-regions table (-t)
If you have a list of variants of interest for which you know the relevant GREEN-DB region IDs, you can query the DB directly by providing a tab-separated text file with no header and 2 columns:
column 1: variant ID in the format chrom_pos_ref_alt
column 2: comma-separated list of region IDs overlapping the variant
This table can be generated automatically from a VCF annotated with greenvaran by using greenvaran querytab
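If the table has to be assembled by hand rather than derived from an annotated VCF, the two-column layout described above can be written as in the following sketch. The function name and the variant and region IDs are placeholders.

```python
def write_variant_region_table(path, variant_to_regions):
    """Write a headerless, tab-separated table for the -t input.

    variant_to_regions maps a variant ID (chrom_pos_ref_alt) to the list
    of region IDs overlapping that variant.
    """
    with open(path, "w") as fh:
        for var_id, region_ids in variant_to_regions.items():
            # column 1: variant ID; column 2: comma-separated region IDs
            fh.write(f"{var_id}\t{','.join(region_ids)}\n")


# Made-up IDs, for illustration only.
example = {
    "chr1_12345_A_G": ["ID1", "ID2"],
    "chr2_67890_C_T": ["ID3"],
}
write_variant_region_table("variants_regions.tsv", example)
```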
Output tables
The tool will generate 6 tables with the provided prefix. Some tables may be empty if the corresponding information is missing. The structure of the output tables is described below.
regions
Details on the regions of interest
gene_details
Details on the controlled genes, reporting the tissue where the gene-region interaction is detected
pheno_details
Details on the phenotypes potentially associated with the regions of interest
DNase, dbSuper, TFBS, UCNE
For each of the 4 functional elements a table is generated with details on each element overlapping the region(s) / variant(s) of interest.
Variant(s) of interest
When the input contains variants of interest (-t, -v), an additional column is added to all tables. A region or element is reported in the output only if it overlaps with one of the variants.
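Since all output tables share the prefix given with -o, they can be collected after a run with a simple prefix match. The sketch below assumes nothing about the file name suffixes or extensions, which are not specified here; the function name is hypothetical.

```python
import glob


def list_output_tables(outprefix):
    """Return the sorted paths of all files whose name starts with outprefix."""
    # glob.escape protects prefixes that contain characters such as [ or *.
    return sorted(glob.glob(glob.escape(outprefix) + "*"))
```

For example, list_output_tables("results/query") returns every file in results/ whose name begins with query, so a prefix that is unique to the run avoids picking up unrelated files.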
Arguments list
- -v VARIDS, --vcf VARIDS
  - Comma-separated list of variant IDs or file with a list of variant IDs
- -r REGIDS, --regIDs REGIDS
  - Comma-separated list of region IDs or file with a list of region IDs
- -t TABLE, --table TABLE
  - Tab-separated file with col1 (chr_pos_ref_alt) and col2 (comma-separated list of region IDs)
- -o OUTPREFIX, --outprefix OUTPREFIX
  - Prefix for output files
- -g BUILD, --genome BUILD
  - Possible values: {GRCh37,GRCh38}
  - Genome build for the query
- --db GREENDB
  - Location of the GREEN-DB SQLite database file (.db)
- --logfile LOGFILE
  - Custom location for the log file