Metadata-Version: 2.1
Name: pywgsim
Version: 0.0.6
Summary: pywgsim
Home-page: https://github.com/ialbert/pywgsim
Author: Istvan Albert
Author-email: istvan.albert@gmail.com
License: UNKNOWN
Description: ## pywgsim
        
        `pywgsim` is a Python wrapper around the `wgsim` short read simulator.
        
        * https://github.com/lh3/wgsim
        
        ## Usage
        
            pywgsim -h
        
        ## Installation
        
            pip install pywgsim
                         
        ## Changes
        
        The original code for wgsim has been expanded a little bit. The main changes are:
        
        1. The information on the mutations introduced by `wgsim` is now generated in GFF format.
        1. There is a new flag called `--fixed` that generates the same number of reads, `N`, for each chromosome.
        1. The separator character in the read name has been changed from `_` to `|`. This follows a more widely used convention (e.g. NCBI) and makes it possible to identify the contig name from the read name.
        
        In the default operation of `wgsim`, the `N` reads are distributed so as to produce uniform coverage across all chromosomes (longer chromosomes get a larger fraction of `N`).
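        
        The difference between the two modes can be illustrated with a minimal sketch. The function name `reads_per_chromosome` and the chromosome lengths are hypothetical, and the exact rounding used by `wgsim` may differ:
        
        ```python
        def reads_per_chromosome(lengths, n, fixed=False):
            """Return a dict mapping chromosome name to a read-pair count."""
            if fixed:
                # --fixed: every chromosome receives the same N read pairs.
                return {name: n for name in lengths}
            total = sum(lengths.values())
            # Default: each chromosome gets a share of N proportional to its length
            # (integer division here; the real tool may round differently).
            return {name: n * size // total for name, size in lengths.items()}
        
        lengths = {"chr1": 3000, "chr2": 1000}  # hypothetical chromosome lengths
        print(reads_per_chromosome(lengths, 1000))
        print(reads_per_chromosome(lengths, 1000, fixed=True))
        ```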
         
        ## Mutation output
        
        The output generated by `pywgsim` looks like this:
        
            ##gff-version 3
            #
            # N=1000 err_rate=0.02 mut_rate=0.001 indel_frac=0.15000001 indel_ext=0.25 size=500 std=50 len1=100 len2=100 seed=1606965870
            #
            NC_001416.1     wgsim   snp     1047    1047    .       +       .       Name=A/C;Ref=A;Alt=C;Type=hom
            NC_001416.1     wgsim   snp     1308    1308    .       +       .       Name=C/Y;Ref=C;Alt=Y;Type=het
            NC_001416.1     wgsim   snp     1533    1533    .       +       .       Name=G/T;Ref=G;Alt=T;Type=hom
            NC_001416.1     wgsim   snp     2472    2472    .       +       .       Name=C/M;Ref=C;Alt=M;Type=het
            NC_001416.1     wgsim   snp     2964    2964    .       +       .       Name=A/M;Ref=A;Alt=M;Type=het
            NC_001416.1     wgsim   snp     5375    5375    .       +       .       Name=G/R;Ref=G;Alt=R;Type=het
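        
        Records like these can be read back with a few lines of Python. The sketch below is not part of `pywgsim`; the helper name `parse_mutation` is hypothetical, and it assumes the attribute column contains no whitespace, as in the example above. In the `het` records shown, the `Alt` value is an IUPAC ambiguity code (for instance `Y` stands for C or T).
        
        ```python
        def parse_mutation(line):
            """Parse one GFF feature line into a dict; return None for comments and blanks."""
            line = line.strip()
            if not line or line.startswith("#"):
                return None
            # GFF3 has nine columns; the ninth holds key=value pairs separated by ';'.
            chrom, source, ftype, start, end, score, strand, phase, attrs = line.split(None, 8)
            record = {"chrom": chrom, "type": ftype, "start": int(start), "end": int(end)}
            record.update(pair.split("=", 1) for pair in attrs.split(";") if pair)
            return record
        
        line = "NC_001416.1 wgsim snp 1308 1308 . + . Name=C/Y;Ref=C;Alt=Y;Type=het"
        print(parse_mutation(line))
        ```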
            
            
        ## New read names
            
        The read names are now of the form:
        
               @NC_002945.4|1768156|1768694|0:0:0|4:0:0|4
        
        Where:
        
           * `NC_002945.4` is the contig name that the fragment was generated from.
           * `1768156` is the left-most position of the fragment.
           * `1768694` is the right-most position of the fragment.
           * `0:0:0` are the numbers of errors, substitutions and indels in the left-most read of the pair.
           * `4:0:0` are the numbers of errors, substitutions and indels in the right-most read of the pair.
           * `4` is the read pair number, unique per contig.
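        
        The fields listed above can be recovered from a read name with a short helper. This is a sketch, not part of `pywgsim`: the names `ReadName` and `parse_read_name` are hypothetical, and it assumes the name has exactly the six-field form shown, with no extra suffix.
        
        ```python
        from collections import namedtuple
        
        ReadName = namedtuple("ReadName", "contig start end left right pair")
        
        def parse_read_name(name):
            """Split a read name into its six '|'-separated fields."""
            # Split from the right so a '|' inside the contig name would be preserved.
            contig, start, end, left, right, pair = name.lstrip("@").rsplit("|", 5)
            # Each count field holds three ':'-separated integers.
            left = tuple(int(x) for x in left.split(":"))
            right = tuple(int(x) for x in right.split(":"))
            return ReadName(contig, int(start), int(end), left, right, int(pair))
        
        read = parse_read_name("@NC_002945.4|1768156|1768694|0:0:0|4:0:0|4")
        print(read.contig, read.start, read.end)
        ```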
        
        ## Help
        
            $ pywgsim -h
            
        prints:
        
            usage: pywgsim [-h] [-a 1.fq] [-b 2.fq] [-N 1000] [-f] [-e 0.02] [-r 0.001]
                           [-R 0.15] [-X 0.25] [-D 500] [-s 50] [-S 0]
                           genome
            
            positional arguments:
              genome                the FASTA reference sequence
            
            optional arguments:
              -h, --help            show this help message and exit
              -a 1.fq, --r1 1.fq    name for first in pair
              -b 2.fq, --r2 2.fq    name for second in pair
              -N 1000, --num 1000   number of read pairs
              -f, --fixed           each chromosome gets N sequences
              -e 0.02, --err 0.02   the base error rate
              -r 0.001, --mut 0.001
                                    rate of mutations
              -R 0.15, --frac 0.15  fraction of indels
              -X 0.25, --ext 0.25   probability an indel is extended
              -D 500, --dist 500    outer distance between the two ends
              -s 50, --stdev 50     standard deviation
              -S 0, --seed 0        seed for the random generator
              
        ## API
        
        `wgsim` can be invoked from Python with a single function call:
        
            from pywgsim import wgsim
        
            wgsim.core(
                r1="r1.fq", r2="r2.fq", ref="genome.fa",
                err_rate=0.02, mut_rate=0.001, indel_frac=0.15, indel_ext=0.25,
                max_n=0.05, is_hap=0, N=100000, dist=500, stdev=50,
                size_l=100, size_r=100, is_fixed=0, seed=0,
            )
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX
Classifier: Programming Language :: C
Classifier: Programming Language :: Cython
Requires-Python: >=3.6
Description-Content-Type: text/markdown
