{
 "cells": [
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "The mwtab Tutorial\n",
    "==================\n",
    "\n",
    "The :mod:`mwtab` package provides classes and other facilities for parsing,\n",
    "accessing, and manipulating data stored in ``mwTab`` and ``JSON`` representation\n",
    "of ``mwTab`` formats.\n",
    "\n",
    "Also, the :mod:`mwtab` package provides simple command-line interface to convert\n",
    "between ``mwTab`` and its ``JSON`` representation as well as validate consistency\n",
    "of the files."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Brief mwTab Format Overview\n",
    "~~~~~~~~~~~~~~~~~~~~~~~~~~~\n",
    "\n",
    "\n",
    ".. note::\n",
    "\n",
    "   For full official specification see the following link (``mwTab file specification``):\n",
    "   http://www.metabolomicsworkbench.org/data/tutorials.php\n",
    "\n",
    "\n",
    "The ``mwTab`` formatted files consist of multiple blocks. Each new block starts with ``#``.\n",
    "\n",
    "* Some of the blocks contain only \"key-value\"-like pairs.\n",
    "\n",
    ".. code-block:: none\n",
    "\n",
    "   #METABOLOMICS WORKBENCH STUDY_ID:ST000001 ANALYSIS_ID:AN000001\n",
    "   VERSION             \t1\n",
    "   CREATED_ON          \t2016-09-17\n",
    "   #PROJECT\n",
    "   PR:PROJECT_TITLE                 \tFatB Gene Project\n",
    "   PR:PROJECT_TYPE                  \tGenotype treatment\n",
    "   PR:PROJECT_SUMMARY               \tExperiment to test the consequence of a mutation at the FatB gene (At1g08510)\n",
    "   PR:PROJECT_SUMMARY               \tthe wound-response of Arabidopsis\n",
    "\n",
    ".. note::\n",
    "\n",
    "   ``*_SUMMARY`` \"key-value\"-like pairs are typically span through multiple lines.\n",
    "\n",
    "\n",
    "* ``#SUBJECT_SAMPLE_FACTORS`` block is specially formatted, i.e. it contains header\n",
    "  specification and tab-separated values.\n",
    "\n",
    ".. code-block:: none\n",
    "\n",
    "   #SUBJECT_SAMPLE_FACTORS:         \tSUBJECT(optional)[tab]SAMPLE[tab]FACTORS(NAME:VALUE pairs separated by |)[tab]Additional sample data\n",
    "   SUBJECT_SAMPLE_FACTORS           \t-\tLabF_115873\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded\n",
    "   SUBJECT_SAMPLE_FACTORS           \t-\tLabF_115878\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded\n",
    "   SUBJECT_SAMPLE_FACTORS           \t-\tLabF_115883\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded\n",
    "   SUBJECT_SAMPLE_FACTORS           \t-\tLabF_115888\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded\n",
    "   SUBJECT_SAMPLE_FACTORS           \t-\tLabF_115893\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded\n",
    "   SUBJECT_SAMPLE_FACTORS           \t-\tLabF_115898\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded\n",
    "\n",
    "\n",
    "* ``#MS_METABOLITE_DATA`` (results) block contains ``Samples`` identifiers, ``Factors`` identifiers\n",
    "  as well as tab-separated data between ``*_START`` and ``*_END``.\n",
    "\n",
    ".. code-block:: none\n",
    "\n",
    "   #MS_METABOLITE_DATA\n",
    "   MS_METABOLITE_DATA:UNITS\tPeak height\n",
    "   MS_METABOLITE_DATA_START\n",
    "   Samples\tLabF_115904\tLabF_115909\tLabF_115914\tLabF_115919\tLabF_115924\tLabF_115929\tLabF_115842\tLabF_115847\tLabF_115852\tLabF_115857\tLabF_115862\tLabF_115867\tLabF_115873\tLabF_115878\tLabF_115883\tLabF_115888\tLabF_115893\tLabF_115898\tLabF_115811\tLabF_115816\tLabF_115821\tLabF_115826\tLabF_115831\tLabF_115836\n",
    "   Factors\tArabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded\tArabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded\tArabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded\tArabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded\tArabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded\tArabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded\tArabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded\tArabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded\tArabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded\tArabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded\tArabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded\tArabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded\tArabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded\n",
    "   1_2_4-benzenetriol\t1874.0000\t3566.0000\t1945.0000\t1456.0000\t2004.0000\t1995.0000\t4040.0000\t2432.0000\t2189.0000\t1931.0000\t1307.0000\t2880.0000\t2218.0000\t1754.0000\t1369.0000\t1201.0000\t3324.0000\t1355.0000\t2257.0000\t1718.0000\t1740.0000\t3472.0000\t2054.0000\t1367.0000\n",
    "   1-monostearin\t987.0000\t450.0000\t1910.0000\t549.0000\t1032.0000\t902.0000\t393.0000\t705.0000\t100.0000\t481.0000\t265.0000\t120.0000\t1185.0000\t867.0000\t676.0000\t569.0000\t579.0000\t387.0000\t1035.0000\t789.0000\t875.0000\t224.0000\t641.0000\t693.0000\n",
    "   ...\n",
    "   MS_METABOLITE_DATA_END\n",
    "\n",
    "* ``#METABOLITES`` metadata block contains a header specifying fields and\n",
    "  tab-separated data between ``*_START`` and ``*_END``.\n",
    "\n",
    ".. code-block:: none\n",
    "\n",
    "   #METABOLITES\n",
    "   METABOLITES_START\n",
    "   metabolite_name\tmoverz_quant\tri\tri_type\tpubchem_id\tinchi_key\tkegg_id\tother_id\tother_id_type\n",
    "   1,2,4-benzenetriol\t239\t522741\tFiehn\t10787\t\tC02814\t205673\tBinBase\n",
    "   1-monostearin\t399\t959625\tFiehn\t107036\t\tD01947\t202835\tBinBase\n",
    "   2-hydroxyvaleric acid\t131\t310750\tFiehn\t98009\t\t\t218773\tBinBase\n",
    "   3-phosphoglycerate\t299\t611619\tFiehn\t724\t\tC00597\t217821\tBinBase\n",
    "   ...\n",
    "   METABOLITES_END\n",
    "\n",
    "* ``#NMR_BINNED_DATA`` metadata block contains a header specifying fields and\n",
    "  tab-separated data between ``*_START`` and ``*_END``.\n",
    "\n",
    ".. code-block:: none\n",
    "\n",
    "   #NMR_BINNED_DATA\n",
    "   NMR_BINNED_DATA_START\n",
    "   Bin range(ppm)\tCDC029\tCDC030\tCDC032\tCPL101\tCPL102\tCPL103\tCPL201\tCPL202\tCPL203\tCDS039\tCDS052\tCDS054\n",
    "   0.50...0.56\t0.00058149\t1.6592\t0.039301\t0\t0\t0\t0.034018\t0.0028746\t0.0021478\t0.013387\t0\t0\n",
    "   0.56...0.58\t0\t0.74267\t0\t0.007206\t0\t0\t0\t0\t0\t0\t0\t0.0069721\n",
    "   0.58...0.60\t0.051165\t0.8258\t0.089149\t0.060972\t0.026307\t0.045697\t0.069541\t0\t0\t0.14516\t0.057489\t0.042255\n",
    "   ...\n",
    "   NMR_BINNED_DATA_END\n",
    "\n",
    "* Order of metadata and data blocks (MS)\n",
    "\n",
    ".. code-block:: none\n",
    "\n",
    "   #METABOLOMICS WORKBENCH\n",
    "   VERSION             \t1\n",
    "   CREATED_ON          \t2016-09-17\n",
    "   ...\n",
    "   #PROJECT\n",
    "   ...\n",
    "   #STUDY\n",
    "   ...\n",
    "   #SUBJECT\n",
    "   ...\n",
    "   #SUBJECT_SAMPLE_FACTORS:         \tSUBJECT(optional)[tab]SAMPLE[tab]FACTORS(NAME:VALUE pairs separated by |)[tab]Additional sample data\n",
    "   ...\n",
    "   #COLLECTION\n",
    "   ...\n",
    "   #TREATMENT\n",
    "   ...\n",
    "   #SAMPLEPREP\n",
    "   ...\n",
    "   #CHROMATOGRAPHY\n",
    "   ...\n",
    "   #ANALYSIS\n",
    "   ...\n",
    "   #MS\n",
    "   ...\n",
    "   #MS_METABOLITE_DATA\n",
    "   MS_METABOLITE_DATA:UNITS\tpeak area\n",
    "   MS_METABOLITE_DATA_START\n",
    "   ...\n",
    "   MS_METABOLITE_DATA_END\n",
    "   #METABOLITES\n",
    "   METABOLITES_START\n",
    "   ...\n",
    "   METABOLITES_END\n",
    "   #END"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Using mwtab as a Library\n",
    "~~~~~~~~~~~~~~~~~~~~~~~~\n",
    "\n",
    "\n",
    "Importing mwtab Package\n",
    "-----------------------\n",
    "\n",
    "If the :mod:`mwtab` package is installed on the system, it can be imported:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import mwtab"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Constructing MWTabFile Generator\n",
    "--------------------------------\n",
    "\n",
    "The :mod:`~mwtab.fileio` module provides the :func:`~mwtab.fileio.read_files`\n",
    "generator function that yields :class:`~mwtab.mwtab.MWTabFile` instances. Constructing a\n",
    ":class:`~mwtab.mwtab.MWTabFile` generator is easy - specify the path to a local ``mwTab`` file,\n",
    "directory of files, archive of files:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import mwtab\n",
    "\n",
    "mwfile_gen = mwtab.read_files(\"ST000001_AN000001.txt\")  # single mwTab file\n",
    "mwfiles_gen = mwtab.read_files(\"ST000001_AN000001.txt\", \"ST000002_AN000002.txt\")  # several mwTab files\n",
    "mwdir_gen = mwtab.read_files(\"mwfiles_dir\")  # directory of mwTab files\n",
    "mwzip_gen = mwtab.read_files(\"mwfiles.zip\")  # archive of mwTab files\n",
    "mwurl_gen = mwtab.read_files(\"1\", \"2\")       # ANALYSIS_ID of mwTab file"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Processing MWTabFile Generator\n",
    "------------------------------\n",
    "\n",
    "The :class:`~mwtab.mwtab.MWTabFile` generator can be processed in several ways:\n",
    "\n",
    "* Feed it to a for-loop and process one file at a time:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for mwfile in mwtab.read_files(\"1\", \"2\"):\n",
    "    print(\"STUDY_ID:\", mwfile.study_id)       # print STUDY_ID\n",
    "    print(\"ANALYSIS_ID\", mwfile.analysis_id)  # print ANALYSIS_ID\n",
    "    print(\"SOURCE\", mwfile.source)            # print source\n",
    "    for block_name in mwfile:                 # print names of blocks\n",
    "        print(\"\\t\", block_name)"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. note:: Once the generator is consumed, it becomes empty and needs to be created again."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Since the :class:`~mwtab.mwtab.MWTabFile` generator behaves like an iterator,\n",
    "  we can call the :py:func:`next` built-in function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mwfiles_generator = mwtab.read_files(\"1\", \"2\")\n",
    "\n",
    "mwfile1 = next(mwfiles_generator)\n",
    "mwfile2 = next(mwfiles_generator)"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. note:: Once the generator is consumed, :py:class:`StopIteration` will be raised."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Convert the :class:`~mwtab.mwtab.MWTabFile` generator into a :py:class:`list` of\n",
    "  :class:`~mwtab.mwtab.MWTabFile` objects:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mwfiles_generator = mwtab.read_files(\"1\", \"2\")\n",
    "mwfiles_list = list(mwfiles_generator)"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. warning:: Do not convert the :class:`~mwtab.mwtab.MWTabFile` generator into a\n",
    "             :py:class:`list` if the generator can yield a large number of files, e.g.\n",
    "             several thousand, otherwise it can consume all available memory."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Accessing Data From a Single MWTabFile\n",
    "--------------------------------------\n",
    "\n",
    "Since a :class:`~mwtab.mwtab.MWTabFile` is a Python :py:class:`collections.OrderedDict`,\n",
    "data can be accessed and manipulated as with any regular Python :py:class:`dict` object\n",
    "using bracket accessors.\n",
    "\n",
    "* Accessing top-level \"keys\" in :class:`~mwtab.mwtab.MWTabFile`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.chdir('_static/mwfiles')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mwfile = next(mwtab.read_files(\"ST000002_AN000002.txt\"))\n",
    "\n",
    "# list MWTabFile-level keys, i.e. saveframe names\n",
    "list(mwfile.keys())"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Accessing individual blocks in :class:`~mwtab.mwtab.MWTabFile`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# access \"PROJECT\" block\n",
    "mwfile[\"PROJECT\"]"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Accessing individual \"key-value\" pairs within blocks:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# access \"INSTITUTE\" field within \"PROJECT\" block\n",
    "mwfile[\"PROJECT\"][\"INSTITUTE\"]"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Accessing data in ``#SUBJECT_SAMPLE_FACTORS`` block:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# access \"SUBJECT_SAMPLE_FACTORS\" block\n",
    "mwfile[\"SUBJECT_SAMPLE_FACTORS\"][\"SUBJECT_SAMPLE_FACTORS\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# access individual factors (by index)\n",
    "mwfile[\"SUBJECT_SAMPLE_FACTORS\"][\"SUBJECT_SAMPLE_FACTORS\"][0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# access individual fields within factors\n",
    "mwfile[\"SUBJECT_SAMPLE_FACTORS\"][\"SUBJECT_SAMPLE_FACTORS\"][0][\"local_sample_id\"]"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Accessing data in ``#MS_METABOLITE_DATA`` block:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# access entire block\n",
    "mwfile[\"MS_METABOLITE_DATA\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# access units field\n",
    "mwfile[\"MS_METABOLITE_DATA\"][\"MS_METABOLITE_DATA:UNITS\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# access samples field\n",
    "mwfile[\"MS_METABOLITE_DATA\"][\"MS_METABOLITE_DATA_START\"][\"Samples\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# access factors field\n",
    "mwfile[\"MS_METABOLITE_DATA\"][\"MS_METABOLITE_DATA_START\"][\"Factors\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# access metabolite data\n",
    "mwfile[\"MS_METABOLITE_DATA\"][\"MS_METABOLITE_DATA_START\"][\"DATA\"]"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Manipulating Data From a Single MWTabFile\n",
    "-----------------------------------------\n",
    "\n",
    "In order to change values within :class:`~mwtab.mwtab.MWTabFile`, descend into\n",
    "the appropriate level using square bracket accessors and set a new value.\n",
    "\n",
    "* Change regular \"key-value\" pairs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# access phone number information\n",
    "mwfile[\"PROJECT\"][\"PHONE\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# change phone number information\n",
    "mwfile[\"PROJECT\"][\"PHONE\"] = \"1-530-754-8258\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# check that it has been modified\n",
    "mwfile[\"PROJECT\"][\"PHONE\"]"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Change ``#SUBJECT_SAMPLE_FACTORS`` values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# access the first subject sample factor by index\n",
    "mwfile[\"SUBJECT_SAMPLE_FACTORS\"][\"SUBJECT_SAMPLE_FACTORS\"][0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# provide additional details to the first subject sample factor\n",
    "mwfile[\"SUBJECT_SAMPLE_FACTORS\"][\"SUBJECT_SAMPLE_FACTORS\"][0][\"additional_sample_data\"] = \"Additional details\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# check that it has been modified\n",
    "mwfile[\"SUBJECT_SAMPLE_FACTORS\"][\"SUBJECT_SAMPLE_FACTORS\"][0]"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Printing a MWTabFile and its Components\n",
    "---------------------------------------\n",
    "\n",
    "* Print entire file in ``mwTab`` format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mwfile.print_file(file_format=\"mwtab\")"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Print entire file in ``JSON`` format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mwfile.print_file(file_format=\"json\")"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Print single block in ``mwTab`` format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mwfile.print_block(\"STUDY\", file_format=\"mwtab\")"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Print single block in ``JSON`` format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mwfile.print_block(\"STUDY\", file_format=\"json\")"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Writing data from a MWTabFile object into a file\n",
    "------------------------------------------------\n",
    "Data from a :class:`~mwtab.mwtab.MWTabFile` can be written into file\n",
    "in original ``mwTab`` format or in equivalent JSON format using\n",
    ":meth:`~mwtab.mwtab.MWTabFile.write()`:\n",
    "\n",
    "* Writing into a ``mwTab`` formatted file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"out/ST000001_AN000001_modified.txt\", \"w\") as outfile:\n",
    "    mwfile.write(outfile, file_format=\"mwtab\")"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Writing into a ``JSON`` file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"out/ST000001_AN000001_modified.json\", \"w\") as outfile:\n",
    "    mwfile.write(outfile, file_format=\"json\")"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Converting mwTab Files\n",
    "----------------------\n",
    "\n",
    "``mwTab`` files can be converted between the ``mwTab`` file format and their ``JSON``\n",
    "representation using the :mod:`mwtab.converter` module.\n",
    "\n",
    "One-to-one file conversions\n",
    "***************************\n",
    "\n",
    "* Converting from the ``mwTab`` file format into its equivalent ``JSON`` file format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mwtab.converter import Converter\n",
    "\n",
    "# Using valid ANALYSIS_ID to access file from URL: from_path=\"1\"\n",
    "converter = Converter(from_path=\"1\", to_path=\"out/ST000001_AN000001.json\",\n",
    "                      from_format=\"mwtab\", to_format=\"json\")\n",
    "converter.convert()"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Converting from JSON file format back to ``mwTab`` file format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mwtab.converter import Converter\n",
    "\n",
    "converter = Converter(from_path=\"out/ST000001_AN000001.json\", to_path=\"out/ST000001_AN000001.txt\",\n",
    "                      from_format=\"json\", to_format=\"mwtab\")\n",
    "converter.convert()"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Many-to-many files conversions\n",
    "******************************\n",
    "\n",
    "* Converting from the directory of ``mwTab`` formatted files into their equivalent\n",
    "  ``JSON`` formatted files:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mwtab.converter import Converter\n",
    "\n",
    "converter = Converter(from_path=\"mwfiles_dir_mwtab\",\n",
    "                      to_path=\"out/mwfiles_dir_json\",\n",
    "                      from_format=\"mwtab\",\n",
    "                      to_format=\"json\")\n",
    "converter.convert()"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Converting from the directory of ``JSON`` formatted files into their equivalent\n",
    "  ``mwTab`` formatted files:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mwtab.converter import Converter\n",
    "\n",
    "converter = Converter(from_path=\"out/mwfiles_dir_json\",\n",
    "                      to_path=\"out/mwfiles_dir_mwtab\",\n",
    "                      from_format=\"json\",\n",
    "                      to_format=\"mwtab\")\n",
    "converter.convert()"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. note:: Many-to-many files and one-to-one file conversions are available.\n",
    "          See :mod:`mwtab.converter` for full list of available conversions."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Command-Line Interface\n",
    "~~~~~~~~~~~~~~~~~~~~~~\n",
    "\n",
    "The mwtab Command-Line Interface provides the following functionality:\n",
    "   * Convert from the ``mwTab`` file format into its equivalent ``JSON`` file format and vice versa.\n",
    "   * Validate the ``mwTab`` formatted file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! python3 -m mwtab --help"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Converting ``mwTab`` files in bulk\n",
    "----------------------------------\n",
    "\n",
    "CLI one-to-one file conversions\n",
    "*******************************\n",
    "\n",
    "* Convert from a local file in ``mwTab`` format to a local file in ``JSON`` format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! python3 -m mwtab convert ST000001_AN000001.txt out/ST000001_AN000001.json \\\n",
    "          --from-format=mwtab --to-format=json"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Convert from a local file in ``JSON`` format to a local file in ``mwTab`` format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! python3 -m mwtab convert ST000001_AN000001.json out/ST000001_AN000001.txt \\\n",
    "          --from-format=json --to-format=mwtab"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Convert from a compressed local file in ``mwTab`` format to a compressed local file in ``JSON`` format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! python3 -m mwtab convert ST000001_AN000001.txt.gz out/ST000001_AN000001.json.gz \\\n",
    "          --from-format=mwtab --to-format=json"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Convert from a compressed local file in ``JSON`` format to a compressed local file in ``mwTab`` format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! python3 -m mwtab convert ST000001_AN000001.json.gz out/ST000001_AN000001.txt.gz \\\n",
    "          --from-format=json --to-format=mwtab"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Convert from a uncompressed URL file in ``mwTab`` format to a compressed local file in ``JSON`` format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! python3 -m mwtab convert 1 out/ST000001_AN000001.json.bz2 \\\n",
    "          --from-format=mwtab --to-format=json"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. note:: See :mod:`mwtab.converter` for full list of available conversions."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "CLI Many-to-many files conversions\n",
    "**********************************\n",
    "\n",
    "* Convert from a directory of files in ``mwTab`` format to a directory of files in ``JSON`` format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! python3 -m mwtab convert mwfiles_dir_mwtab out/mwfiles_dir_json \\\n",
    "          --from-format=mwtab --to-format=json"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Convert from a directory of files in ``JSON`` format to a directory of files in ``mwTab`` format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! python3 -m mwtab convert mwfiles_dir_json out/mwfiles_dir_mwtab \\\n",
    "          --from-format=json --to-format=mwtab"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Convert from a directory of files in ``mwTab`` format to a zip archive of files in ``JSON`` format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! python3 -m mwtab convert mwfiles_dir_mwtab out/mwfiles_json.zip \\\n",
    "          --from-format=mwtab --to-format=json"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Convert from a compressed tar archive of files in ``JSON`` format to a directory of files in ``mwTab`` format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! python3 -m mwtab convert mwfiles_json.tar.gz out/mwfiles_dir_mwtab \\\n",
    "          --from-format=json --to-format=mwtab"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "* Convert from a zip archive of files in ``mwTab`` format to a compressed tar archive of files in ``JSON`` format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! python3 -m mwtab convert mwfiles_mwtab.zip out/mwfiles_json.tar.bz2 \\\n",
    "          --from-format=mwtab --to-format=json"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. note:: See :mod:`mwtab.converter` for full list of available conversions."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    "Validating ``mwTab`` files\n",
    "--------------------------\n",
    "\n",
    "The :mod:`mwtab` package provides the :func:`~mwtab.validator.validate_file` function\n",
    "that can validate files based on a ``JSON`` schema definition. The :mod:`mwtab.mwschema`\n",
    "contains schema definitions for every block of ``mwTab`` formatted file, i.e.\n",
    "it lists the types of attributes (e.g. :py:class:`str` as well as specifies which keys are\n",
    "optional and which are required).\n",
    "\n",
    "* To validate file(s) simply call the ``validate`` command and provide path to file(s):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! python3 -m mwtab validate 1"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Raw Cell Format",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
