API Tutorial: Full pyInfinityFlow Pipeline
This tutorial uses the pyInfinityFlow API to carry out the full analysis pipeline on an example dataset. The example is a subset of the previously published mouse lung dataset[1]; the full dataset was made publicly available on flowrepository.org. The subset ships with the pyInfinityFlow repository on GitHub and consists of 10 InfinityMarkers and 5 Isotype controls, located in the 'example_data' directory. This directory also contains the InfinityMarker annotation file and the Backbone annotation file, both of which are required by the analysis pipeline.
Once Git is installed, you can download the repository by changing to the directory where you want to place it and running the following command:
git clone https://github.com/KyleFerchen/pyInfinityFlow.git
Provide paths for your machine
After you have cloned the GitHub repository, set the path to it below so that this notebook can run on your machine:
[ ]:
# Specify the path to the repository
path_to_repo = "/media/kyle_ssd1/Repositories/pyInfinityFlow/"
# Specify which directory on your machine you want to save the results
my_output_dir = "/media/kyle_ssd1/outputs/"
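Before going further, it can help to confirm that the repository path actually points at a clone containing the example inputs. The following is a minimal sketch using only the standard library; the helper name find_missing_inputs is hypothetical, and the relative paths are the ones used later in this tutorial:

```python
import os
import tempfile

# Relative paths used later in this tutorial (example annotation files and FCS directory)
EXPECTED_INPUTS = [
    "example_data/mouse_lung_dataset_subset_backbone_anno.csv",
    "example_data/mouse_lung_dataset_subset_infinity_marker_anno.csv",
    "example_data/mouse_lung_dataset_subset",
]

def find_missing_inputs(repo_path, expected=EXPECTED_INPUTS):
    """Hypothetical helper: return the expected inputs that do not exist under repo_path."""
    return [rel for rel in expected
            if not os.path.exists(os.path.join(repo_path, rel))]

# Demonstration on an empty temporary directory standing in for a wrong path
with tempfile.TemporaryDirectory() as empty_repo:
    missing = find_missing_inputs(empty_repo)

print(f"{len(missing)} expected inputs are missing")
```

On a real machine, calling find_missing_inputs(path_to_repo) should return an empty list when the path is correct.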
Step 1: Preparing the Inputs
Backbone Annotation File
First, we need to locate the Backbone annotation file. This file tells the program which channels in the input FCS files to use as the Backbone (the predictors in the regression models). It is a .csv or .tsv file with three columns, in the order below, that annotate:
The channel names in the reference FCS file(s) (the data used to build the final InfinityFlow object)
The channel names in the InfinityMarker FCS files (the data used to fit and validate the models)
The final name to use for the channel in the InfinityFlow object
This file should have the column names as the first line.
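As a rough illustration of this layout, the sketch below writes a three-column table with a header line to an in-memory CSV and reads it back. The column names and rows are copied from the example file used in this tutorial; the helper check_backbone_rows and its rules (three non-empty fields, unique final names) are assumptions made for illustration, not checks documented by pyInfinityFlow:

```python
import csv
import io

# Column names and rows copied from the example backbone annotation file
HEADER = ["Reference_Backbone", "Query_Backbone", "Final_Name"]
ROWS = [
    ["FJComp-APC-A", "FJComp-APC-A", "CD69-CD301b"],
    ["FJComp-AlexaFluor700-A", "FJComp-AlexaFluor700-A", "MHCII"],
    ["FJComp-BUV395-A", "FJComp-BUV395-A", "CD4"],
]

def check_backbone_rows(rows):
    """Hypothetical helper: list problems found in the data rows."""
    problems = []
    for i, row in enumerate(rows):
        if len(row) != 3 or any(not field.strip() for field in row):
            problems.append(f"row {i}: expected three non-empty fields")
    final_names = [row[2] for row in rows if len(row) == 3]
    if len(set(final_names)) != len(final_names):
        problems.append("final names are not unique")
    return problems

# Write the table to an in-memory CSV, header line first
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(HEADER)
writer.writerows(ROWS)

# Read it back, separating the header from the data rows
parsed = list(csv.reader(io.StringIO(buffer.getvalue())))
header, rows = parsed[0], parsed[1:]
print(header)
print(check_backbone_rows(rows))
```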
After downloading the pyInfinityFlow repository from GitHub, we can access an example file for this test dataset, e.g.:
'/media/kyle_ssd1/Repositories/pyInfinityFlow/example_data/mouse_lung_dataset_subset_backbone_anno.csv'
The pyInfinityFlow.InfinityFlow_Utilities module provides a simple function (read_annotation_table) to read either a .csv, .tsv, or .txt (tab-delimited) file into a pandas.DataFrame object:
[1]:
import os
from pyInfinityFlow import InfinityFlow_Utilities
# PROVIDE THE PATH TO WHERE YOU DOWNLOADED THE REPOSITORY
path_backbone = os.path.join(path_to_repo, "example_data/mouse_lung_dataset_subset_backbone_anno.csv")
backbone_anno = InfinityFlow_Utilities.read_annotation_table(path_backbone)
backbone_anno
[1]:
| | Reference_Backbone | Query_Backbone | Final_Name |
|---|---|---|---|
| 0 | FJComp-APC-A | FJComp-APC-A | CD69-CD301b |
| 1 | FJComp-AlexaFluor700-A | FJComp-AlexaFluor700-A | MHCII |
| 2 | FJComp-BUV395-A | FJComp-BUV395-A | CD4 |
| 3 | FJComp-BUV737-A | FJComp-BUV737-A | CD44 |
| 4 | FJComp-BV421-A | FJComp-BV421-A | CD8 |
| 5 | FJComp-BV510-A | FJComp-BV510-A | CD11c |
| 6 | FJComp-BV605-A | FJComp-BV605-A | CD11b |
| 7 | FJComp-BV650-A | FJComp-BV650-A | F480 |
| 8 | FJComp-BV711-A | FJComp-BV711-A | Ly6C |
| 9 | FJComp-BV786-A | FJComp-BV786-A | Lineage |
| 10 | FJComp-GFP-A | FJComp-GFP-A | CD45a488 |
| 11 | FJComp-PE-Cy7(yg)-A | FJComp-PE-Cy7(yg)-A | CD24 |
| 12 | FJComp-PerCP-Cy5-5-A | FJComp-PerCP-Cy5-5-A | CD103 |
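To make the file-format handling concrete, here is a small stand-in that picks a delimiter from the file extension and reads the rows with the standard library. It is only a sketch of the idea: read_annotation_rows is a hypothetical name, it returns a list of dictionaries rather than a pandas.DataFrame, and it is not how read_annotation_table is implemented:

```python
import csv
import os
import tempfile

def read_annotation_rows(path):
    """Read a delimited annotation file into a list of dictionaries.

    Illustrative stand-in, not the pyInfinityFlow implementation:
    .csv is read as comma-separated, .tsv and .txt as tab-separated.
    """
    extension = os.path.splitext(path)[1].lower()
    delimiters = {".csv": ",", ".tsv": "\t", ".txt": "\t"}
    if extension not in delimiters:
        raise ValueError(f"Unsupported annotation file extension: {extension!r}")
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiters[extension]))

# Demonstration with a small tab-delimited file written to a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "backbone_anno.tsv")
    with open(demo_path, "w", newline="") as handle:
        handle.write("Reference_Backbone\tQuery_Backbone\tFinal_Name\n")
        handle.write("FJComp-BV421-A\tFJComp-BV421-A\tCD8\n")
    demo_rows = read_annotation_rows(demo_path)

print(demo_rows[0]["Final_Name"])
```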
InfinityMarker Annotation File
The InfinityMarker annotation file specifies which FCS files to use to build the regression models and how they should be treated. Each InfinityMarker (a flow cytometry signal to impute from the Backbone) has one row in this file, with the following columns:
The FCS file name
The InfinityMarker channel name (exactly as it appears in the FCS file)
The name to give the channel in the final InfinityFlow object
(OPTIONAL) The final name of the Isotype control for this InfinityMarker (this should match the third-column entry of one of the InfinityMarkers that are Isotype controls)
This file is included in the same directory as the Backbone annotation file in the GitHub repository, e.g.:
'/media/kyle_ssd1/Repositories/pyInfinityFlow/example_data/mouse_lung_dataset_subset_infinity_marker_anno.csv'
Isotype background correction is an optional step in which a linear model is used to regress out the background binding and fluorescence of an antibody raised with a specific immunoglobulin. You can read more about it in the original publication. The InfinityMarker annotation file determines whether background correction is performed: the pipeline attempts it only if this file has a fourth column.
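One way to sanity-check the fourth column before running the pipeline is to confirm that every Isotype label also appears as an InfinityMarker name. The sketch below uses the Name and Isotype values from the example annotation file; the helper names are hypothetical, and treating a row as an Isotype control when its Name equals its Isotype is an assumption based on how the example file is laid out:

```python
# (Name, Isotype) pairs copied from the example InfinityMarker annotation file
NAME_TO_ISOTYPE = {
    "33D1": "Isotype_rIgG2b",
    "Allergin-1": "Isotype_mIgG1",
    "B7-H4": "Isotype_AHIgG",
    "CD1d": "Isotype_rIgG2b",
    "CD103": "Isotype_AHIgG",
    "CD105": "Isotype_rIgG2a",
    "CD106": "Isotype_rIgG2a",
    "CD107a (Lamp-1)": "Isotype_rIgG2a",
    "CD107b (Mac-3)": "Isotype_rIgG1",
    "CD115": "Isotype_rIgG2a",
    "Isotype_rIgG2b": "Isotype_rIgG2b",
    "Isotype_mIgG1": "Isotype_mIgG1",
    "Isotype_AHIgG": "Isotype_AHIgG",
    "Isotype_rIgG2a": "Isotype_rIgG2a",
    "Isotype_rIgG1": "Isotype_rIgG1",
}

def unmatched_isotypes(name_to_isotype):
    """Return Isotype labels that are not themselves listed as an InfinityMarker name."""
    names = set(name_to_isotype)
    return sorted({iso for iso in name_to_isotype.values() if iso not in names})

def markers_to_correct(name_to_isotype):
    """Assumption: rows whose Name equals their Isotype are the controls themselves."""
    return [name for name, iso in name_to_isotype.items() if name != iso]

print(unmatched_isotypes(NAME_TO_ISOTYPE))
print(len(markers_to_correct(NAME_TO_ISOTYPE)))
```

An empty first result means every Isotype label can be resolved to a row in the file.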
The InfinityMarker annotation file, like the Backbone annotation file, is expected to be either a .csv, .tsv, or .txt (tab-delimited) file, and can also be read into a pandas.DataFrame using the read_annotation_table function:
[2]:
path_infmarker = os.path.join(
    path_to_repo,
    "example_data/mouse_lung_dataset_subset_infinity_marker_anno.csv")
infinitymarker_anno = InfinityFlow_Utilities.read_annotation_table(path_infmarker)
infinitymarker_anno
[2]:
| | File | Channel | Name | Isotype |
|---|---|---|---|---|
| 0 | backbone_Plate2_Specimen_001_G1_G01_073_target... | FJComp-PE(yg)-A | 33D1 | Isotype_rIgG2b |
| 1 | backbone_Plate2_Specimen_001_F7_F07_067_target... | FJComp-PE(yg)-A | Allergin-1 | Isotype_mIgG1 |
| 2 | backbone_Plate2_Specimen_001_F8_F08_068_target... | FJComp-PE(yg)-A | B7-H4 | Isotype_AHIgG |
| 3 | backbone_Plate1_Specimen_001_A2_A02_002_target... | FJComp-PE(yg)-A | CD1d | Isotype_rIgG2b |
| 4 | backbone_Plate1_Specimen_001_G4_G04_076_target... | FJComp-PE(yg)-A | CD103 | Isotype_AHIgG |
| 5 | backbone_Plate1_Specimen_001_G5_G05_077_target... | FJComp-PE(yg)-A | CD105 | Isotype_rIgG2a |
| 6 | backbone_Plate1_Specimen_001_G6_G06_078_target... | FJComp-PE(yg)-A | CD106 | Isotype_rIgG2a |
| 7 | backbone_Plate1_Specimen_001_G7_G07_079_target... | FJComp-PE(yg)-A | CD107a (Lamp-1) | Isotype_rIgG2a |
| 8 | backbone_Plate1_Specimen_001_G8_G08_080_target... | FJComp-PE(yg)-A | CD107b (Mac-3) | Isotype_rIgG1 |
| 9 | backbone_Plate1_Specimen_001_G9_G09_081_target... | FJComp-PE(yg)-A | CD115 | Isotype_rIgG2a |
| 10 | backbone_Plate3_Specimen_001_F12_F12_072_targe... | FJComp-PE(yg)-A | Isotype_rIgG2b | Isotype_rIgG2b |
| 11 | backbone_Plate3_Specimen_001_F6_F06_066_target... | FJComp-PE(yg)-A | Isotype_mIgG1 | Isotype_mIgG1 |
| 12 | backbone_Plate3_Specimen_001_F4_F04_064_target... | FJComp-PE(yg)-A | Isotype_AHIgG | Isotype_AHIgG |
| 13 | backbone_Plate3_Specimen_001_F11_F11_071_targe... | FJComp-PE(yg)-A | Isotype_rIgG2a | Isotype_rIgG2a |
| 14 | backbone_Plate3_Specimen_001_F10_F10_070_targe... | FJComp-PE(yg)-A | Isotype_rIgG1 | Isotype_rIgG1 |
Step 2: Checking the Inputs and Building an InfinityFlowFileHandler
Next, we need to specify the directory in which the FCS files are saved. This directory is located in the same parent directory as the annotation files on the pyInfinityFlow GitHub repository:
'/media/kyle_ssd1/Repositories/pyInfinityFlow/example_data/mouse_lung_dataset_subset'
Then we can use the check_infinity_flow_annotation_dataframes function to do the following:
Validate the input annotation DataFrames
Scan through the InfinityMarker FCS files to split events into training/validation/pooling subsets
Return an InfinityFlowFileHandler to store how each of the InfinityMarker files will be processed
Here we use the n_events_combine parameter to pool events from each of the individual InfinityMarker files into the final InfinityFlow object. Each of the original channels from these files is preserved in the final InfinityFlow object.
Note: it is also possible to use the separate_backbone_reference argument to supply a separate FCS file on which the predictions will be made. This is useful if one or more features are not well explained by the Backbone channels and therefore should not be imputed.
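To give an intuition for the training/validation/pooling split, here is a minimal sketch with the standard library. The parameter names ratio_for_validation and n_events_combine mirror the arguments used in this tutorial, but the function split_events and its sampling scheme (a random split, with pooled events drawn from the validation subset) are assumptions for illustration and may differ from what pyInfinityFlow does internally:

```python
import random

def split_events(n_events, ratio_for_validation=0.5, n_events_combine=1000, seed=0):
    """Split event indices into training, validation, and pooling subsets.

    Illustrative only: the sampling scheme is an assumption, not the
    pyInfinityFlow implementation.
    """
    rng = random.Random(seed)
    indices = list(range(n_events))
    rng.shuffle(indices)
    n_validate = int(n_events * ratio_for_validation)
    test_indices = sorted(indices[:n_validate])
    train_indices = sorted(indices[n_validate:])
    # Assumption: pooled events come from the validation subset,
    # so they were not seen while fitting the model
    n_pool = min(n_events_combine, len(test_indices))
    pool_indices = sorted(rng.sample(test_indices, n_pool))
    return train_indices, test_indices, pool_indices

train, test, pool = split_events(n_events=10_000)
print(len(train), len(test), len(pool))
```

Keeping the training and validation indices disjoint is what makes the validation scores in a later step an honest estimate of imputation quality.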
[3]:
fcs_dir = os.path.join(path_to_repo, "example_data/mouse_lung_dataset_subset")
file_handler = InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes(
    backbone_annotation=backbone_anno,
    infinity_marker_annotation=infinitymarker_anno,
    n_events_train=0,  # Use all possible events in the FCS file
    n_events_validate=0,  # Use all possible events in the FCS file
    ratio_for_validation=0.5,
    n_events_combine=1000,  # Events to pool into a final InfinityFlow object
    input_fcs_dir=fcs_dir,
    verbosity=1)
file_handler
Isotype controls detected. Will attempt to use background correction...
[3]:
InfinityFlowFileHandler Object from pyInfinityFlow
.handles the following InfinityMarkers:
33D1
Allergin-1
B7-H4
CD1d
CD103
CD105
CD106
CD107a (Lamp-1)
CD107b (Mac-3)
CD115
Isotype_rIgG2b
Isotype_mIgG1
Isotype_AHIgG
Isotype_rIgG2a
Isotype_rIgG1
Held in the InfinityFlowFileHandler.handles dictionary
InfinityFlowFileHandler.list_infinity_markers holds ordered list of InfinityMarkers
For example, you can see how the InfinityMarker “33D1” is stored in the file_handler.handles dictionary, including the name, file_name, directory, reference_backbone_channels, backbone_channels, prediction_channel, train_indices, test_indices, and pool_indices.
This information will be used later on to carry out XGBoost regression.
[4]:
file_handler.handles["33D1"]
[4]:
{'name': '33D1',
'file_name': 'backbone_Plate2_Specimen_001_G1_G01_073_target_33D1.fcs',
'directory': '/media/kyle_ssd1/Repositories/pyInfinityFlow/example_data/mouse_lung_dataset_subset',
'reference_backbone_channels': array(['FJComp-APC-A', 'FJComp-AlexaFluor700-A', 'FJComp-BUV395-A',
'FJComp-BUV737-A', 'FJComp-BV421-A', 'FJComp-BV510-A',
'FJComp-BV605-A', 'FJComp-BV650-A', 'FJComp-BV711-A',
'FJComp-BV786-A', 'FJComp-GFP-A', 'FJComp-PE-Cy7(yg)-A',
'FJComp-PerCP-Cy5-5-A'], dtype=object),
'backbone_channels': array(['FJComp-APC-A', 'FJComp-AlexaFluor700-A', 'FJComp-BUV395-A',
'FJComp-BUV737-A', 'FJComp-BV421-A', 'FJComp-BV510-A',
'FJComp-BV605-A', 'FJComp-BV650-A', 'FJComp-BV711-A',
'FJComp-BV786-A', 'FJComp-GFP-A', 'FJComp-PE-Cy7(yg)-A',
'FJComp-PerCP-Cy5-5-A'], dtype=object),
'prediction_channel': 'FJComp-PE(yg)-A',
'train_indices': array([ 0, 1, 2, ..., 106341, 106343, 106345]),
'test_indices': array([ 5, 7, 9, ..., 106342, 106344, 106346]),
'pool_indices': array([ 5, 113, 137, 375, 430, 474, 527, 709,
914, 930, 1006, 1026, 1184, 1229, 1230, 1315,
1449, 1867, 2042, 2122, 2287, 2293, 2363, 2397,
2519, 2566, 2657, 2847, 2963, 3030, 3383, 3484,
3721, 3769, 4125, 4788, 4899, 4909, 4913, 5025,
5032, 5034, 5053, 5065, 5151, 5359, 5409, 5434,
5565, 5601, 5608, 5612, 5677, 6171, 6180, 6203,
6334, 6364, 6447, 6456, 6639, 6641, 6745, 6767,
6872, 7009, 7048, 7083, 7103, 7104, 7421, 7662,
7668, 7782, 7863, 7872, 7961, 8111, 8142, 8345,
8373, 8374, 8390, 8419, 8455, 8496, 8575, 8596,
8688, 8753, 8792, 8862, 8967, 8984, 9119, 9149,
9160, 9638, 9775, 9863, 9902, 9940, 10063, 10090,
10093, 10316, 10558, 10564, 10686, 10842, 10910, 10969,
11016, 11027, 11039, 11108, 11123, 11211, 11340, 11610,
11642, 11698, 11709, 11734, 11750, 11768, 12101, 12356,
12621, 12676, 12751, 12812, 12934, 12976, 12993, 13005,
13010, 13135, 13224, 13278, 13345, 13462, 13566, 13768,
13820, 13844, 13866, 13888, 13961, 14266, 14487, 14800,
14918, 15042, 15404, 15442, 15508, 15525, 15725, 15816,
15848, 15893, 15938, 16336, 16361, 16391, 16423, 16475,
16578, 16771, 16834, 17099, 17131, 17283, 17496, 17603,
17639, 17678, 17705, 17764, 18073, 18080, 18109, 18313,
18377, 18470, 18484, 19001, 19067, 19093, 19141, 19194,
19206, 19218, 19536, 19643, 19712, 19953, 19975, 19996,
20260, 20318, 20348, 20432, 20501, 20613, 20641, 20712,
20789, 20891, 21029, 21037, 21092, 21159, 21338, 21363,
21503, 21733, 21803, 21913, 22288, 22476, 22674, 22817,
22946, 23023, 23159, 23444, 23695, 23707, 23810, 23882,
24021, 24039, 24042, 24194, 24221, 24381, 24400, 24563,
24614, 24897, 25054, 25148, 25182, 25204, 25458, 25482,
25496, 25726, 25875, 25901, 25957, 26197, 26759, 26855,
26877, 26883, 27101, 27120, 27246, 27398, 27501, 27521,
27524, 27792, 27842, 27867, 27896, 27898, 27901, 28170,
28357, 28444, 28619, 28687, 28778, 28894, 28984, 29008,
29047, 29092, 29187, 29263, 29306, 29386, 29518, 29646,
29734, 30043, 30091, 30095, 30122, 30213, 30420, 30658,
31065, 31079, 31082, 31396, 31473, 31478, 31519, 31663,
31698, 31922, 31971, 32165, 32238, 32302, 32341, 32459,
32592, 32767, 32776, 32951, 33038, 33377, 33418, 33560,
33735, 33744, 33996, 34037, 34069, 34071, 34252, 34344,
34388, 34533, 34640, 34717, 34758, 34993, 35056, 35170,
35214, 35245, 35298, 35427, 35438, 35546, 35757, 35830,
36162, 36173, 36203, 36345, 36420, 36481, 36598, 36787,
36822, 36840, 36846, 36992, 36996, 37102, 37154, 37238,
37500, 37520, 37620, 37669, 37715, 38021, 38082, 38170,
38371, 38408, 38438, 38457, 38477, 38579, 38954, 38956,
39064, 39188, 39201, 39327, 39669, 39752, 39807, 39910,
39957, 39999, 40082, 40106, 40158, 40287, 40297, 40514,
40583, 40584, 40723, 40811, 41199, 41257, 41374, 41590,
41613, 41638, 41658, 41887, 41901, 41916, 41966, 42098,
42249, 42285, 42493, 42557, 42565, 42665, 42673, 42739,
42796, 43117, 43287, 43362, 43377, 43667, 43707, 43731,
43758, 43795, 43931, 43981, 43989, 44042, 44064, 44699,
44740, 44895, 44913, 44986, 45058, 45168, 45274, 45362,
45364, 45461, 45508, 45566, 45826, 45998, 46319, 46332,
46366, 46505, 46524, 46698, 46797, 46898, 46968, 47453,
47494, 47631, 47683, 47693, 47747, 47882, 48119, 48162,
48170, 48218, 48306, 48325, 48374, 48488, 48530, 48549,
48560, 48618, 48702, 48714, 48745, 49034, 49084, 49114,
49180, 49243, 49246, 49309, 49349, 49433, 49434, 49506,
49575, 49779, 49846, 49910, 49966, 49973, 50174, 50482,
50647, 50679, 50728, 50730, 50808, 50910, 50998, 51184,
51299, 51431, 51557, 51655, 51674, 51682, 51858, 51940,
51941, 51960, 51982, 52153, 52225, 52318, 52333, 52613,
52692, 52850, 52870, 53000, 53064, 53136, 53193, 53197,
53279, 53323, 53355, 53410, 53463, 53686, 53713, 54210,
54381, 54382, 54537, 54594, 54688, 54929, 55316, 55389,
55501, 55509, 55564, 55652, 55667, 55765, 55888, 56108,
56256, 56408, 56478, 56620, 56935, 57082, 57131, 57477,
57593, 57635, 57649, 57671, 57902, 57996, 58103, 58214,
58230, 58288, 58367, 58693, 58749, 58822, 59031, 59053,
59132, 59164, 59201, 59308, 59532, 59759, 59871, 59935,
59991, 60169, 60379, 60465, 60483, 60550, 60666, 60670,
60732, 60757, 61152, 61168, 61192, 61254, 61530, 61767,
61905, 61931, 62066, 62108, 62545, 62752, 62939, 63084,
63200, 63219, 63253, 63290, 63330, 63368, 63682, 63801,
63874, 63933, 63963, 64035, 64036, 64131, 64164, 64350,
64598, 64661, 64663, 64701, 64934, 64947, 65118, 65286,
65330, 65401, 65657, 65661, 66020, 66037, 66263, 66747,
66769, 66780, 66906, 67009, 67038, 67124, 67192, 67357,
67379, 67389, 67446, 67494, 67612, 67712, 67843, 67892,
67927, 67940, 68148, 68172, 68266, 68286, 68349, 68414,
68601, 68607, 68653, 68747, 68803, 68865, 68903, 68909,
69207, 69236, 69370, 69394, 69411, 69502, 69619, 69720,
69833, 69885, 69956, 70023, 70056, 70296, 70465, 70589,
70641, 70686, 71028, 71113, 71180, 71200, 71203, 71587,
71649, 71676, 72106, 72159, 72200, 72261, 72271, 72313,
72537, 72557, 72853, 73155, 73275, 73504, 73812, 74128,
74181, 74430, 74627, 74801, 74824, 74909, 75083, 75210,
75801, 75828, 75842, 75991, 76442, 76644, 76776, 76971,
76986, 77154, 77396, 77775, 77800, 77803, 77853, 78191,
78270, 78439, 78594, 78611, 78618, 78696, 78911, 78928,
79016, 79090, 79175, 79313, 79438, 79541, 79649, 79783,
80150, 80173, 80491, 80647, 80718, 80725, 80860, 80869,
80891, 80944, 80958, 81008, 81015, 81020, 81111, 81271,
81313, 81386, 81419, 81586, 81622, 81670, 81791, 81807,
81988, 82060, 82092, 82336, 82516, 82523, 82526, 82541,
82554, 82564, 82630, 82636, 82687, 82750, 82918, 82966,
83507, 84850, 84952, 84964, 85136, 85222, 85231, 85388,
85446, 85529, 85850, 85883, 85952, 86032, 86096, 86169,
86188, 86352, 86472, 86712, 86713, 86823, 86938, 87129,
87309, 87413, 87529, 87723, 88053, 88087, 88090, 88157,
88161, 88258, 88265, 88273, 88572, 88754, 88780, 88821,
88915, 88965, 88988, 88998, 89040, 89062, 89245, 89279,
89633, 89647, 89827, 89941, 90003, 90034, 90256, 90414,
90583, 90587, 90885, 91190, 91352, 91410, 91502, 91599,
91685, 91707, 91711, 91729, 91894, 92030, 92081, 92277,
92302, 92386, 92476, 92516, 92755, 92879, 92959, 93072,
93115, 93121, 93276, 93646, 93790, 93837, 93852, 94023,
94026, 94229, 94267, 94491, 94588, 94771, 94899, 94916,
95014, 95079, 95093, 95100, 95220, 95242, 95314, 95328,
95410, 95608, 95668, 95683, 95776, 95853, 95891, 95978,
96222, 96225, 96242, 96299, 96349, 96417, 96508, 96613,
96895, 96917, 96947, 96979, 97178, 97287, 97324, 97341,
97457, 97531, 97542, 97575, 97682, 97719, 97769, 97836,
97976, 98127, 98328, 98417, 98436, 98446, 98518, 98674,
98721, 98857, 98886, 98910, 99168, 99190, 99206, 99320,
99362, 99703, 99845, 99883, 99902, 99904, 99932, 99956,
99978, 100203, 100330, 100416, 100450, 100589, 100685, 100735,
100755, 100761, 100803, 100909, 100916, 101335, 101993, 102036,
102204, 102233, 102292, 102345, 102459, 102661, 102672, 102776,
102835, 102841, 102862, 102989, 102993, 103183, 103375, 103732,
103798, 103876, 104273, 104297, 104317, 104324, 104685, 105143,
105145, 105193, 105404, 105481, 105886, 105975, 105979, 106099])}
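A quick check on an entry like this is to count the events in each subset and confirm that the training and validation indices do not overlap. The sketch below runs on a toy dictionary with made-up values that only mimics the relevant keys; summarize_handle is a hypothetical helper, and the same call should also work on a real entry such as file_handler.handles["33D1"], since it only iterates over the index arrays:

```python
def summarize_handle(handle):
    """Report subset sizes and any overlap between training and validation indices."""
    train = set(handle["train_indices"])
    test = set(handle["test_indices"])
    pool = set(handle["pool_indices"])
    return {
        "n_train": len(train),
        "n_test": len(test),
        "n_pool": len(pool),
        "n_train_test_overlap": len(train & test),
    }

# Toy entry with the same index keys as a file_handler.handles value (values are made up)
toy_handle = {
    "name": "toy_marker",
    "train_indices": [0, 1, 2, 3, 4, 6],
    "test_indices": [5, 7, 9],
    "pool_indices": [5, 9],
}
summary = summarize_handle(toy_handle)
print(summary)
```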
Step 3: Specify Output Directories
Here, we simply need to specify a directory in which to save the outputs of the pipeline. The InfinityFlow_Utilities.setup_output_directories function will prepare a dictionary that stores where to save different outputs, and create those directories:
[5]:
output_paths = InfinityFlow_Utilities.setup_output_directories(
    output_dir=my_output_dir,
    file_handler=file_handler,
    verbosity=1)
output_paths
[5]:
{'output_regression_path': '/media/kyle_ssd1/outputs/regression_results',
'output_umap_feature_plot_path': '/media/kyle_ssd1/outputs/umap_feature_plots',
'clustering': '/media/kyle_ssd1/outputs/clustering',
'qc': '/media/kyle_ssd1/outputs/QC',
'output_umap_bc_feature_plot_path': '/media/kyle_ssd1/outputs/umap_feature_plots_background_corrected'}
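For readers who want to see what such a setup step amounts to, the sketch below creates the same sub-directories under a temporary directory and returns a matching dictionary. The keys and folder names are copied from the output shown above; the function make_output_directories is hypothetical and is not the library's implementation:

```python
import os
import tempfile

# Keys and folder names copied from the output_paths dictionary shown above
SUBDIRECTORIES = {
    "output_regression_path": "regression_results",
    "output_umap_feature_plot_path": "umap_feature_plots",
    "clustering": "clustering",
    "qc": "QC",
    "output_umap_bc_feature_plot_path": "umap_feature_plots_background_corrected",
}

def make_output_directories(output_dir, subdirectories=SUBDIRECTORIES):
    """Create each sub-directory under output_dir and return a key -> path mapping."""
    paths = {}
    for key, folder in subdirectories.items():
        path = os.path.join(output_dir, folder)
        os.makedirs(path, exist_ok=True)  # Safe to call again when re-running
        paths[key] = path
    return paths

# Demonstration inside a temporary directory that is removed afterwards
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_paths = make_output_directories(tmp_dir)
    all_created = all(os.path.isdir(p) for p in demo_paths.values())

print(sorted(demo_paths))
print(all_created)
```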
Step 4: Fitting the XGBoost Regression Models
The InfinityFlow_Utilities.single_chunk_training function creates and fits the XGBoost models. It returns a tuple consisting of an InfinityFlow_Utilities.CombinedRegressionModels object and a dictionary that records how long it took to fit the models for the InfinityMarkers.
[6]:
regression_models, timings_1 = InfinityFlow_Utilities.single_chunk_training(
    file_handler=file_handler,
    cores_to_use=12,
    use_logicle_scaling=True,
    normalization_method=None,
    verbosity=3)
Reading in data from .fcs files for model training...
DEBUG: Reading in the data for InfinityMarker 33D1...
DEBUG: Reading in the data for InfinityMarker Allergin-1...
DEBUG: Reading in the data for InfinityMarker B7-H4...
DEBUG: Reading in the data for InfinityMarker CD1d...
DEBUG: Reading in the data for InfinityMarker CD103...
DEBUG: Reading in the data for InfinityMarker CD105...
DEBUG: Reading in the data for InfinityMarker CD106...
DEBUG: Reading in the data for InfinityMarker CD107a (Lamp-1)...
DEBUG: Reading in the data for InfinityMarker CD107b (Mac-3)...
DEBUG: Reading in the data for InfinityMarker CD115...
DEBUG: Reading in the data for InfinityMarker Isotype_rIgG2b...
DEBUG: Reading in the data for InfinityMarker Isotype_mIgG1...
DEBUG: Reading in the data for InfinityMarker Isotype_AHIgG...
DEBUG: Reading in the data for InfinityMarker Isotype_rIgG2a...
DEBUG: Reading in the data for InfinityMarker Isotype_rIgG1...
Applying Logicle normalization to data...
Building regression model for 33D1...
DEBUG: Setting n_jobs to 12 and random_state to None
DEBUG: XGBoost regression model trained in 4.51 seconds.
Building regression model for Allergin-1...
DEBUG: Setting n_jobs to 12 and random_state to None
DEBUG: XGBoost regression model trained in 4.49 seconds.
Building regression model for B7-H4...
DEBUG: Setting n_jobs to 12 and random_state to None
DEBUG: XGBoost regression model trained in 3.65 seconds.
Building regression model for CD1d...
DEBUG: Setting n_jobs to 12 and random_state to None
DEBUG: XGBoost regression model trained in 6.50 seconds.
Building regression model for CD103...
DEBUG: Setting n_jobs to 12 and random_state to None
DEBUG: XGBoost regression model trained in 5.05 seconds.
Building regression model for CD105...
DEBUG: Setting n_jobs to 12 and random_state to None
DEBUG: XGBoost regression model trained in 3.90 seconds.
Building regression model for CD106...
DEBUG: Setting n_jobs to 12 and random_state to None
DEBUG: XGBoost regression model trained in 3.28 seconds.
Building regression model for CD107a (Lamp-1)...
DEBUG: Setting n_jobs to 12 and random_state to None
DEBUG: XGBoost regression model trained in 6.44 seconds.
Building regression model for CD107b (Mac-3)...
DEBUG: Setting n_jobs to 12 and random_state to None
DEBUG: XGBoost regression model trained in 4.62 seconds.
Building regression model for CD115...
DEBUG: Setting n_jobs to 12 and random_state to None
DEBUG: XGBoost regression model trained in 3.80 seconds.
Building regression model for Isotype_rIgG2b...
DEBUG: Setting n_jobs to 12 and random_state to None
DEBUG: XGBoost regression model trained in 3.79 seconds.
Building regression model for Isotype_mIgG1...
DEBUG: Setting n_jobs to 12 and random_state to None
DEBUG: XGBoost regression model trained in 3.76 seconds.
Building regression model for Isotype_AHIgG...
DEBUG: Setting n_jobs to 12 and random_state to None
DEBUG: XGBoost regression model trained in 3.75 seconds.
Building regression model for Isotype_rIgG2a...
DEBUG: Setting n_jobs to 12 and random_state to None
DEBUG: XGBoost regression model trained in 3.71 seconds.
Building regression model for Isotype_rIgG1...
DEBUG: Setting n_jobs to 12 and random_state to None
DEBUG: XGBoost regression model trained in 3.46 seconds.
[7]:
regression_models
[7]:
CombinedRegressionModels Object from pyInfinityFlow
Contains regression models for the following InfinityMarkers (Response Variables):
33D1,Allergin-1,B7-H4,CD1d,CD103,CD105,CD106,CD107a (Lamp-1),CD107b (Mac-3),CD115,Isotype_rIgG2b,Isotype_mIgG1,Isotype_AHIgG,Isotype_rIgG2a,Isotype_rIgG1
Uses the following backbone (Explanatory Variables):
FJComp-APC-A,FJComp-AlexaFluor700-A,FJComp-BUV395-A,FJComp-BUV737-A,FJComp-BV421-A,FJComp-BV510-A,FJComp-BV605-A,FJComp-BV650-A,FJComp-BV711-A,FJComp-BV786-A,FJComp-GFP-A,FJComp-PE-Cy7(yg)-A,FJComp-PerCP-Cy5-5-A
The object holds the following variables:
ordered_training_channels
var_annotations
infinity_markers
regression_models
parameter_annotations
infinity_channels
validation_metrics
Access regression models as dictionary with the InfinityMarker as the key:
Eg. CombinedRegressionModels.regression_models["33D1"]
Step 5: Validating Regression Models
Next, we can use held-out data from each of the InfinityMarker FCS files to score how well each model imputes the InfinityMarker expression values from the Backbone features. This is done with the InfinityFlow_Utilities.single_chunk_testing function, which returns a tuple with an updated CombinedRegressionModels object that contains validation metrics, and a dictionary that tracks the timing of the validation.
[8]:
regression_models, timings_2 = InfinityFlow_Utilities.single_chunk_testing(
    file_handler=file_handler,
    regression_models=regression_models,
    use_logicle_scaling=True,
    normalization_method=None,
    verbosity=3)
Reading in data from .fcs files for model validation...
DEBUG: Reading in the data for InfinityMarker 33D1...
DEBUG: Reading in the data for InfinityMarker Allergin-1...
DEBUG: Reading in the data for InfinityMarker B7-H4...
DEBUG: Reading in the data for InfinityMarker CD1d...
DEBUG: Reading in the data for InfinityMarker CD103...
DEBUG: Reading in the data for InfinityMarker CD105...
DEBUG: Reading in the data for InfinityMarker CD106...
DEBUG: Reading in the data for InfinityMarker CD107a (Lamp-1)...
DEBUG: Reading in the data for InfinityMarker CD107b (Mac-3)...
DEBUG: Reading in the data for InfinityMarker CD115...
DEBUG: Reading in the data for InfinityMarker Isotype_rIgG2b...
DEBUG: Reading in the data for InfinityMarker Isotype_mIgG1...
DEBUG: Reading in the data for InfinityMarker Isotype_AHIgG...
DEBUG: Reading in the data for InfinityMarker Isotype_rIgG2a...
DEBUG: Reading in the data for InfinityMarker Isotype_rIgG1...
Applying Logicle normalization to data...
Obtaining validation metrics for regression models...
Working on 33D1...
Working on Allergin-1...
Working on B7-H4...
Working on CD1d...
Working on CD103...
Working on CD105...
Working on CD106...
Working on CD107a (Lamp-1)...
Working on CD107b (Mac-3)...
Working on CD115...
Working on Isotype_rIgG2b...
Working on Isotype_mIgG1...
Working on Isotype_AHIgG...
Working on Isotype_rIgG2a...
Working on Isotype_rIgG1...
The single_chunk_testing function stores a dictionary in the validation_metrics attribute of the CombinedRegressionModels object. Each key is an InfinityMarker name, and each value is a dictionary holding the predicted values, true values, r2_score, and mean_squared_error:
[9]:
regression_models.validation_metrics
[9]:
{'33D1': {'pred': array([0.25200623, 0.2564635 , 0.25081083, ..., 0.25893834, 0.23685327,
0.25687218], dtype=float32),
'true': array([0.25880677, 0.24245095, 0.25493586, ..., 0.274207 , 0.21523006,
0.24371576], dtype=float32),
'r2_score': 0.19209259889025,
'mean_squared_error': 0.0001764585},
'Allergin-1': {'pred': array([0.21301892, 0.21872343, 0.20963453, ..., 0.21750394, 0.22308473,
0.21142632], dtype=float32),
'true': array([0.26102197, 0.2454918 , 0.24498476, ..., 0.21617137, 0.22679122,
0.23546468], dtype=float32),
'r2_score': 0.5829876350462362,
'mean_squared_error': 0.00061123044},
'B7-H4': {'pred': array([0.22862323, 0.2501482 , 0.25119272, ..., 0.2621282 , 0.2515326 ,
0.24431813], dtype=float32),
'true': array([0.2212211 , 0.22892201, 0.24515927, ..., 0.25912946, 0.23800558,
0.23579723], dtype=float32),
'r2_score': 0.17660769306807023,
'mean_squared_error': 0.00022054944},
'CD1d': {'pred': array([0.27452454, 0.27784905, 0.3807743 , ..., 0.26960534, 0.27109462,
0.26869002], dtype=float32),
'true': array([0.29084668, 0.2787824 , 0.37280142, ..., 0.2871778 , 0.2708428 ,
0.25334674], dtype=float32),
'r2_score': 0.6454607421084098,
'mean_squared_error': 0.0010725937},
'CD103': {'pred': array([0.25955388, 0.25041333, 0.2822778 , ..., 0.2555274 , 0.23480846,
0.23861615], dtype=float32),
'true': array([0.26624402, 0.21703161, 0.25556582, ..., 0.24584499, 0.23774952,
0.26314855], dtype=float32),
'r2_score': 0.6333754867688306,
'mean_squared_error': 0.00029186436},
'CD105': {'pred': array([0.25279748, 0.26322943, 0.33079505, ..., 0.2977552 , 0.24908838,
0.23428737], dtype=float32),
'true': array([0.25490686, 0.27640045, 0.3742214 , ..., 0.2684391 , 0.24985689,
0.21935181], dtype=float32),
'r2_score': 0.6753359632935965,
'mean_squared_error': 0.0004727315},
'CD106': {'pred': array([0.24811669, 0.24805331, 0.25124604, ..., 0.26913488, 0.24582317,
0.24637443], dtype=float32),
'true': array([0.25405908, 0.2475744 , 0.24266441, ..., 0.25619355, 0.2635005 ,
0.24447615], dtype=float32),
'r2_score': 0.3221144778474361,
'mean_squared_error': 0.00039838598},
'CD107a (Lamp-1)': {'pred': array([0.25488272, 0.2471615 , 0.5623303 , ..., 0.25523907, 0.25942624,
0.25205564], dtype=float32),
'true': array([0.2673547 , 0.25526446, 0.58498925, ..., 0.23471874, 0.26126346,
0.24185239], dtype=float32),
'r2_score': 0.8204075049850199,
'mean_squared_error': 0.0018579975},
'CD107b (Mac-3)': {'pred': array([0.26830757, 0.28503788, 0.2507204 , ..., 0.24908292, 0.25389788,
0.24230753], dtype=float32),
'true': array([0.29518053, 0.28111485, 0.255299 , ..., 0.2436143 , 0.2626211 ,
0.2382466 ], dtype=float32),
'r2_score': 0.7183401629602935,
'mean_squared_error': 0.0014851231},
'CD115': {'pred': array([0.2469954 , 0.25325543, 0.24312335, ..., 0.2437099 , 0.26318344,
0.2492683 ], dtype=float32),
'true': array([0.24417807, 0.25671375, 0.2558331 , ..., 0.23946778, 0.26340008,
0.25041574], dtype=float32),
'r2_score': 0.2159727711077275,
'mean_squared_error': 0.00015180971},
'Isotype_rIgG2b': {'pred': array([0.3152353 , 0.30198446, 0.24886578, ..., 0.24904986, 0.24726558,
0.24757081], dtype=float32),
'true': array([0.31431442, 0.34223923, 0.32557005, ..., 0.23365542, 0.26259127,
0.24768159], dtype=float32),
'r2_score': 0.2243016223785964,
'mean_squared_error': 0.00011540718},
'Isotype_mIgG1': {'pred': array([0.28719217, 0.2716357 , 0.26335174, ..., 0.27846232, 0.2770714 ,
0.2506193 ], dtype=float32),
'true': array([0.29004258, 0.31199 , 0.3189279 , ..., 0.30371755, 0.311677 ,
0.29318032], dtype=float32),
'r2_score': 0.2049955296927476,
'mean_squared_error': 0.00071228656},
'Isotype_AHIgG': {'pred': array([0.24830323, 0.2492735 , 0.25518328, ..., 0.24604471, 0.24749778,
0.2534635 ], dtype=float32),
'true': array([0.29159304, 0.26729953, 0.27030438, ..., 0.22781612, 0.24713887,
0.24599802], dtype=float32),
'r2_score': 0.21989004514011257,
'mean_squared_error': 0.00014965146},
'Isotype_rIgG2a': {'pred': array([0.25454333, 0.24098672, 0.2699412 , ..., 0.25125703, 0.24956013,
0.24680562], dtype=float32),
'true': array([0.2568882 , 0.281961 , 0.33056295, ..., 0.25644702, 0.24867922,
0.24952465], dtype=float32),
'r2_score': 0.17117436281629972,
'mean_squared_error': 0.00016250834},
'Isotype_rIgG1': {'pred': array([0.2509288 , 0.24883913, 0.25389326, ..., 0.24579412, 0.24850413,
0.24942122], dtype=float32),
'true': array([0.24504311, 0.24437752, 0.25067183, ..., 0.22886375, 0.24671295,
0.23623529], dtype=float32),
'r2_score': 0.21265289462313386,
'mean_squared_error': 0.00014281561}}
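To interpret these two scores, it helps to see how they are computed from paired true and predicted values. The sketch below assumes the standard definitions (the same ones used by the scikit-learn functions of the same names); whether pyInfinityFlow computes them in exactly this way is an assumption, and the input values are made up:

```python
def mean_squared_error(true, pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true)

def r2_score(true, pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for t, p in zip(true, pred))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot

# Made-up values on a scale similar to the Logicle-transformed data above
true_values = [0.20, 0.25, 0.30, 0.35, 0.40]
pred_values = [0.22, 0.24, 0.31, 0.33, 0.40]

print(mean_squared_error(true_values, pred_values))
print(r2_score(true_values, pred_values))
```

Because r2_score is scaled by the variance of the true values, a marker with little real variation (such as an Isotype control) can have a low r2_score even when its mean_squared_error is small, which is consistent with the pattern in the output above.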
Step 6: Predict InfinityMarker Values for Final InfinityFlow Object
The InfinityFlow_Utilities.make_flow_regression_predictions function carries out the imputation on the reference FCS dataset to predict the InfinityMarker expression values. It returns a tuple containing the result as an anndata.AnnData object and a dictionary that stores the timing of the prediction steps:
[10]:
sub_p_adata, timings_3 = InfinityFlow_Utilities.make_flow_regression_predictions(
    file_handler=file_handler,
    regression_models=regression_models,
    use_logicle_scaling=True,
    normalization_method=None,
    verbosity=3)
Reading in data from .fcs files for pooling into final InfinityFlow object...
DEBUG: Reading in the data for InfinityMarker 33D1...
DEBUG: Reading in the data for InfinityMarker Allergin-1...
DEBUG: Reading in the data for InfinityMarker B7-H4...
DEBUG: Reading in the data for InfinityMarker CD1d...
DEBUG: Reading in the data for InfinityMarker CD103...
DEBUG: Reading in the data for InfinityMarker CD105...
DEBUG: Reading in the data for InfinityMarker CD106...
DEBUG: Reading in the data for InfinityMarker CD107a (Lamp-1)...
DEBUG: Reading in the data for InfinityMarker CD107b (Mac-3)...
DEBUG: Reading in the data for InfinityMarker CD115...
DEBUG: Reading in the data for InfinityMarker Isotype_rIgG2b...
DEBUG: Reading in the data for InfinityMarker Isotype_mIgG1...
DEBUG: Reading in the data for InfinityMarker Isotype_AHIgG...
DEBUG: Reading in the data for InfinityMarker Isotype_rIgG2a...
DEBUG: Reading in the data for InfinityMarker Isotype_rIgG1...
Applying Logicle normalization to data...
Making predictions for final InfinityFlow object...
Working on 33D1...
Working on Allergin-1...
Working on B7-H4...
Working on CD1d...
Working on CD103...
Working on CD105...
Working on CD106...
Working on CD107a (Lamp-1)...
Working on CD107b (Mac-3)...
Working on CD115...
Working on Isotype_rIgG2b...
Working on Isotype_mIgG1...
Working on Isotype_AHIgG...
Working on Isotype_rIgG2a...
Working on Isotype_rIgG1...
/media/kyle_storage/kyle_ferchen/Python/Env/pyInfinityFlow_dev/lib/python3.8/site-packages/pyInfinityFlow/InfinityFlow_Utilities.py:1469: FutureWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
predicted_var.loc[:,"IMPUTED"] = True
/media/kyle_storage/kyle_ferchen/Python/Env/pyInfinityFlow_dev/lib/python3.8/site-packages/pyInfinityFlow/InfinityFlow_Utilities.py:1482: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
var = pd.concat([raw_sub_p_adata.var, predicted_var]),
The resulting AnnData object (sub_p_adata) can now be used for downstream analysis steps!
Step 7: Isotype Background Correction¶
We can now carry out Isotype background correction using the InfinityFlow_Utilities.perform_background_correction function, which will return a tuple with 3 values:
A pandas.DataFrame of the background corrected data
The .var annotation to specify settings for the features
Timings dictionary to track how much time was used in the function
[11]:
background_corrected_data, background_corrected_var, timings_4 = \
    InfinityFlow_Utilities.perform_background_correction(\
        sub_p_adata = sub_p_adata,
        infinity_marker_annotation = infinitymarker_anno,
        file_handler = file_handler,
        cores_to_use = 12,
        verbosity = 3)
DEBUG: Feature 33D1 will use isotype Isotype_rIgG2b...
DEBUG: Feature Allergin-1 will use isotype Isotype_mIgG1...
DEBUG: Feature B7-H4 will use isotype Isotype_AHIgG...
DEBUG: Feature CD1d will use isotype Isotype_rIgG2b...
DEBUG: Feature CD103 will use isotype Isotype_AHIgG...
DEBUG: Feature CD105 will use isotype Isotype_rIgG2a...
DEBUG: Feature CD106 will use isotype Isotype_rIgG2a...
DEBUG: Feature CD107a (Lamp-1) will use isotype Isotype_rIgG2a...
DEBUG: Feature CD107b (Mac-3) will use isotype Isotype_rIgG1...
DEBUG: Feature CD115 will use isotype Isotype_rIgG2a...
[12]:
background_corrected_data.head()
[12]:
| | 33D1 | Allergin-1 | B7-H4 | CD1d | CD103 | CD105 | CD106 | CD107a (Lamp-1) | CD107b (Mac-3) | CD115 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.053235 | 0.083288 | 0.062567 | 0.073332 | 0.050480 | 0.143983 | 0.087025 | 0.060597 | 0.050773 | 0.050279 |
| 1 | 0.059184 | 0.080860 | 0.043288 | 0.046708 | 0.030332 | 0.049352 | 0.079781 | 0.069040 | 0.047640 | 0.042443 |
| 2 | 0.059397 | 0.085905 | 0.048852 | 0.151401 | 0.021355 | 0.056680 | 0.089235 | 0.135968 | 0.109725 | 0.052312 |
| 3 | 0.050306 | 0.078846 | 0.060089 | 0.123346 | 0.041751 | 0.036221 | 0.080687 | 0.058094 | 0.036672 | 0.044819 |
| 4 | 0.060201 | 0.072160 | 0.055046 | 0.054883 | 0.049996 | 0.056747 | 0.083932 | 0.073318 | 0.054824 | 0.051910 |
[13]:
background_corrected_var.head()
[13]:
| | name | USE_LOGICLE | LOGICLE_T | LOGICLE_W | LOGICLE_M | LOGICLE_A | LOGICLE_APPLIED | IMPUTED |
|---|---|---|---|---|---|---|---|---|
| 33D1 | InfinityMarker_33D1 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| Allergin-1 | InfinityMarker_Allergin-1 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| B7-H4 | InfinityMarker_B7-H4 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| CD1d | InfinityMarker_CD1d | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| CD103 | InfinityMarker_CD103 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
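This tutorial does not show how the correction itself is computed. As a rough illustration of the general idea only, the sketch below regresses a synthetic marker signal on its matched isotype signal and keeps the residuals, so that the part of the marker explained by the isotype (non-specific background) is removed. The helper name `residual_background_correction` and the synthetic data are assumptions made for this example; this is not the pyInfinityFlow implementation.

```python
import numpy as np

def residual_background_correction(marker, isotype):
    """Hypothetical helper: remove the linear isotype trend from a marker."""
    # Fit marker ~ slope * isotype + intercept by ordinary least squares
    slope, intercept = np.polyfit(isotype, marker, deg=1)
    # The residuals are the part of the marker not explained by the isotype
    return marker - (slope * isotype + intercept)

rng = np.random.default_rng(0)
isotype = rng.normal(0.25, 0.02, size=1000)   # synthetic isotype prediction
specific = rng.normal(0.0, 0.05, size=1000)   # synthetic specific signal
marker = 0.8 * isotype + 0.1 + specific       # marker = background + signal

corrected = residual_background_correction(marker, isotype)
# After correction the values should no longer track the isotype signal
print(np.corrcoef(corrected, isotype)[0, 1])
```

In the pipeline above, the pairing of each feature with its isotype control comes from the InfinityMarker annotation, as listed in the DEBUG messages.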
Step 8: Silencing Features¶
There are some channels that we may want to exclude from downstream analyses because they are not relevant to cell state (e.g., the ‘Time’ parameter). The InfinityFlow_Utilities.move_features_to_silent function takes the given features_to_silence out of the AnnData.X array and moves them into the AnnData.obsm[‘silent’] attribute.
For example, we can first list the features present in the InfinityFlow AnnData object:
[14]:
sub_p_adata.var.index.values
[14]:
array(['FSC-A', 'FSC-H', 'FSC-W', 'SSC-A', 'SSC-H', 'SSC-W',
'FJComp-APC-A', 'FJComp-APC-eFlour780-A', 'FJComp-AlexaFluor700-A',
'FJComp-BUV395-A', 'FJComp-BUV737-A', 'FJComp-BV421-A',
'FJComp-BV510-A', 'FJComp-BV605-A', 'FJComp-BV650-A',
'FJComp-BV711-A', 'FJComp-BV786-A', 'FJComp-GFP-A',
'FJComp-PE(yg)-A', 'FJComp-PE-Cy7(yg)-A', 'FJComp-PerCP-Cy5-5-A',
'Time', '33D1', 'Allergin-1', 'B7-H4', 'CD1d', 'CD103', 'CD105',
'CD106', 'CD107a (Lamp-1)', 'CD107b (Mac-3)', 'CD115',
'Isotype_rIgG2b', 'Isotype_mIgG1', 'Isotype_AHIgG',
'Isotype_rIgG2a', 'Isotype_rIgG1'], dtype=object)
Let’s move some of the features to silent, so they are not considered for dimensionality reduction or clustering:
[15]:
features_to_silence = ['FSC-A', 'FSC-H', 'FSC-W', 'SSC-A', 'SSC-H', 'SSC-W',
    'FJComp-PE(yg)-A', 'Isotype_rIgG2b', 'Isotype_mIgG1', 'Isotype_AHIgG',
    'Isotype_rIgG2a', 'Isotype_rIgG1', 'Time']
sub_p_adata = InfinityFlow_Utilities.move_features_to_silent(sub_p_adata, features_to_silence)
sub_p_adata
[15]:
AnnData object with n_obs × n_vars = 15000 × 24
obs: 'cell_number', 'batch'
var: 'name', 'USE_LOGICLE', 'LOGICLE_T', 'LOGICLE_W', 'LOGICLE_M', 'LOGICLE_A', 'LOGICLE_APPLIED', 'IMPUTED'
uns: 'obs_file_origin', 'silent_var'
obsm: 'silent'
As you can see, the AnnData object now contains an obsm key ‘silent’ to store the event values for the silenced features, as well as a ‘silent_var’ pandas.DataFrame in the AnnData.uns attribute.
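Conceptually, silencing does not discard anything: it splits the feature table into an active part and a part that is set aside. The sketch below mimics that split on a small pandas DataFrame. The helper name `split_silent` and the toy values are assumptions for illustration; the real function operates on the AnnData object and also moves the matching rows of .var into uns['silent_var'].

```python
import pandas as pd

def split_silent(values, features_to_silence):
    """Hypothetical helper: split a table into active and silenced columns."""
    silent = values[features_to_silence]                # set aside, not discarded
    active = values.drop(columns=features_to_silence)   # kept for PCA/UMAP/clustering
    return active, silent

events = pd.DataFrame({
    "FSC-A": [44290.4, 33078.6],
    "Time": [0.2509, 0.2510],
    "CD115": [0.2534, 0.2461],
})
active, silent = split_silent(events, ["FSC-A", "Time"])
print(list(active.columns))
print(list(silent.columns))

# Joining the two tables back together restores every original column
restored = active.join(silent)
```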
[16]:
sub_p_adata.obsm['silent'].head()
[16]:
| | FSC-A | FSC-H | FSC-W | SSC-A | SSC-H | SSC-W | FJComp-PE(yg)-A | Isotype_rIgG2b | Isotype_mIgG1 | Isotype_AHIgG | Isotype_rIgG2a | Isotype_rIgG1 | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F0:5 | 44290.410156 | 48154.0 | 60277.785156 | 3829.130127 | 3626.0 | 0.590961 | 0.258807 | 0.252803 | 0.247846 | 0.250860 | 0.246098 | 0.256654 | 0.250912 |
| F0:113 | 33078.601562 | 28222.0 | 76813.804688 | 9273.810547 | 7514.0 | 0.607835 | 0.245435 | 0.235438 | 0.238048 | 0.257867 | 0.249520 | 0.250518 | 0.251031 |
| F0:137 | 141369.265625 | 105760.0 | 87601.898438 | 26223.259766 | 22176.0 | 0.603203 | 0.195596 | 0.230424 | 0.277973 | 0.244754 | 0.227513 | 0.227703 | 0.251047 |
| F0:375 | 86083.023438 | 62675.0 | 90012.554688 | 9570.060547 | 7683.0 | 0.608832 | 0.238332 | 0.248544 | 0.235057 | 0.245086 | 0.249628 | 0.247355 | 0.251229 |
| F0:430 | 126470.789062 | 99065.0 | 83666.179688 | 7711.190430 | 6665.0 | 0.600839 | 0.213695 | 0.249908 | 0.270671 | 0.260307 | 0.245320 | 0.248516 | 0.251268 |
[17]:
sub_p_adata.uns['silent_var'].head()
[17]:
| | name | USE_LOGICLE | LOGICLE_T | LOGICLE_W | LOGICLE_M | LOGICLE_A | LOGICLE_APPLIED | IMPUTED |
|---|---|---|---|---|---|---|---|---|
| FSC-A | | False | 3000000.0 | 0.0 | 3.0 | 1.0 | False | False |
| FSC-H | | False | 3000000.0 | 0.0 | 3.0 | 1.0 | False | False |
| FSC-W | | False | 3000000.0 | 0.0 | 3.0 | 1.0 | False | False |
| SSC-A | | False | 3000000.0 | 0.0 | 3.0 | 1.0 | False | False |
| SSC-H | | False | 3000000.0 | 0.0 | 3.0 | 1.0 | False | False |
Step 9: Dimensionality Reduction¶
Now that the InfinityFlow results are in an AnnData object, we can use the tools provided by Scanpy to perform downstream analysis.
PCA¶
If there are a lot of features in the dataset, it may be beneficial to use Principal component analysis to reduce the feature space to a smaller set that captures most of the variation observed.
We can apply the scanpy.tl.pca function to carry this out on our InfinityFlow AnnData object. InfinityFlow_Utilities.make_pca_elbo_plot can then be used to generate an elbow plot, which helps us estimate the smallest number of principal components that still captures most of the variation in the dataset:
[18]:
import scanpy as sc
sc.tl.pca(sub_p_adata)
# It is useful to save the features that were used
# at the time the PCA function was called, as the
# silenced features may change when the object is
# reloaded.
sub_p_adata.uns['pca_features'] = sub_p_adata.var.index.values
# Make the elbow plot:
InfinityFlow_Utilities.make_pca_elbo_plot(\
    sub_p_adata=sub_p_adata,
    output_paths=output_paths)
Here, we can see that the first 15 principal components capture most of the explained variance in the data, so we will use 15 PCs for the downstream analysis steps.
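The same decision can be made numerically from the explained variance ratios, which scanpy stores in uns['pca']['variance_ratio'] after sc.tl.pca. The sketch below computes those ratios with a plain SVD on synthetic data and returns the smallest number of components whose cumulative ratio reaches a threshold. The helper name `n_components_for_variance`, the 0.9 threshold, and the synthetic matrix are assumptions for illustration, not part of pyInfinityFlow.

```python
import numpy as np

def n_components_for_variance(X, threshold=0.9):
    """Hypothetical helper: smallest number of PCs reaching `threshold`."""
    # Center the features, then get the singular values of the data matrix
    Xc = X - X.mean(axis=0)
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    # Squared singular values are proportional to the variance of each PC
    variance_ratio = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(variance_ratio)
    # First position where the cumulative ratio reaches the threshold (1-based)
    return int(np.searchsorted(cumulative, threshold) + 1), variance_ratio

rng = np.random.default_rng(0)
# Synthetic data: 3 strong directions of variation hidden in 24 features
latent = rng.normal(size=(500, 3)) * np.array([10.0, 5.0, 3.0])
X = latent @ rng.normal(size=(3, 24)) + rng.normal(scale=0.1, size=(500, 24))

n_pcs, ratio = n_components_for_variance(X, threshold=0.9)
print(n_pcs, np.round(ratio[:5], 3))
```

A fixed threshold is only one heuristic; reading the bend of the elbow plot by eye, as done above, is equally common.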
Note that the make_pca_elbo_plot function saves this PCA elbow plot to the ‘QC’ directory of the output_dir that we specified in the InfinityFlow_Utilities.setup_output_directories function, which produced the output_paths dictionary. We can check where the ‘QC’ folder is on our machine with the following:
[19]:
# List the keys available in output_paths
output_paths.keys()
[19]:
dict_keys(['output_regression_path', 'output_umap_feature_plot_path', 'clustering', 'qc', 'output_umap_bc_feature_plot_path'])
[20]:
# Print out the 'qc' directory path
output_paths['qc']
[20]:
'/media/kyle_ssd1/outputs/QC'
UMAP¶
UMAP is a very popular method for dimensionality reduction, particularly for reducing the feature space to two dimensions so that the data can be viewed as a scatterplot. With this, we can observe the global structure of the data to get an idea of what groups of observations exist. In the context of Flow Cytometry, we can also cluster the data to identify cell types based on surface marker phenotypes.
To carry out the 2D UMAP dimensionality reduction, we can again use scanpy. First, we need to generate an estimate of the adjacency matrix using the scanpy.pp.neighbors function, which helps the UMAP function decide where to place the observations in the reduced dimensional space. We will instruct the function to use the first 15 PCs:
[21]:
sc.pp.neighbors(sub_p_adata, n_pcs=15)
sub_p_adata
[21]:
AnnData object with n_obs × n_vars = 15000 × 24
obs: 'cell_number', 'batch'
var: 'name', 'USE_LOGICLE', 'LOGICLE_T', 'LOGICLE_W', 'LOGICLE_M', 'LOGICLE_A', 'LOGICLE_APPLIED', 'IMPUTED'
uns: 'obs_file_origin', 'silent_var', 'pca', 'pca_features', 'neighbors'
obsm: 'silent', 'X_pca'
varm: 'PCs'
obsp: 'distances', 'connectivities'
As you can see, this added the ‘neighbors’ key to the sub_p_adata.uns attribute, as well as the ‘distances’ and ‘connectivities’ to the sub_p_adata.obsp attribute.
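To make the idea of a neighborhood graph concrete, the sketch below finds the k nearest neighbors of every observation by brute force in a synthetic 15-dimensional "PC space". This is only an illustration: scanpy.pp.neighbors uses faster neighbor search and additionally converts distances into weighted connectivities. The helper name `knn_indices`, the value k=5, and the synthetic points are assumptions for this example.

```python
import numpy as np

def knn_indices(points, k):
    """Hypothetical helper: brute-force k nearest neighbors for each row."""
    # Pairwise squared Euclidean distances between all observations
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff**2, axis=-1)
    # Exclude each point from its own neighbor list
    np.fill_diagonal(dist2, np.inf)
    # Indices of the k smallest distances in each row
    return np.argsort(dist2, axis=1)[:, :k]

rng = np.random.default_rng(0)
# Two well-separated synthetic groups of 20 observations each
group_a = rng.normal(loc=0.0, size=(20, 15))
group_b = rng.normal(loc=10.0, size=(20, 15))
pcs = np.vstack([group_a, group_b])

neighbors = knn_indices(pcs, k=5)
print(neighbors[0])   # neighbors of the first observation in group_a
```

Because the two groups are far apart, every observation's neighbors come from its own group, which is the kind of structure that UMAP and Leiden clustering later build on.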
We can then call the scanpy.tl.umap function to generate the low dimensional embedding:
[22]:
sc.tl.umap(sub_p_adata)
sub_p_adata
[22]:
AnnData object with n_obs × n_vars = 15000 × 24
obs: 'cell_number', 'batch'
var: 'name', 'USE_LOGICLE', 'LOGICLE_T', 'LOGICLE_W', 'LOGICLE_M', 'LOGICLE_A', 'LOGICLE_APPLIED', 'IMPUTED'
uns: 'obs_file_origin', 'silent_var', 'pca', 'pca_features', 'neighbors', 'umap'
obsm: 'silent', 'X_pca', 'X_umap'
varm: 'PCs'
obsp: 'distances', 'connectivities'
We will then move the 2 UMAP vectors to the sub_p_adata.obs DataFrame:
[23]:
sub_p_adata.obs["umap-x"] = sub_p_adata.obsm['X_umap'][:,0]
sub_p_adata.obs["umap-y"] = sub_p_adata.obsm['X_umap'][:,1]
sub_p_adata.obs.head()
[23]:
| | cell_number | batch | umap-x | umap-y |
|---|---|---|---|---|
| F0:5 | 5 | 33D1 | -3.683015 | 13.849557 |
| F0:113 | 113 | 33D1 | 13.346261 | 0.064281 |
| F0:137 | 137 | 33D1 | 7.532703 | -3.680286 |
| F0:375 | 375 | 33D1 | 9.984137 | 7.248507 |
| F0:430 | 430 | 33D1 | 12.682547 | 0.622352 |
Step 10: Making Feature Plots¶
Next, we will make feature plots of each feature currently stored in the sub_p_adata.var space (not the silenced features). The InfinityFlow_Utilities.save_umap_figures_all_features function is called on the InfinityFlow AnnData object. Note, we can include the background_corrected_data to also plot the background corrected features.
This function will save the original prediction feature figures to the ‘output_umap_feature_plot_path’ and the background corrected feature figures to the ‘output_umap_bc_feature_plot_path’ in the output_paths dictionary:
[24]:
timings_6 = InfinityFlow_Utilities.save_umap_figures_all_features(\
    sub_p_adata,
    background_corrected_data = background_corrected_data,
    file_handler = file_handler,
    output_paths = output_paths,
    verbosity=3)
Working on plotting feature 33D1...
Working on plotting feature Allergin-1...
Working on plotting feature B7-H4...
Working on plotting feature CD103...
Working on plotting feature CD105...
Working on plotting feature CD106...
Working on plotting feature CD107a (Lamp-1)...
Working on plotting feature CD107b (Mac-3)...
Working on plotting feature CD115...
Working on plotting feature CD1d...
Working on plotting feature FJComp-APC-A...
Working on plotting feature FJComp-APC-eFlour780-A...
Working on plotting feature FJComp-AlexaFluor700-A...
Working on plotting feature FJComp-BUV395-A...
Working on plotting feature FJComp-BUV737-A...
Working on plotting feature FJComp-BV421-A...
Working on plotting feature FJComp-BV510-A...
Working on plotting feature FJComp-BV605-A...
Working on plotting feature FJComp-BV650-A...
Working on plotting feature FJComp-BV711-A...
Working on plotting feature FJComp-BV786-A...
Working on plotting feature FJComp-GFP-A...
Working on plotting feature FJComp-PE-Cy7(yg)-A...
Working on plotting feature FJComp-PerCP-Cy5-5-A...
Step 11: Clustering the Data¶
Next, we can try to cluster the events from our FCS data. The Leiden algorithm is a popular method for clustering data, and is provided in the scanpy.tl.leiden function. It will utilize the estimated adjacency matrix produced by scanpy.pp.neighbors.
[25]:
sc.tl.leiden(sub_p_adata)
sub_p_adata
[25]:
AnnData object with n_obs × n_vars = 15000 × 24
obs: 'cell_number', 'batch', 'umap-x', 'umap-y', 'leiden'
var: 'name', 'USE_LOGICLE', 'LOGICLE_T', 'LOGICLE_W', 'LOGICLE_M', 'LOGICLE_A', 'LOGICLE_APPLIED', 'IMPUTED'
uns: 'obs_file_origin', 'silent_var', 'pca', 'pca_features', 'neighbors', 'umap', 'leiden'
obsm: 'silent', 'X_pca', 'X_umap'
varm: 'PCs'
obsp: 'distances', 'connectivities'
You can see that this function added the ‘leiden’ feature to our sub_p_adata.obs DataFrame, as well as the ‘leiden’ key to the sub_p_adata.uns attribute to store the parameters provided to the Leiden clustering algorithm.
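A quick check after clustering is to count how many events fall into each cluster. In the notebook this could be done with the pandas value_counts method on sub_p_adata.obs['leiden']; the sketch below shows the same idea with the standard library on a made-up list of labels (the labels are an assumption for illustration, not results from this dataset).

```python
from collections import Counter

# Hypothetical cluster labels, standing in for sub_p_adata.obs['leiden']
leiden_labels = ["0", "0", "1", "2", "0", "1", "2", "2", "2", "3"]

cluster_sizes = Counter(leiden_labels)
# Report clusters from largest to smallest
for cluster, n_events in cluster_sizes.most_common():
    fraction = n_events / len(leiden_labels)
    print(f"cluster {cluster}: {n_events} events ({fraction:.0%})")
```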
Specify Colors for Clusters¶
Let’s specify a set of colors to use for later plotting the clusters. The Plotting_Utilities.assign_rainbow_colors_to_groups function provides a quick way to assign colors to a set of cluster assignments:
[26]:
from pyInfinityFlow.Plotting_Utilities import assign_rainbow_colors_to_groups
groups_to_colors = assign_rainbow_colors_to_groups(\
    sub_p_adata.obs["leiden"].values)
sub_p_adata.uns['groups_to_color'] = groups_to_colors
sub_p_adata.uns['groups_to_color']
[26]:
{'0': '#8000ff',
'1': '#6c1fff',
'10': '#5641fd',
'11': '#4062fa',
'12': '#2c7ef7',
'13': '#169bf2',
'14': '#00b5eb',
'15': '#14cae5',
'16': '#2adddd',
'17': '#40ecd4',
'18': '#54f6cb',
'19': '#6afdc0',
'2': '#80ffb4',
'20': '#94fda8',
'21': '#abf69b',
'22': '#c0eb8d',
'23': '#d4dd80',
'3': '#ebca70',
'4': '#ffb360',
'5': '#ff9b52',
'6': '#ff7e41',
'7': '#ff5f30',
'8': '#ff4121',
'9': '#ff1f10'}
This simply assigns a hexadecimal color value to each of the ‘leiden’ clusters and stores the mapping as a dictionary for later use when plotting the clusters.
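Note that the dictionary above lists the labels in string order ('1', '10', '11', ...), so neighboring colors in the rainbow go to clusters '1' and '10' rather than '1' and '2'. If a numeric ordering is preferred, a mapping can be built by hand. The sketch below does this with evenly spaced hues from the standard library; the helper name `assign_hsv_colors` is hypothetical, and the HSV palette is a stand-in rather than the colormap used by assign_rainbow_colors_to_groups.

```python
import colorsys

def assign_hsv_colors(labels):
    """Hypothetical helper: map each unique label to an evenly spaced hue."""
    # Sort numerically so that '2' comes before '10'
    groups = sorted(set(labels), key=int)
    colors = {}
    for i, group in enumerate(groups):
        hue = i / len(groups)                        # evenly spaced in [0, 1)
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        colors[group] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255))
    return colors

labels = ["0", "1", "10", "2", "1", "0", "10"]
groups_to_colors_sketch = assign_hsv_colors(labels)
print(groups_to_colors_sketch)
```

A dictionary built this way has the same label-to-hex-color shape as the one stored in sub_p_adata.uns['groups_to_color'].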
Plotting Leiden clusters over UMAP¶
Now that we have colors associated with our Leiden clusters in the sub_p_adata.uns['groups_to_color'] attribute, we can project those clusters onto the 2D-UMAP to get an idea of where each cluster sits in the reduced dimensional space.
The plot_leiden_clusters_over_umap function will take in the InfinityFlow AnnData object and the output_paths dictionary and save this UMAP to the ‘clustering’ directory in the output_dir:
[27]:
from pyInfinityFlow.Plotting_Utilities import plot_leiden_clusters_over_umap
plot_leiden_clusters_over_umap(\
    sub_p_adata=sub_p_adata,
    output_paths=output_paths,
    verbosity=3)
[28]:
# Clustering directory
print(output_paths['clustering'])
/media/kyle_ssd1/outputs/clustering
Step 12: Find Markers for Clusters¶
We can use the MarkerFinder algorithm to assign each feature to the cluster that it most uniquely identifies. This is provided as the InfinityFlow_Utilities.find_markers_from_anndata function, which works directly on the InfinityFlow formatted AnnData object.
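The core idea behind MarkerFinder is to correlate each feature with an idealized profile for every cluster (1 for events in the cluster, 0 elsewhere) and to assign the feature to the cluster with the highest Pearson correlation. The sketch below is a rough illustration of that idea on synthetic data; the helper name `assign_markers` and the data are assumptions, and this is not the code behind find_markers_from_anndata.

```python
import numpy as np

def assign_markers(values, clusters):
    """Hypothetical helper: assign each feature to its best-matching cluster."""
    groups = sorted(set(clusters))
    clusters = np.asarray(clusters)
    assignments = {}
    for j in range(values.shape[1]):
        best_group, best_r = None, -np.inf
        for group in groups:
            # Idealized profile: 1 for events in the cluster, 0 elsewhere
            reference = (clusters == group).astype(float)
            r = np.corrcoef(values[:, j], reference)[0, 1]
            if r > best_r:
                best_group, best_r = group, r
        assignments[j] = (best_group, best_r)
    return assignments

rng = np.random.default_rng(0)
clusters = ["0"] * 50 + ["1"] * 50 + ["2"] * 50
values = rng.normal(scale=0.1, size=(150, 2))
values[:50, 0] += 1.0     # feature 0 is high only in cluster '0'
values[100:, 1] += 1.0    # feature 1 is high only in cluster '2'

markers = assign_markers(values, clusters)
print(markers)
```

The actual call on the InfinityFlow object follows.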
[29]:
markers_df, cell_assignments = InfinityFlow_Utilities.find_markers_from_anndata(\
    sub_p_adata=sub_p_adata,
    output_paths=output_paths,
    groups_to_colors=sub_p_adata.uns['groups_to_color'],
    verbosity=3)
Finding markers for Infinity Flow object...
Plotting markers...
Note, this will save a heatmap of the markers vs. clusters in the output_paths['clustering'] directory, as well as a csv file with the MarkerFinder results:
[30]:
# Clustering outputs directory
print(output_paths['clustering'])
# Contents of the directory
os.listdir(output_paths['clustering'])
/media/kyle_ssd1/outputs/clustering
[30]:
['cluster_markers.csv', 'cluster_markers.pdf', 'Leiden_Clusters_over_UMAP.png']
The ‘Leiden_Clusters_over_UMAP.png’ file in this directory is the plot of the Leiden clusters over the UMAP that we saved in Step 11, which shows where each cluster sits in the reduced dimensional space.
Step 13: Moving Features out of Silent¶
After we have performed dimensionality reduction and clustering with our features of interest, we may want to move the features that we previously silenced back into the AnnData object. That way, the final FCS file we save can include features that we silenced earlier (e.g., ‘Time’ or ‘FSC-A’).
We can list what features are currently silenced:
[31]:
sub_p_adata.uns['silent_var'].index.values
[31]:
array(['FSC-A', 'FSC-H', 'FSC-W', 'SSC-A', 'SSC-H', 'SSC-W',
'FJComp-PE(yg)-A', 'Isotype_rIgG2b', 'Isotype_mIgG1',
'Isotype_AHIgG', 'Isotype_rIgG2a', 'Isotype_rIgG1', 'Time'],
dtype=object)
Let’s move the ‘FSC-A’, ‘FSC-H’, ‘FSC-W’, ‘SSC-A’, ‘SSC-H’, and ‘SSC-W’ features out of the silenced space. The InfinityFlow_Utilities.move_features_out_of_silent function takes our AnnData object along with a list of features to move out of silent:
[32]:
features_to_unsilence = ['FSC-A', 'FSC-H', 'FSC-W', 'SSC-A', 'SSC-H', 'SSC-W']
sub_p_adata = InfinityFlow_Utilities.move_features_out_of_silent(\
    sub_p_adata,
    features_to_unsilence)
Now these features are back in the AnnData.X and AnnData.var attributes!
[33]:
sub_p_adata.var
[33]:
| | name | USE_LOGICLE | LOGICLE_T | LOGICLE_W | LOGICLE_M | LOGICLE_A | LOGICLE_APPLIED | IMPUTED |
|---|---|---|---|---|---|---|---|---|
| 33D1 | InfinityMarker_33D1 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| Allergin-1 | InfinityMarker_Allergin-1 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| B7-H4 | InfinityMarker_B7-H4 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| CD103 | InfinityMarker_CD103 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| CD105 | InfinityMarker_CD105 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| CD106 | InfinityMarker_CD106 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| CD107a (Lamp-1) | InfinityMarker_CD107a (Lamp-1) | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| CD107b (Mac-3) | InfinityMarker_CD107b (Mac-3) | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| CD115 | InfinityMarker_CD115 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| CD1d | InfinityMarker_CD1d | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| FJComp-APC-A | CD69-CD301b | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | False |
| FJComp-APC-eFlour780-A | Zombie | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | False |
| FJComp-AlexaFluor700-A | MHCII | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | False |
| FJComp-BUV395-A | CD4 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | False |
| FJComp-BUV737-A | CD44 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | False |
| FJComp-BV421-A | CD8 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | False |
| FJComp-BV510-A | CD11c | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | False |
| FJComp-BV605-A | CD11b | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | False |
| FJComp-BV650-A | F480 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | False |
| FJComp-BV711-A | Ly6C | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | False |
| FJComp-BV786-A | Lineage | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | False |
| FJComp-GFP-A | CD45a488 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | False |
| FJComp-PE-Cy7(yg)-A | CD24 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | False |
| FJComp-PerCP-Cy5-5-A | CD103 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | False |
| FSC-A | | False | 3000000.0 | 0.0 | 3.0 | 1.0 | False | False |
| FSC-H | | False | 3000000.0 | 0.0 | 3.0 | 1.0 | False | False |
| FSC-W | | False | 3000000.0 | 0.0 | 3.0 | 1.0 | False | False |
| SSC-A | | False | 3000000.0 | 0.0 | 3.0 | 1.0 | False | False |
| SSC-H | | False | 3000000.0 | 0.0 | 3.0 | 1.0 | False | False |
| SSC-W | | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | False |
Step 14: Saving Regression Outputs¶
Now that we have our final InfinityFlow object stored in AnnData format, we can save it to storage in different formats for later downstream analyses.
h5ad File¶
The h5ad file preserves the structure of the AnnData object and lets us quickly load the data for future processing with tools like Scanpy. We can simply use the .write method on the AnnData object to write the file as an h5ad file.
[34]:
h5_path = os.path.join(output_paths['output_regression_path'],
    "InfinityFlow_object_logicle_normalized.h5ad")
sub_p_adata.write(h5_path)
The output_paths['output_regression_path'] entry holds the standard directory for regression results that was set up along with the output_paths dictionary, so it is a convenient place to save the file.
[35]:
# Eg. Using the output_paths directory
output_paths['output_regression_path']
[35]:
'/media/kyle_ssd1/outputs/regression_results'
Feather File¶
The Feather file format is commonly used for DataFrame objects. We will lose some of the information present in the AnnData object, but we will be able to load the DataFrame back into memory very quickly.
The InfinityFlow_Utilities.anndata_to_df function provides a quick way to convert the AnnData object to a DataFrame. We can then reset the index and save the DataFrame as a Feather file with the .to_feather method provided by Pandas:
[36]:
# Create an output path for the DataFrame
feather_path = os.path.join(output_paths['output_regression_path'],
    "InfinityFlow_object_logicle_normalized.fea")
# Convert to DataFrame
df = InfinityFlow_Utilities.anndata_to_df(\
    input_anndata=sub_p_adata,
    use_raw_feature_names=False,
    add_index_names=True)
# Save as Feather file
df.reset_index().to_feather(feather_path)
df.head()
[36]:
| | 33D1:InfinityMarker_33D1 | Allergin-1:InfinityMarker_Allergin-1 | B7-H4:InfinityMarker_B7-H4 | CD103:InfinityMarker_CD103 | CD105:InfinityMarker_CD105 | CD106:InfinityMarker_CD106 | CD107a (Lamp-1):InfinityMarker_CD107a (Lamp-1) | CD107b (Mac-3):InfinityMarker_CD107b (Mac-3) | CD115:InfinityMarker_CD115 | CD1d:InfinityMarker_CD1d | ... | FJComp-BV786-A:Lineage | FJComp-GFP-A:CD45a488 | FJComp-PE-Cy7(yg)-A:CD24 | FJComp-PerCP-Cy5-5-A:CD103 | FSC-A: | FSC-H: | FSC-W: | SSC-A: | SSC-H: | SSC-W: |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F0:5 | 0.252006 | 0.221211 | 0.254082 | 0.249303 | 0.389210 | 0.249804 | 0.253386 | 0.300907 | 0.253446 | 0.307087 | ... | 0.235586 | 0.223836 | 0.272052 | 0.276451 | 44290.410156 | 48154.0 | 60277.785156 | 3829.130127 | 3626.0 | 0.590961 |
| F0:113 | 0.245084 | 0.211464 | 0.234267 | 0.227056 | 0.261304 | 0.243082 | 0.286747 | 0.282273 | 0.246122 | 0.242309 | ... | 0.402271 | 0.503064 | 0.489275 | 0.321054 | 33078.601562 | 28222.0 | 76813.804688 | 9273.810547 | 7514.0 | 0.607835 |
| F0:137 | 0.241113 | 0.245277 | 0.229664 | 0.220823 | 0.250333 | 0.235402 | 0.419354 | 0.373128 | 0.240887 | 0.404543 | ... | 0.416697 | 0.678155 | 0.305735 | 0.509343 | 141369.265625 | 105760.0 | 87601.898438 | 26223.259766 | 22176.0 | 0.603203 |
| F0:375 | 0.244557 | 0.206941 | 0.245322 | 0.241661 | 0.243205 | 0.244426 | 0.255690 | 0.252353 | 0.249280 | 0.382239 | ... | 0.296814 | 0.598728 | 0.494562 | 0.217368 | 86083.023438 | 62675.0 | 90012.554688 | 9570.060547 | 7683.0 | 0.608832 |
| F0:430 | 0.258684 | 0.223491 | 0.252599 | 0.246665 | 0.267524 | 0.244834 | 0.287726 | 0.294017 | 0.254916 | 0.273731 | ... | 0.373348 | 0.558255 | 0.484305 | 0.337422 | 126470.789062 | 99065.0 | 83666.179688 | 7711.190430 | 6665.0 | 0.600839 |
5 rows × 30 columns
FCS File¶
After converting the InfinityFlow object back into an FCS file, we can open the file in traditional Flow Cytometry analysis software (e.g., FlowJo) to perform custom downstream analyses, such as gating on certain populations.
Inverting the Logicle Normalization¶
However, since we used Logicle normalization to more accurately carry out regression and perform dimensionality reduction and clustering, we should first invert the Logicle normalization to return to the original fluorescence intensity scale. The pyInfinityFlow format of the AnnData object stores the parameters for applying and inverting the Logicle normalization in the .var attribute:
[37]:
sub_p_adata.var.head()
[37]:
| | name | USE_LOGICLE | LOGICLE_T | LOGICLE_W | LOGICLE_M | LOGICLE_A | LOGICLE_APPLIED | IMPUTED |
|---|---|---|---|---|---|---|---|---|
| 33D1 | InfinityMarker_33D1 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| Allergin-1 | InfinityMarker_Allergin-1 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| B7-H4 | InfinityMarker_B7-H4 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| CD103 | InfinityMarker_CD103 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
| CD105 | InfinityMarker_CD105 | True | 3000000.0 | 0.0 | 3.0 | 1.0 | True | True |
We can invert the Logicle normalization on the features that have the USE_LOGICLE column set to True by using the InfinityFlow_Utilities.apply_inverse_logicle_to_anndata function:
[38]:
InfinityFlow_Utilities.apply_inverse_logicle_to_anndata(sub_p_adata)
InfinityFlow_Utilities.anndata_to_df(sub_p_adata).head()
[38]:
| | 33D1 | Allergin-1 | B7-H4 | CD103 | CD105 | CD106 | CD107a (Lamp-1) | CD107b (Mac-3) | CD115 | CD1d | ... | FJComp-BV786-A | FJComp-GFP-A | FJComp-PE-Cy7(yg)-A | FJComp-PerCP-Cy5-5-A | FSC-A | FSC-H | FSC-W | SSC-A | SSC-H | SSC-W |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F0:5 | 110.998680 | -1611.187378 | 225.875824 | -38.550339 | 9981.195312 | -10.849790 | 187.352631 | 2918.396240 | 190.660049 | 3302.118652 | ... | -799.727173 | -1461.367188 | 1228.294067 | 1477.697266 | 44290.410156 | 48154.0 | 60277.785156 | 3829.130127 | 3626.0 | 69207.367188 |
| F0:113 | -272.058624 | -2176.202148 | -873.436096 | -1278.713989 | 626.500122 | -382.992645 | 2071.328857 | 1811.447266 | -214.581619 | -425.844513 | ... | 11457.474609 | 30567.076172 | 26846.810547 | 4209.315430 | 33078.601562 | 28222.0 | 76813.804688 | 9273.810547 | 7514.0 | 80884.835938 |
| F0:137 | -492.217468 | -261.397980 | -1131.551514 | -1633.387573 | 18.449854 | -810.031494 | 13642.258789 | 8355.895508 | -504.758484 | 11730.511719 | ... | 13281.506836 | 154731.609375 | 3217.331787 | 32420.519531 | 141369.265625 | 105760.0 | 87601.898438 | 26223.259766 | 22176.0 | 77496.765625 |
| F0:375 | -301.269897 | -2443.946289 | -258.863922 | -461.783051 | -376.178040 | -308.487518 | 314.955200 | 130.189346 | -39.852901 | 9252.339844 | ... | 2669.240967 | 74358.414062 | 28219.044922 | -1832.219971 | 86083.023438 | 62675.0 | 90012.554688 | 9570.060547 | 7683.0 | 81632.640625 |
| F0:430 | 480.960388 | -1480.978882 | 143.819992 | -184.526932 | 973.659363 | -285.889282 | 2128.691162 | 2501.141357 | 272.047882 | 1323.218140 | ... | 8376.817383 | 51127.210938 | 25615.402344 | 5362.881836 | 126470.789062 | 99065.0 | 83666.179688 | 7711.190430 | 6665.0 | 75823.031250 |
5 rows × 30 columns
You can see that the fluorescence-derived values are no longer between 0 and 1, indicating that the Logicle normalization has been inverted.
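For the parameters stored in .var (LOGICLE_T = 3000000, LOGICLE_W = 0, LOGICLE_M = 3, LOGICLE_A = 1), the relationship between the two scales can be written in closed form, because with W = 0 the biexponential function behind the Logicle scale reduces to a hyperbolic sine. The sketch below is based on the published Logicle definition under that W = 0 assumption; the helper names `logicle_w0` and `inverse_logicle_w0` are hypothetical, and this is not the pyInfinityFlow implementation.

```python
import numpy as np

def logicle_w0(x, T, M, A):
    """Hypothetical helper: Logicle transform for the special case W = 0."""
    b = (M + A) * np.log(10.0)   # total display width in natural-log units
    x1 = A / (M + A)             # scale position that maps to a data value of 0
    return x1 + np.arcsinh(x * np.sinh(b * (1.0 - x1)) / T) / b

def inverse_logicle_w0(y, T, M, A):
    """Hypothetical helper: inverse of `logicle_w0`."""
    b = (M + A) * np.log(10.0)
    x1 = A / (M + A)
    return T * np.sinh(b * (y - x1)) / np.sinh(b * (1.0 - x1))

# Parameters taken from the .var table above
T, M, A = 3000000.0, 3.0, 1.0
raw = np.array([-500.0, 0.0, 110.0, 9981.0, 3000000.0])
scaled = logicle_w0(raw, T, M, A)
restored = inverse_logicle_w0(scaled, T, M, A)
print(scaled)
print(restored)
```

With these parameters a raw value of 0 sits at 0.25 on the normalized scale and T sits at 1, which is why the normalized values of weakly stained events cluster around 0.25 in the tables above. Passing the normalized SSC-W value 0.590961 from the first row to `inverse_logicle_w0` gives roughly 69207, in line with the inverted value shown in the table.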
We can then save the data as an FCS file with the InfinityFlow_Utilities.save_fcs_flow_anndata function:
[39]:
InfinityFlow_Utilities.save_fcs_flow_anndata(\
    sub_p_adata = sub_p_adata,
    background_corrected_data = background_corrected_data,
    background_corrected_var = background_corrected_var,
    file_handler = file_handler,
    output_paths = output_paths,
    add_umap = True,
    use_logicle = True,
    verbosity=3)
Writing out base prediction values to fcs file...
WARNING! No features required inverting logicle normalization at this time.
Omitting spillover matrix...
WARNING! TEXT segment value for key $P25S is empty. Excluding from written file.
WARNING! TEXT segment value for key $P26S is empty. Excluding from written file.
WARNING! TEXT segment value for key $P27S is empty. Excluding from written file.
WARNING! TEXT segment value for key $P28S is empty. Excluding from written file.
WARNING! TEXT segment value for key $P29S is empty. Excluding from written file.
WARNING! TEXT segment value for key $P30S is empty. Excluding from written file.
WARNING! TEXT segment value for key $P31S is empty. Excluding from written file.
WARNING! TEXT segment value for key $P32S is empty. Excluding from written file.
Writing out background-corrected prediction values to fcs file...
Omitting spillover matrix...
WARNING! TEXT segment value for key $P25S is empty. Excluding from written file.
WARNING! TEXT segment value for key $P26S is empty. Excluding from written file.
WARNING! TEXT segment value for key $P27S is empty. Excluding from written file.
WARNING! TEXT segment value for key $P28S is empty. Excluding from written file.
WARNING! TEXT segment value for key $P29S is empty. Excluding from written file.
WARNING! TEXT segment value for key $P30S is empty. Excluding from written file.
WARNING! TEXT segment value for key $P31S is empty. Excluding from written file.
WARNING! TEXT segment value for key $P32S is empty. Excluding from written file.
[39]:
{'file_export': 2.1501059532165527}
Finish¶
We have now carried out all of the steps of the analysis pipeline provided by pyInfinityFlow, using the API!