API
The pyInfinityFlow API is designed to give the user more control over what parameters are used in the InfinityFlow pipeline. It also allows for any FCS file to be processed with any step of the pipeline.
InfinityFlow_Utilities
InfinityFlow_Utilities: Classes
- class pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler(ordered_reference_backbone)
Class to specify how to handle InfinityMarker files.
- Parameters
ordered_reference_backbone (numpy.Array[str]) – Array of backbone channel names
- list_infinity_markers
List of the InfinityMarker names that the object can handle
- Type
list[str]
- handles
A dictionary with a key for every InfinityMarker. Each key stores a dictionary with the following values to specify how to handle the given InfinityMarker:
[“name”]: (str) InfinityMarker name
[“file_name”]: (str) .fcs file name for the InfinityMarker
[“directory”]: (str) path to the directory where the .fcs file is saved
[“reference_backbone_channels”]: (list[str]) list of the channel names to use for the backbone in the reference .fcs file (the events used for prediction)
[“backbone_channels”]: (list[str]) list of the channel names to use for the backbone in the InfinityMarker .fcs file (the events used for XGBoost regression model fitting)
[“prediction_channel”]: (str) channel name of the InfinityMarker, the channel name to predict
[“train_indices”]: (numpy.Array[int]) indices of the InfinityMarker .fcs file to use for fitting
[“test_indices”]: (numpy.Array[int]) indices of the InfinityMarker .fcs file to use for validation
[“pool_indices”]: (numpy.Array[int]) indices of the InfinityMarker .fcs file whose events are pooled into the combined reference
- Type
dict{dicts}
- use_isotype_controls
If True, pipeline functions will require Isotype controls
- Type
bool
- isotype_control_names
Array of the InfinityMarker names that serve as Isotype controls
- Type
numpy.Array[str]
- ordered_reference_backbone
Array of backbone channel names
- Type
numpy.Array[str]
- add_handle(name, file_name, directory, reference_backbone_channels, backbone_channels, prediction_channel, train_indices, test_indices, pool_indices)
Add a new InfinityMarker handle to the InfinityFlowFileHandler
- Parameters
name (str) – The name of the InfinityMarker (Required)
file_name (str) – The .fcs file name for the InfinityMarker (Required)
directory (str) – The path to the directory where the .fcs file is saved (Required)
reference_backbone_channels (list[str]) – list of the channel names to use for the backbone in the reference .fcs file (the events used for prediction)
backbone_channels (list[str]) – list of the channel names to use for the backbone in the InfinityMarker .fcs file (the events used for XGBoost regression model fitting)
prediction_channel (str) – The channel name of the InfinityMarker, the channel name to predict
train_indices (numpy.Array[int]) – The indices of the InfinityMarker .fcs file to use for fitting
test_indices (numpy.Array[int]) – The indices of the InfinityMarker .fcs file to use for validation
pool_indices (numpy.Array[int]) – The indices of the InfinityMarker .fcs file whose events are pooled into the combined reference
- Returns
- Adds the given InfinityMarker handle to the handles attribute of the InfinityFlowFileHandler. handles is a dictionary in which each entry, keyed by the InfinityMarker name, is itself a dictionary with the following keys:
[“name”]
[“file_name”]
[“directory”]
[“reference_backbone_channels”]
[“backbone_channels”]
[“prediction_channel”]
[“train_indices”]
[“test_indices”]
[“pool_indices”]
- Return type
None
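The layout of one handles entry can be sketched with plain Python dictionaries. This is only an illustration of the nine documented keys, not the class itself: the helper make_handle and every value below (marker name, file name, channels, indices) are hypothetical.

```python
# Sketch only: a plain-dict stand-in for the structure of `handles`.
# The helper name and all values are hypothetical; only the key names
# come from the documentation above.
HANDLE_KEYS = (
    "name", "file_name", "directory",
    "reference_backbone_channels", "backbone_channels",
    "prediction_channel", "train_indices", "test_indices", "pool_indices",
)


def make_handle(**fields):
    """Return one handle entry, checking that exactly the documented keys are given."""
    missing = set(HANDLE_KEYS) - set(fields)
    extra = set(fields) - set(HANDLE_KEYS)
    if missing or extra:
        raise ValueError(f"missing keys: {sorted(missing)}, unexpected keys: {sorted(extra)}")
    return {key: fields[key] for key in HANDLE_KEYS}


handles = {}
entry = make_handle(
    name="CD4",
    file_name="plate1_A01.fcs",
    directory="/path/to/fcs",
    reference_backbone_channels=["FSC-A", "SSC-A"],
    backbone_channels=["FSC-A", "SSC-A"],
    prediction_channel="PE-A",
    train_indices=[0, 1, 2, 3],
    test_indices=[4, 5],
    pool_indices=[4, 5],
)
handles[entry["name"]] = entry  # one entry per InfinityMarker, keyed by its name
```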
- class pyInfinityFlow.InfinityFlow_Utilities.CombinedRegressionModels(ordered_training_channels, var_annotations, infinity_markers, regression_models, parameter_annotations, infinity_channels)
Class to store XGBoost regression models, the settings used to fit the model, and the validation metrics from testing.
- ordered_training_channels
The features used to train each of the regression models
- Type
numpy.Array[str]
- var_annotations
The feature parameters for the training features of the backbone
- Type
pandas.DataFrame
- infinity_markers
The response variables the regression models can predict
- Type
numpy.Array[str]
- regression_models
Dictionary of response variable to regression model for prediction
- Type
dict{InfinityMarker: xgboost.XGBRegressor}
- parameter_annotations
Dictionary of Series specifying the feature parameters of each response variable (for example, whether logicle was applied to the response variable)
- Type
dict{InfinityMarker: Series}
- infinity_channels
The channel name for the InfinityMarker (“Response Variable”)
- Type
dict{InfinityMarker: str}
- validation_metrics
Provide validation metrics as an object with each InfinityMarker as a key
- Type
dict{InfinityMarker: dict}
InfinityFlow_Utilities: General Tools
- pyInfinityFlow.InfinityFlow_Utilities.read_annotation_table(input_file)
Read in an annotation file. Annotation files are used to dictate how to carry out the regression models.
- Parameters
input_file (str) – The path to the file containing the annotation information. The file should be either comma-separated (.csv) or tab-separated (.tsv or .txt). (Required)
- Returns
DataFrame of the annotation table
- Return type
pandas.DataFrame
- pyInfinityFlow.InfinityFlow_Utilities.anndata_to_df(input_anndata, use_raw_feature_names=True, add_index_names=True)
Function to quickly convert an AnnData object containing pyInfinityFlow formatted flow cytometry data to a pandas DataFrame object
- Parameters
input_anndata (anndata.AnnData) – AnnData object for which to generate a DataFrame (Required)
use_raw_feature_names (bool) – Optional argument. If True, only use the raw feature names from input_anndata.var.index. If False, add the “name” values for the features in input_anndata.var.index, formatted as <index>:<name>. (Default: True)
add_index_names (bool) – Optional argument. If True, will add the input_anndata.obs.index as the index of the returned DataFrame. If False, the index will simply be the integers from range(input_anndata.obs.shape[0]). (Default: True)
- Returns
DataFrame of the AnnData object’s X attribute
- Return type
pandas.DataFrame
- pyInfinityFlow.InfinityFlow_Utilities.marker_finder(input_df, groups)
Function to find which features in input_df correspond best to which groups annotating the observations in input_df. The function will perform a Pearson correlation of the input_df feature values to an “idealized” group specific expression vector, where each observation in a given group is set to a value of 1, and the observations in other groups are set to 0.
- Parameters
input_df (pandas.DataFrame) – DataFrame with observations as index and features as columns (Required)
groups (list[str]) – List-like of specified groups corresponding to observations from the input_df. The order of groups should match the order in input_df.index (Required)
- Returns
DataFrame of the Pearson correlation test results. Each feature is assigned the cluster for which the test resulted in the highest Pearson correlation coefficient. The columns of the DataFrame will be [“marker”, “top_cluster”, “pearson_r”, “p_value”]
- Return type
pandas.DataFrame
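The correlation against an “idealized” group vector described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names, the dict-of-lists input, and the example values are hypothetical, and the p-value column is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Assumes neither sequence is constant (otherwise the denominator is zero).
    return cov / math.sqrt(var_x * var_y)


def top_cluster_per_feature(features, groups):
    """Map each feature name to (top_cluster, pearson_r).

    features: dict of feature name -> list of values, one per observation.
    groups:   list of group labels, in the same order as the observations.
    """
    results = {}
    for marker, values in features.items():
        best = None
        for group in sorted(set(groups)):
            # Idealized vector: 1 for observations in the group, 0 otherwise.
            ideal = [1.0 if g == group else 0.0 for g in groups]
            r = pearson_r(values, ideal)
            if best is None or r > best[1]:
                best = (group, r)
        results[marker] = best
    return results


groups = ["c1", "c1", "c1", "c2", "c2", "c2"]
features = {
    "CD4": [9.0, 8.5, 9.2, 1.0, 1.3, 0.8],
    "CD8": [0.5, 0.9, 0.7, 7.9, 8.4, 8.1],
}
result = top_cluster_per_feature(features, groups)
```

In this toy input, CD4 is high only in c1 and CD8 only in c2, so each feature is assigned to the cluster whose idealized vector it tracks most closely.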
- pyInfinityFlow.InfinityFlow_Utilities.read_fcs_into_anndata(fcs_file_path, obs_prefix='', batch_key='')
Reads an .fcs file into an AnnData object.
- Parameters
fcs_file_path (str) – Path to the .fcs file (Required)
obs_prefix (str) – String to add as a prefix to the index values of the output anndata.AnnData.obs.index (Default=””)
batch_key (str) – If len(batch_key) > 0, this str will be added as the value of a “batch” feature in the returned AnnData.obs DataFrame (Default=””)
- Returns
An AnnData object with the DATA segment of the .fcs file saved to the X attribute.
- Return type
anndata.AnnData
- pyInfinityFlow.InfinityFlow_Utilities.write_anndata_to_fcs(input_anndata, fcs_file_path, add_umap=False, verbosity=0)
Writes a given pyInfinityFlow structured AnnData object to an .fcs file according to the FCS3.1 file standard.
- Parameters
input_anndata (anndata.AnnData) – The pyInfinityFlow formatted AnnData object to save to an .fcs file (Required)
fcs_file_path (str) – The path to which the .fcs file should be written. (Required)
add_umap (bool) – Specifies whether the 2D-UMAP coordinates should be written to the DATA segment of the .fcs file. This expects the features “umap-x” and “umap-y” are in the input_anndata.obs.columns. (Default=False)
verbosity (int (0|1|2|3)) – The level of verbosity with which to print debug statements.
- Returns
The file will be saved to fcs_file_path.
- Return type
None
- pyInfinityFlow.InfinityFlow_Utilities.apply_logicle_to_anndata(input_anndata, in_place=True)
Applies the Logicle transformation function to the given input_anndata object.
Note
The T, W, M, and A parameters are specified in the input_anndata.var.
- Parameters
input_anndata (anndata.AnnData) – The pyInfinityFlow formatted AnnData object on which to carry out Logicle normalization (Required)
in_place (bool) – Specifies whether the function should act on the input_anndata in-place (Default=True)
- Returns
The AnnData object with logicle normalization applied or None if in_place=True
- Return type
anndata.AnnData or None
- pyInfinityFlow.InfinityFlow_Utilities.apply_inverse_logicle_to_anndata(input_anndata, in_place=True)
Applies the inverse Logicle transformation function to the given input_anndata object.
- Parameters
input_anndata (anndata.AnnData) – The pyInfinityFlow formatted AnnData object on which to invert the Logicle normalization (Required)
in_place (bool) – Specifies whether the function should act on the input_anndata in-place (Default=True)
- Returns
The AnnData object with the inverse Logicle transformation applied, or None if in_place=True
- Return type
anndata.AnnData or None
- pyInfinityFlow.InfinityFlow_Utilities.move_features_to_silent(input_anndata, features)
This function will “silence” a set of feature values by moving them out of the AnnData.X array, and move them into a DataFrame stored in the AnnData.obsm[“silent”] key. The DataFrame in AnnData.var corresponding to the features is moved to the AnnData.uns[“silent_var”] key. This is useful when you want to keep some features out of the data for downstream analyses. For example, the “Time” parameter stored in .fcs files is not meaningful to cell state.
- Parameters
input_anndata (anndata.AnnData) – A pyInfinityFlow formatted AnnData object. (Required)
features (list[str]) – The features (must be present in AnnData.var.index) to move to ‘silent’. (Required)
- Returns
A pyInfinityFlow formatted AnnData object. The ‘silent’ feature values are moved to AnnData.obsm[“silent”], and the ‘silent’ feature var DataFrame values are moved to AnnData.uns[“silent_var”]
- Return type
anndata.AnnData
- pyInfinityFlow.InfinityFlow_Utilities.move_features_out_of_silent(input_anndata, features)
This function will move the features that were “silenced” by pyInfinityFlow.InfinityFlow_Utilities.move_features_to_silent back into the AnnData.X and AnnData.var values.
It is required that AnnData.obsm[“silent”] and AnnData.uns[“silent_var”] exist.
- Parameters
input_anndata (anndata.AnnData) – A pyInfinityFlow formatted AnnData object. (Required)
features (list[str]) – The features (must previously have been moved to ‘silent’) to move out of ‘silent’. (Required)
- Returns
A pyInfinityFlow formatted AnnData object. The features values are moved out of silent and back into AnnData.X and AnnData.var.
- Return type
anndata.AnnData
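The round trip between the active data and the ‘silent’ store can be pictured with a small stand-in. This is only a conceptual sketch: plain dictionaries of columns replace AnnData.X and AnnData.obsm[“silent”], the helper move_columns is hypothetical, and the accompanying move of the .var rows is not modelled.

```python
# Sketch only: dictionaries of columns stand in for AnnData.X and
# AnnData.obsm["silent"]; the helper name and the values are hypothetical.
def move_columns(source, target, features):
    """Move the named columns from source to target, leaving the others in place."""
    absent = [f for f in features if f not in source]
    if absent:
        raise KeyError(f"features not found: {absent}")
    for feature in features:
        target[feature] = source.pop(feature)


active = {"FSC-A": [1.0, 2.0], "CD4": [0.3, 0.9], "Time": [0.01, 0.02]}
silent = {}

move_columns(active, silent, ["Time"])   # analogous to move_features_to_silent
silenced_state = (sorted(active), sorted(silent))

move_columns(silent, active, ["Time"])   # analogous to move_features_out_of_silent
restored_state = (sorted(active), sorted(silent))
```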
- pyInfinityFlow.InfinityFlow_Utilities.make_pca_elbo_plot(sub_p_adata, output_paths)
This function will make a PCA elbow curve plot to show the variance explained by each principal component. Requires that scanpy.tl.pca has been run on the sub_p_adata object.
- Parameters
sub_p_adata (anndata.AnnData) – A pyInfinityFlow formatted AnnData object that has the sub_p_adata.uns[‘pca’][‘variance’] attribute. (Required)
output_paths (dict) – The output_paths dictionary created by the pyInfinityFlow.InfinityFlow_Utilities.setup_output_directories function (Required)
- Return type
None
InfinityFlow_Utilities: Analysis Pipeline Functions
- pyInfinityFlow.InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes(backbone_annotation, infinity_marker_annotation, n_events_train=0, n_events_validate=0, n_events_combine=0, ratio_for_validation=0.2, separate_backbone_reference=None, random_state=None, input_fcs_dir=None, verbosity=0)
This function prepares the FileHandler object to control how the pipeline will handle each .fcs file for the indicated regression model. Both the backbone_annotation table and the infinity_marker_annotation table are checked for validity.
- Parameters
backbone_annotation (pandas.DataFrame) – The first column is the backbone features as they appear in the channel names of the fcs file for the reference data. The second column is the channel names as they appear in the query file, which is used to build the regression model. The last column is the final name to give to the user defined channel parameter of fcs file. (Required)
infinity_marker_annotation (pandas.DataFrame) – The first column is the fcs file name. The second column is the channel name in fcs file to use as the response variable in the regression model. The third column is the desired name to give to the final channel in the output. The fourth column, which is optional, is the name of the isotype background control antibody as it appears in the third column.
n_events_train (int) – The number of events from each fcs file to use for training each regression model
n_events_validate (int) – The number of events to use to validate each regression model
n_events_combine (int or None) – If pooling events from each file to merge into a final dataset, this variable specifies how many events will be taken from each file to combine into the final object used as the reference for regression.
ratio_for_validation (float from 0 to 1) – If n_events_train and n_events_validate are set to 0, then all events from the fcs file will be used and this parameter will specify what ratio of the fcs events will be used for validation. The remainder will be used for training.
random_state (int) – Integer seed used when sampling event indices from the fcs files, so that the sampling can be reproduced.
input_fcs_dir (str) – The path to the directory that holds all of the fcs files in column 1 of the infinity_marker_annotation DataFrame
exclusive_train_and_validate (bool) – If true, the program will be forced to use separate events for training and validation, n_events_combine will be taken from validation but cannot be taken from training.
verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements.
- Returns
An instance of InfinityFlowFileHandler, which is an object to specify how input .fcs files should be treated during the regression pipeline.
- Return type
pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler
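The interplay of ratio_for_validation and random_state can be sketched as follows. This is a minimal illustration that assumes a simple shuffled split of all events; the helper split_event_indices is hypothetical and the library's actual sampling scheme may differ.

```python
import random


def split_event_indices(n_events, ratio_for_validation=0.2, random_state=None):
    """Split range(n_events) into disjoint (train_indices, test_indices).

    Assumes the case where all events are used: ratio_for_validation of them
    go to validation and the remainder to training.
    """
    rng = random.Random(random_state)  # a fixed seed makes the split reproducible
    indices = list(range(n_events))
    rng.shuffle(indices)
    n_validate = int(round(n_events * ratio_for_validation))
    test_indices = sorted(indices[:n_validate])
    train_indices = sorted(indices[n_validate:])
    return train_indices, test_indices


train, test = split_event_indices(1000, ratio_for_validation=0.2, random_state=7)
train_again, test_again = split_event_indices(1000, ratio_for_validation=0.2, random_state=7)
```

Calling the helper twice with the same random_state returns the same indices, which is the reproducibility the parameter is meant to provide.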
- pyInfinityFlow.InfinityFlow_Utilities.setup_output_directories(output_dir, file_handler, verbosity=0)
Set up the output directories for the InfinityFlow Regression workflow
- Parameters
output_dir (str) – The directory to which the pipeline outputs should be saved. (Required)
file_handler (pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler) – The InfinityFlowFileHandler that is returned by pyInfinityFlow.InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes.
verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements.
- Returns
- A dictionary that stores the output directories as strings:
[“output_regression_path”]
[“output_umap_feature_plot_path”]
[“clustering”]
[“qc”]
[“output_umap_bc_feature_plot_path”]
The function will check if each of the output directory paths can be created and make them if they don’t exist.
- Return type
dict
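The shape of the returned dictionary, and the create-if-missing behaviour, can be sketched with the standard library. Only the five keys are taken from the documentation above; the helper name and the sub-directory names are assumptions made for illustration.

```python
import os
import tempfile


def make_output_paths(output_dir):
    """Create one sub-directory per documented key and return the paths as strings."""
    # Sub-directory names are hypothetical; only the dictionary keys are documented.
    sub_dirs = {
        "output_regression_path": "regression_results",
        "output_umap_feature_plot_path": "umap_feature_plots",
        "clustering": "clustering",
        "qc": "qc",
        "output_umap_bc_feature_plot_path": "umap_feature_plots_background_corrected",
    }
    output_paths = {}
    for key, sub_dir in sub_dirs.items():
        path = os.path.join(output_dir, sub_dir)
        os.makedirs(path, exist_ok=True)  # no error if the directory already exists
        output_paths[key] = path
    return output_paths


with tempfile.TemporaryDirectory() as tmp_dir:
    output_paths = make_output_paths(tmp_dir)
    all_created = all(os.path.isdir(path) for path in output_paths.values())
```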
- pyInfinityFlow.InfinityFlow_Utilities.single_chunk_training(file_handler, cores_to_use=1, random_state=None, xgb_params={}, use_logicle_scaling=True, normalization_method=None, verbosity=0)
This function carries out fitting of XGBoost regression models. It will read the data using the file_handler object to specify which events will be used for fitting. It will then carry out optional Logicle data normalization and batch normalization before fitting the model. It will then save the settings of the XGBoost regression models to the output.
- Parameters
file_handler (pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler) – The InfinityFlowFileHandler that is returned by pyInfinityFlow.InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes.
cores_to_use (int) – The number of cores to use for XGBoost model fitting. (Default=1)
random_state (int or None) – Integer to specify the random state for XGBoost model fitting in an attempt to make the regression more reproducible, or None to not use a random seed. (Default=None)
xgb_params (dict) – Dictionary of keyword-argument value pairs to pass to the XGBoost model instantiation. (Default={})
use_logicle_scaling (bool) – Whether or not to use Logicle scaling before model fitting. (Default=True)
normalization_method (None or "zscore") – The method for normalizing the backbone of different samples in an attempt to remove batch effects. (Default=None)
verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)
- Returns
- pyInfinityFlow.InfinityFlow_Utilities.CombinedRegressionModels
An object to track the state of XGBoost Regression models as well as the models themselves.
- timings_dict
A dictionary that records how much time each step of the function takes.
- Return type
tuple (CombinedRegressionModels, timings_dict)
- pyInfinityFlow.InfinityFlow_Utilities.single_chunk_testing(file_handler, regression_models, use_logicle_scaling=True, normalization_method=None, verbosity=0)
This function carries out validation of XGBoost regression models. It will read the data using the file_handler object to specify which events will be used for validation. It will then predict the InfinityMarker signal on held out data from its .fcs file. Then it will save metrics to the regression_models object and return it, along with a dictionary to track timings for steps of the function.
- Parameters
file_handler (pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler) – The InfinityFlowFileHandler that is returned by pyInfinityFlow.InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes. (Required)
regression_models (pyInfinityFlow.InfinityFlow_Utilities.CombinedRegressionModels) – The CombinedRegressionModels that is returned by the pyInfinityFlow.InfinityFlow_Utilities.single_chunk_training function. (Required)
use_logicle_scaling (bool) – Whether or not to use Logicle scaling before prediction on the validation events. (Default=True)
normalization_method (None or "zscore") – The method for normalizing the backbone of different samples in an attempt to remove batch effects. (Default=None)
verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)
- Returns
- pyInfinityFlow.InfinityFlow_Utilities.CombinedRegressionModels
An object to track the state of XGBoost Regression models as well as the models themselves. The .validation_metrics attribute will be filled with a dictionary that provides the following validation data:
[“pred”] - predicted values
[“true”] - real values
[“r2_score”] - r2_score provided by sklearn.metrics.r2_score
[“mean_squared_error”] - provided by sklearn.metrics.mean_squared_error
- timings_dict
A dictionary that records how much time each step of the function takes.
- Return type
tuple (CombinedRegressionModels, timings_dict)
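The four entries stored per InfinityMarker can be illustrated with hand-written versions of the two metrics. The documentation states that the library takes them from sklearn.metrics; the plain-Python functions, the marker name, and the numbers below are stand-ins for illustration only.

```python
def r2(true, pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for t, p in zip(true, pred))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot


def mse(true, pred):
    """Mean of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true)


# Hypothetical held-out values for one InfinityMarker.
true = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]

validation_metrics = {
    "CD4": {
        "pred": pred,
        "true": true,
        "r2_score": r2(true, pred),
        "mean_squared_error": mse(true, pred),
    }
}
```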
- pyInfinityFlow.InfinityFlow_Utilities.make_flow_regression_predictions(file_handler, regression_models, separate_backbone_reference=None, use_logicle_scaling=True, normalization_method=None, verbosity=0)
This function carries out prediction using XGBoost regression models. It will use either a separate_backbone_reference .fcs file onto which to make predictions of the InfinityMarker signals, or it will use a subset of the validation cells from the InfinityMarker .fcs files themselves. The output will be an AnnData object containing the backbone features and the predicted signals from the InfinityMarker regression models.
- Parameters
file_handler (pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler) – The InfinityFlowFileHandler that is returned by pyInfinityFlow.InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes. (Required)
regression_models (pyInfinityFlow.InfinityFlow_Utilities.CombinedRegressionModels) – The CombinedRegressionModels that is returned by the pyInfinityFlow.InfinityFlow_Utilities.single_chunk_training function. (Required)
separate_backbone_reference (str or None) – If not None, this defines the path to the .fcs file onto which to make predictions for the InfinityMarker signals.
use_logicle_scaling (bool) – Whether or not to use Logicle scaling before prediction. (Default=True)
normalization_method (None or "zscore") – The method for normalizing the backbone of different samples in an attempt to remove batch effects. (Default=None)
verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)
- Returns
- AnnData
A pyInfinityFlow formatted AnnData object with the original parameter values as well as the imputed InfinityMarker values.
- timings_dict
A dictionary that records how much time each step of the function takes.
- Return type
tuple (AnnData, timings_dict)
- pyInfinityFlow.InfinityFlow_Utilities.perform_background_correction(sub_p_adata, file_handler, infinity_marker_annotation, cores_to_use=1, verbosity=0)
This function carries out background correction on the signal of a given InfinityMarker if that InfinityMarker has a corresponding Isotype InfinityMarker. A linear model is applied to regress-out the background antibody binding from the theoretical true signal of the InfinityMarker.
- Parameters
sub_p_adata (anndata.AnnData) – A pyInfinityFlow formatted AnnData object with the original parameter values as well as the imputed InfinityMarker values. The Isotype controls must be included as InfinityMarkers and annotated in the infinity_marker_annotation DataFrame. (Required)
file_handler (pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler) – The InfinityFlowFileHandler that is returned by pyInfinityFlow.InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes. (Required)
infinity_marker_annotation (pandas.DataFrame) – The annotation DataFrame that specifies the File, Channel to predict, Name of final InfinityMarker, and Isotype InfinityMarker Name for each InfinityMarker. This DataFrame must have 4 columns if background correction is to be done. Each of the values in the last column (Isotype) must be present in the third column (Name of InfinityMarker) as InfinityMarkers. (Required)
cores_to_use (int) – The number of cores to use for fitting the sklearn.linear_model.LinearRegression model. (Default=1)
verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)
- Returns
- background_corrected_data
A DataFrame specifying the background corrected data, with event names as the index and channel names as columns.
- background_corrected_var
A DataFrame of the .var field that corresponds to the features in the background_corrected_data.
- timings_dict
A dictionary that records how much time each step of the function takes.
- Return type
tuple (background_corrected_data, background_corrected_var, timings_dict)
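One way to picture “regressing out” the isotype signal is an ordinary least-squares fit of the marker on its isotype control, keeping the residual as the corrected value. This is a sketch under that assumption; the library's exact model and post-processing are not described here, and the helper names and values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx  # assumes x is not constant
    intercept = mean_y - slope * mean_x
    return slope, intercept


def regress_out(isotype, marker):
    """Return the part of marker that the linear fit on isotype does not explain."""
    slope, intercept = fit_line(isotype, marker)
    return [m - (slope * i + intercept) for i, m in zip(isotype, marker)]


# Hypothetical predicted signals for one marker and its isotype control.
isotype = [0.0, 1.0, 2.0, 3.0]
marker = [1.0, 3.0, 5.0, 7.5]
corrected = regress_out(isotype, marker)
```

By construction the residuals have zero mean and no remaining linear association with the isotype signal.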
- pyInfinityFlow.InfinityFlow_Utilities.find_markers_from_anndata(sub_p_adata, output_paths, groups_to_colors, cluster_key='leiden', verbosity=0)
Attempts to associate each of the clusters present in the AnnData object with the Backbone and InfinityMarkers in the dataset. It applies MarkerFinder to these clusters, generates a marker table, and plots a heatmap with the clustered events as columns and Markers as rows.
- Parameters
sub_p_adata (anndata.AnnData) – A pyInfinityFlow formatted AnnData object with the original parameter values as well as the imputed InfinityMarker values. Clusters must be defined in the sub_p_adata.obs DataFrame. (Required)
output_paths (dict) – The output_paths dictionary created by the pyInfinityFlow.InfinityFlow_Utilities.setup_output_directories function (Required)
groups_to_colors (dict) – Dictionary to specify what color should be used for each cluster in sub_p_adata.obs[cluster_key]. (Eg. {‘c1’:’red’, ‘c2’: ‘blue’, …}) (Required)
cluster_key (str) – The key in sub_p_adata.obs to use for cluster assignments. By default, it will look for “leiden”. (Default=”leiden”)
verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)
- Returns
- markers_df
A DataFrame listing, for each feature, the cluster for which it is the best marker by Pearson correlation using MarkerFinder. The columns of the DataFrame will be [“marker”, “top_cluster”, “pearson_r”, “p_value”]
- cell_assignments
A DataFrame specifying the top 50 (or fewer if the cluster is smaller) events that correspond to each cluster, ranked by Pearson correlation of each event to its cluster’s centroid. Contains the following features:
[“cell”] - the event name
[“top_cluster”] - the cluster to which the event best correlates
[“top_corr”] - the Pearson correlation coefficient
[“original”] - the original cluster identity provided
- Return type
tuple (markers_df, cell_assignments)
- pyInfinityFlow.InfinityFlow_Utilities.save_umap_figures_all_features(sub_p_adata, file_handler, output_paths, background_corrected_data=None, verbosity=0)
Plots the 2D-UMAP stored in sub_p_adata and colors using each of the feature values in sub_p_adata.var. A .png file will be saved for each feature in the directory specified by output_paths[“output_umap_bc_feature_plot_path”] and/or output_paths[“output_umap_feature_plot_path”].
- Parameters
sub_p_adata (anndata.AnnData) – A pyInfinityFlow formatted AnnData object. Must have ‘umap-x’ and ‘umap-y’ in sub_p_adata.obs.columns (Required)
file_handler (pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler) – The InfinityFlowFileHandler that is returned by pyInfinityFlow.InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes. (Required)
output_paths (dict) – The output_paths dictionary created by the pyInfinityFlow.InfinityFlow_Utilities.setup_output_directories function (Required)
background_corrected_data (pandas.DataFrame or None) – The background corrected data generated by pyInfinityFlow.InfinityFlow_Utilities.perform_background_correction. (Default=None)
verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)
- Returns
A dictionary that records how much time each step of the function takes.
- Return type
dict
- pyInfinityFlow.InfinityFlow_Utilities.save_fcs_flow_anndata(sub_p_adata, file_handler, output_paths, background_corrected_data=None, background_corrected_var=None, add_umap=False, use_logicle=True, verbosity=0)
Save the pyInfinityFlow structured AnnData object to an .fcs file.
- Parameters
sub_p_adata (anndata.AnnData) – A pyInfinityFlow formatted AnnData object with the original parameter values as well as the imputed InfinityMarker values. Clusters must be defined in the sub_p_adata.obs DataFrame. (Required)
file_handler (pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler) – The InfinityFlowFileHandler that is returned by pyInfinityFlow.InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes. (Required)
output_paths (dict) – The output_paths dictionary created by the pyInfinityFlow.InfinityFlow_Utilities.setup_output_directories function (Required)
background_corrected_data (pandas.DataFrame or None) – The background corrected data generated by pyInfinityFlow.InfinityFlow_Utilities.perform_background_correction. (Default=None)
background_corrected_var (pandas.DataFrame or None) – The background_corrected_var DataFrame generated by pyInfinityFlow.InfinityFlow_Utilities.perform_background_correction. (Default=None)
add_umap (bool) – If True, will add the ‘umap-x’ and ‘umap-y’ features from sub_p_adata.obs to sub_p_adata.X. Requires that the 2D-UMAP has been generated for sub_p_adata and is specified in the ‘umap-x’ and ‘umap-y’ features of sub_p_adata.obs (Default=False)
use_logicle (bool) – If True, the function will attempt to invert the logicle normalization before the data is saved. (Default=True)
verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)
- Returns
A dictionary that records how much time each step of the function takes.
- Return type
dict
Transformations
- pyInfinityFlow.Transformations.apply_logicle(x, T=3000000, W=0, M=3, A=1)
The logicle scale is the inverse of a modified biexponential function and has the same relation to the modified biexponential function that a logarithmic scale has to its corresponding exponential function. [1]
The logicle uses the modified biexponential function B, according to:
logicle(x, T, W, M, A) = root(B(y, T, W, M, A) - x)
B is the modified biexponential function:
B(y, T, W, M, A) = (ae^(by) - ce^(-dy)) - f
where:
w = W / (M + A)
x2 = A / (M + A)
x1 = x2 + w
x0 = x2 + 2w
b = (M + A) * ln(10)
d is a constant so that:
2(ln(d) - ln(b)) + w(d + b) = 0
given b and w, and:
ca = e^(x0(b + d))
fa = e^(b * x1) - (ca / e^(d * x1))
a = T / (e^b - fa - (ca / e^d))
c = ca * a
f = fa * a
- Parameters
x (list-like numeric vector) – The input vector to normalize with logicle transformation
T (numeric) – The formal “Top of scale” value (Default=3000000)
W (numeric) – (Width parameter) The number of decades in the approximately linear region. The choice of W = 0 gives essentially the hyperbolic sine function (sinh x)
M (numeric) – The number of decades that the true logarithmic scale approached at the high end of the logicle scale would cover in the plot range
A (numeric) – Number of Additional decades of negative data values to be included
Note
- Parameters should be chosen so that:
T > 0
M > 0
0 <= W <= M/2
- Returns
The input x after applying the logicle function
- Return type
list-like numeric vector
References
- 1
Moore, Wayne A., and David R. Parks. “Update for the logicle data scale including operational code implementations,” Cytometry. Part A: the journal of the International Society for Analytical Cytology 81.4 (2012): 273.
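The definition above can be turned into a small scalar sketch: compute the biexponential parameters, then find the root y of B(y) - x by bisection. This follows the parameterisation in the cited reference as understood here and is not the library's implementation; all function names are hypothetical, and the sketch works on single numbers rather than vectors.

```python
import math


def solve_d(b, w):
    """Find d > 0 with 2(ln(d) - ln(b)) + w(d + b) = 0 by bisection."""
    if w == 0:
        return b  # the equation reduces to ln(d) = ln(b)
    lo, hi = 1e-12, b  # the left side is negative near 0 and positive at d = b
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 2 * (math.log(mid) - math.log(b)) + w * (mid + b) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def biexponential_params(T=3000000, W=0, M=3, A=1):
    """Return (a, b, c, d, f) of the modified biexponential function B."""
    w = W / (M + A)
    x2 = A / (M + A)
    x1 = x2 + w
    x0 = x2 + 2 * w
    b = (M + A) * math.log(10)
    d = solve_d(b, w)
    ca = math.exp(x0 * (b + d))
    fa = math.exp(b * x1) - ca / math.exp(d * x1)
    a = T / (math.exp(b) - fa - ca / math.exp(d))
    return a, b, ca * a, d, fa * a


def biexponential(y, params):
    """B(y) = a*e^(b*y) - c*e^(-d*y) - f; this is also the inverse logicle."""
    a, b, c, d, f = params
    return a * math.exp(b * y) - c * math.exp(-d * y) - f


def logicle_scalar(x, T=3000000, W=0, M=3, A=1):
    """Return y such that B(y) = x, using bisection (B is increasing in y)."""
    params = biexponential_params(T, W, M, A)
    lo, hi = 0.0, 1.0
    while biexponential(lo, params) > x:  # widen the bracket downwards
        lo -= 1.0
    while biexponential(hi, params) < x:  # widen the bracket upwards
        hi += 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if biexponential(mid, params) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


y_zero = logicle_scalar(0.0)        # data value 0 maps to x1 = A / (M + A) + w
y_top = logicle_scalar(3000000.0)   # the top of scale T maps to 1
round_trip = biexponential(logicle_scalar(1234.5), biexponential_params())
```

Two properties follow directly from the parameter definitions and make convenient sanity checks: B(x1) = 0 and B(1) = T.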
- pyInfinityFlow.Transformations.apply_inverse_logicle(x, T=3000000, W=0, M=3, A=1)
This function inverts pyInfinityFlow.Transformations.apply_logicle
- Parameters
x (list-like numeric vector) – The input vector to invert the logicle transformation
T (numeric) – The formal “Top of scale” value (Default=3000000)
W (numeric) – (Width parameter) The number of decades in the approximately linear region. The choice of W = 0 gives essentially the hyperbolic sine function (sinh x)
M (numeric) – The number of decades that the true logarithmic scale approached at the high end of the logicle scale would cover in the plot range
A (numeric) – Number of Additional decades of negative data values to be included
- Returns
The input x after applying the inverse logicle function
- Return type
list-like numeric vector
- pyInfinityFlow.Transformations.scale_feature(input_array, min_threshold_percentile, max_threshold_percentile)
Clips outliers and applies min-max scaling
This function clips outliers to the given percentile thresholds and rescales the resulting distribution into the range [0, 1]
- Parameters
input_array (list-like numeric vector) – The feature values to scale
min_threshold_percentile (number between 0 and 100 inclusive) – The percentile that defines the minimum accepted value of the input domain. Outliers below the value at this percentile take on that minimum value
max_threshold_percentile (number between 0 and 100 inclusive) – The percentile that defines the maximum accepted value of the input domain. Outliers above the value at this percentile take on that maximum value
- Returns
The input_array after applying the thresholding and min-max scaling
- Return type
list-like numeric vector
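The thresholding and scaling described above can be sketched in plain Python. This is an illustration under assumptions, not the library code: scale_feature_sketch and percentile are hypothetical names, and the linear interpolation between neighbouring values is an assumed percentile rule (it matches the default of numpy.percentile).

```python
import math


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def scale_feature_sketch(input_array, min_threshold_percentile, max_threshold_percentile):
    """Clip values to the percentile thresholds, then rescale to [0, 1]."""
    values = [float(v) for v in input_array]
    ordered = sorted(values)
    low = percentile(ordered, min_threshold_percentile)
    high = percentile(ordered, max_threshold_percentile)
    if high == low:
        # Degenerate case: every accepted value is identical.
        return [0.0 for _ in values]
    clipped = [min(max(v, low), high) for v in values]
    return [(v - low) / (high - low) for v in clipped]


# The two extreme events are pulled in to the 10th and 90th percentile values.
scaled = scale_feature_sketch([-1000, 1, 2, 3, 4, 5, 6, 7, 8, 1000], 10, 90)
```

Clipping before scaling keeps a handful of extreme events from compressing the rest of the distribution into a narrow band near 0 or 1.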
fcs_io¶
FCSFileObject Class¶
- class pyInfinityFlow.fcs_io.FCSFileObject(fcs_file_path='', mode='r', read_data_segment=True)¶
Primary class for working with FCS files.
This class is used to read and write FCS files. A mode is specified to either read from or write to the given fcs_file_path. Reading of FCS files can be done without including the DATA segment, so that the HEADER and TEXT segments can be read quickly.
Warning
Currently only FCS3.1 files are supported.
- Parameters
fcs_file_path (str) – The path to the FCS file. (Required)
mode (str (Expects 'r'|'w')) – The mode in which to treat the FCS file. If ‘r’, the class instance will read from the FCS file immediately after it is created. (Default=’r’)
read_data_segment (bool) – Whether or not to read in the DATA segment of the FCS file. If False, only the HEADER and TEXT segment values are read into the class, which is enough to learn important properties of the FCS file (e.g. the number of events captured, the channel names, etc.) (Default=True)
- file_path¶
The path to the FCS file. Set by fcs_file_path
- Type
str
- byte_locations¶
The byte positions in the file that mark the different segments. When mode=’r’, this is filled upon instantiation with the following keys:
[“text_start”]
[“text_end”]
[“data_start”]
[“data_end”]
[“analysis_start”]
[“analysis_end”]
- Type
dict{KEY: int}
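As an illustration of where these positions come from, the sketch below parses the fixed-width HEADER segment of an FCS file: a 6-character version string, 4 spaces, then six 8-character, right-justified ASCII offsets. The function name parse_fcs_header is hypothetical, and whether FCSFileObject reads the header exactly this way is an assumption.

```python
def parse_fcs_header(header):
    """Parse the first 58 bytes of an FCS file into a version and offsets."""
    version = header[0:6].decode("ascii")
    keys = [
        "text_start", "text_end",
        "data_start", "data_end",
        "analysis_start", "analysis_end",
    ]
    byte_locations = {}
    for i, key in enumerate(keys):
        # Each offset occupies 8 bytes, starting at byte 10.
        field = header[10 + 8 * i: 18 + 8 * i].decode("ascii").strip()
        # Blank fields (common for the ANALYSIS segment) are treated as 0.
        byte_locations[key] = int(field) if field else 0
    return version, byte_locations


# Build a synthetic header instead of reading a real file.
offsets = (58, 1023, 1024, 9023, 0, 0)
header = ("FCS3.1" + " " * 4 + "".join(f"{n:>8d}" for n in offsets)).encode("ascii")
version, byte_locations = parse_fcs_header(header)
```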
- version¶
The version of the .fcs file (e.g. ‘FCS3.1’)
- Type
str
- text_segment¶
The TEXT segment as a string.
- Type
str
- delimiter¶
This is the character used as the delimiter between items in the TEXT segment
- Type
str
- byteord_format¶
The byte order format to use to read the DATA segment
- Type
str
- text_segment_values¶
A dictionary that stores the FCS file TEXT segment key-value entries. These are important for defining properties about the channels, file positions, experiment annotations, etc.
- Type
dict{KEY: str}
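A simplified sketch of how a TEXT segment string, its delimiter, and the key-value dictionary relate to each other is given below. The helper names are hypothetical, the sample segment is made up, and the sketch ignores escaped (doubled) delimiters inside values, so it is not a complete FCS parser.

```python
def parse_text_segment(text_segment):
    """Split a TEXT segment into its delimiter and a key-value dictionary."""
    delimiter = text_segment[0]  # the first character defines the delimiter
    items = text_segment[1:].split(delimiter)
    if items and items[-1] == "":
        items.pop()  # the segment normally ends with a delimiter
    values = dict(zip(items[0::2], items[1::2]))
    return delimiter, values


def channel_names(values):
    """Collect the $PnN and $PnS entries in parameter order."""
    par_count = int(values["$PAR"])
    pnn = [values[f"$P{n}N"] for n in range(1, par_count + 1)]
    pns = [values.get(f"$P{n}S", "") for n in range(1, par_count + 1)]
    return pnn, pns


# Made-up TEXT segment with two parameters.
text = "|$PAR|2|$P1N|FSC-A|$P1S|Size|$P2N|GFP-A|$P2S|Reporter|"
delimiter, values = parse_text_segment(text)
pnn, pns = channel_names(values)
```

The $PnS keys are optional in FCS files, which is why the sketch falls back to an empty string for them.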
- spillover¶
The spillover matrix to use for compensation
- Type
pandas.DataFrame
- data¶
The data from the DATA segment of the FCS file
- Type
pandas.DataFrame
- struct_format_string¶
Struct format string to pack and unpack the DATA segment as binary
- Type
str
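To make the role of a struct format string concrete, the sketch below builds one from the $BYTEORD, $DATATYPE and $PAR values and round-trips a single event through Python's struct module. The function build_struct_format is hypothetical, and the exact string that FCSFileObject stores is an assumption. Only floating-point data types are handled.

```python
import struct


def build_struct_format(byteord, datatype, par_count):
    """Return a struct format string for one event (one row of DATA)."""
    # $BYTEORD: "1,2,3,4" is little-endian, "4,3,2,1" is big-endian.
    byte_order = {"1,2,3,4": "<", "4,3,2,1": ">"}[byteord]
    # $DATATYPE: "F" is 32-bit float, "D" is 64-bit float.
    type_code = {"F": "f", "D": "d"}[datatype]
    return byte_order + type_code * par_count


fmt = build_struct_format("1,2,3,4", "F", 3)
event_bytes = struct.pack(fmt, 1.0, 2.5, -3.0)  # what would be written to DATA
decoded = struct.unpack(fmt, event_bytes)       # what would be read back
```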
- par_count¶
The number of parameters in the FCSFileObject
- Type
int
- list_par_n¶
Ordered list of parameters by number
- Type
list[int]
- named_par¶
The $PnS names, usually defined by the user when the FCS data is captured
- Type
list[str]
- named_par_channel¶
The $PnN channel names. These must be unique and are generally defined by the software used to capture the FCS events (e.g. “GFP-A”)
- Type
list[str]
- load_data_from_pd_df(input_pd_df, input_channel_names=[], input_spillover_matrix=None, additional_text_segment_values={})¶
Load data from a pandas.DataFrame into an FCSFileObject
This method allows the FCSFileObject to add data to the DATA segment from a pandas.DataFrame. The resulting data can then be written to an FCS file using the FCSFileObject.to_fcs() method
- Parameters
input_pd_df (pandas.DataFrame) – The DATA to add to the DATA segment. (Required)
input_channel_names (list[str]) – These are the names to give to the $PnS TEXT segment key-value. The list must be the same size as input_pd_df.shape[1] (The columns of the input_pd_df) or be empty (Default=[])
input_spillover_matrix (None or pandas.DataFrame) – The spillover matrix for fluorescence compensation (Default=None)
additional_text_segment_values (dict{'str':'str'}) – Key-value pairs to add to the TEXT segment of the FCS file. (Default={})
- Returns
Adds the given information to the FCSFileObject
- Return type
None
- read_fcs()¶
Read data from an FCS file into the FCSFileObject instance
This is useful if a shallow read of the FCS file was initially performed without reading in the DATA segment. This function will read in the data from the FCS file.
- Returns
Adds the data from the DATA segment of the fcs_file to the FCSFileObject
- Return type
None
- to_fcs(fcs_file_path, add_spillover_matrix=False)¶
Write FCSFileObject to an FCS file
- Parameters
fcs_file_path (str) – The path to where the FCS file should be written. (Required)
add_spillover_matrix (bool) – Specifies whether or not to add the Spillover matrix defined in the FCSFileObject.spillover attribute to the TEXT segment of the FCS file. (Default=False)
- Returns
Attempts to write the FCS file.
- Return type
None
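The write path documented above (create an FCSFileObject in write mode, load a DataFrame, then call to_fcs) could be wrapped as in the sketch below. The wrapper names are hypothetical; the calls use only the signatures listed in this section, and the sketch has not been checked against a specific pyInfinityFlow release. The validation helper mirrors the documented rule that input_channel_names must be empty or match the number of columns.

```python
def validate_channel_names(n_columns, input_channel_names):
    """Names must be empty or contain one entry per DataFrame column."""
    if input_channel_names and len(input_channel_names) != n_columns:
        raise ValueError(
            f"Expected 0 or {n_columns} channel names, "
            f"got {len(input_channel_names)}"
        )
    return list(input_channel_names)


def write_dataframe_to_fcs(input_pd_df, fcs_file_path, input_channel_names=()):
    """Write a pandas.DataFrame to an FCS file via FCSFileObject."""
    # Imported lazily so the helper above works without pyInfinityFlow installed.
    from pyInfinityFlow.fcs_io import FCSFileObject

    names = validate_channel_names(input_pd_df.shape[1], input_channel_names)
    fcs = FCSFileObject(fcs_file_path=fcs_file_path, mode="w")
    fcs.load_data_from_pd_df(input_pd_df, input_channel_names=names)
    fcs.to_fcs(fcs_file_path)
    return fcs
```

Checking the name list before touching the file system gives a clearer error than a failure partway through writing.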
fcs_io: Functions¶
- pyInfinityFlow.fcs_io.list_fcs_channels(fcs_file_path, add_user_defined_names=False)¶
List the channel names defined in a given FCS file
This is useful for getting the channel names exactly as they are written in the FCS file. When defining the Reference and Query channels in the backbone_annotation file, or the Target channel name in the InfinityMarker annotation file, the names must match the channel names in the specified FCS files.
Note
“PnS” is the key used in an FCS file to specify how the user wanted to annotate the channel.
“PnN” is the key used in an FCS file to specify the name for the channel and each must be unique in a given FCS file
- Parameters
fcs_file_path (str) – Path to the .fcs file (Required)
add_user_defined_names (bool) – If True, the function will put the user-defined (“PnS”) names as a second column with the unique channel (“PnN”) names as the first column. (Default=False)
- Returns
Prints out the channel names as a table to stdout.
- Return type
None
Plotting_Utilities¶
- pyInfinityFlow.Plotting_Utilities.assign_rainbow_colors_to_groups(groups)¶
Creates a dictionary of cluster names to hexadecimal color strings
This function takes a list of groups and assigns each unique item in the groups a color (using the matplotlib.cm.rainbow color-map) as a hexadecimal string value. This is useful for storing a single color scheme for clusters to be used with downstream visualizations.
- Parameters
groups (numpy.Array[str]) – List of cluster names. (Required)
- Returns
Dictionary of cluster-names to assigned colors (hexadecimal value)
- Return type
dict{str: str}
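The idea of mapping each unique group to an evenly spaced color and storing it as a hexadecimal string can be sketched with the standard library alone. Note the substitution: this sketch uses an HSV hue sweep from colorsys as a stand-in for matplotlib.cm.rainbow, so the colors will differ from those the library assigns. The function name assign_hue_colors_to_groups and the sorted ordering of groups are assumptions.

```python
import colorsys


def assign_hue_colors_to_groups(groups):
    """Map each unique group to a hex color spaced evenly along the hue axis."""
    unique_groups = sorted(set(groups))
    n = len(unique_groups)
    colors = {}
    for i, group in enumerate(unique_groups):
        # Sweep the hue from 0 (red) to 0.75 (violet) across the groups.
        hue = 0.75 * i / max(n - 1, 1)
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        colors[group] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return colors


groups_to_colors = assign_hue_colors_to_groups(["B", "A", "B", "C"])
```

Computing the dictionary once and passing it to every plotting call keeps cluster colors consistent across figures.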
- pyInfinityFlow.Plotting_Utilities.plot_feature_over_x_y_coordinates_and_save_fig(feature_vector, x, y, feature_name, file_path)¶
Plots a 2D-scatter plot of numeric vector over x and y coordinates
This function takes a feature_vector, x and y coordinates, a feature_name, and a file_path and plots a scatterplot of all points, coloring the points using the “jet” colormap in matplotlib following the feature_vector scale.
Warning
It is expected that feature_vector, x, and y correspond to the same events, in the same order.
Note
The colormap will start at the 20th percentile (~blue) and end at the 80th percentile (~red) of the feature vector.
- Parameters
feature_vector (numpy.Array[numeric]) – Numeric values to map the ‘jet’ colormap onto in the scatter plot. (Required)
x (numpy.Array[numeric]) – Numeric values for the x-coordinate of the scatter plot (Required)
y (numpy.Array[numeric]) – Numeric values for the y-coordinate of the scatter plot (Required)
feature_name (str) – Label to give the colorbar and plt.title of the scatter plot. (Required)
file_path (str) – The path to save the figure. (Required)
- Returns
Saves the scatterplot to the file specified by file_path
- Return type
None
- pyInfinityFlow.Plotting_Utilities.plot_markers_df(input_df, ordered_markers_df, ordered_cells_df, groups_to_colors, path_to_save_figure)¶
Plots a heatmap of the MarkerFinder results
This function takes a pandas.DataFrame of values, a markers_df and cell_assignments from pyInfinityFlow.InfinityFlow_Utilities.find_markers_from_anndata to plot a heatmap of the markers.
Note
This function expects pyInfinityFlow.InfinityFlow_Utilities.find_markers_from_anndata to have already been run.
- Parameters
input_df (pandas.DataFrame) – Data to plot. The columns must intersect with features in the ordered_markers_df and the rows must intersect with the cells in the ordered_cells_df (Required)
ordered_markers_df (pandas.DataFrame) – The markers_df output from pyInfinityFlow.InfinityFlow_Utilities.find_markers_from_anndata (Required)
ordered_cells_df (pandas.DataFrame) – The cell_assignments output from pyInfinityFlow.InfinityFlow_Utilities.find_markers_from_anndata (Required)
groups_to_colors (dict {str:str}) – Dictionary of cluster-names to assigned colors (hexadecimal value). The pyInfinityFlow.Plotting_Utilities.assign_rainbow_colors_to_groups function can be used to generate this dictionary from a list of clusters. (Required)
path_to_save_figure (str) – The path to save the figure. (Required)
- Returns
Saves the heatmap to the file specified by path_to_save_figure
- Return type
None
- pyInfinityFlow.Plotting_Utilities.plot_leiden_clusters_over_umap(sub_p_adata, output_paths, verbosity)¶
Plots a 2D-UMAP colored by the values in the “leiden” field
This function takes a pyInfinityFlow formatted AnnData object and plots the events at their UMAP coordinates, coloring each event by its Leiden cluster assignment.
Note
It is expected that scanpy.pp.neighbors, scanpy.tl.umap, and scanpy.tl.leiden have been run on sub_p_adata. The x,y coordinates of the UMAP must also have been added to the sub_p_adata.obs pandas.DataFrame.
- Parameters
sub_p_adata (anndata.AnnData) –
pyInfinityFlow formatted AnnData object. It is expected that the object have the following attributes present:
sub_p_adata.obs[‘umap-x’] : the x-coordinates of the UMAP plot are required to be in the sub_p_adata.obs pandas.DataFrame
sub_p_adata.obs[‘umap-y’] : the y-coordinates of the UMAP plot are required to be in the sub_p_adata.obs pandas.DataFrame
sub_p_adata.obs[‘leiden’] : leiden cluster assignments are required to be in the sub_p_adata.obs pandas.DataFrame
sub_p_adata.uns[‘groups_to_color’] : (dict{str:str}) Dictionary of cluster-names to assigned colors (hexadecimal value) (Required)
output_paths (dict) – The output_paths dictionary created by the pyInfinityFlow.InfinityFlow_Utilities.setup_output_directories function (Required)
verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)
- Returns
Saves the scatterplot to the file specified by output_paths[“clustering”]
- Return type
None