API

The pyInfinityFlow API is designed to give the user more control over what parameters are used in the InfinityFlow pipeline. It also allows for any FCS file to be processed with any step of the pipeline.

InfinityFlow_Utilities

InfinityFlow_Utilities: Classes

class pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler(ordered_reference_backbone)

Class to specify how to handle InfinityMarker files.

Parameters

ordered_reference_backbone (numpy.Array[str]) – Array of backbone channel names

list_infinity_markers

List of the InfinityMarker names that the object can handle

Type

list[str]

handles

A dictionary with a key for every InfinityMarker. Each key stores a dictionary with the following values to specify how to handle the given InfinityMarker:

  • [“name”]: (str) InfinityMarker name

  • [“file_name”]: (str) .fcs file name for the InfinityMarker

  • [“directory”]: (str) path to the directory where the .fcs file is saved

  • [“reference_backbone_channels”]: (list[str]) list of the channel names to use for the backbone in the reference .fcs file (the events used for prediction)

  • [“backbone_channels”]: (list[str]) list of the channel names to use for the backbone in the InfinityMarker .fcs file (the events used for XGBoost regression model fitting)

  • [“prediction_channel”]: (str) channel name of the InfinityMarker, the channel name to predict

  • [“train_indices”]: (numpy.Array[int]) indices of the InfinityMarker .fcs file to use for fitting

  • [“test_indices”]: (numpy.Array[int]) indices of the InfinityMarker .fcs file to use for validation

  • [“pool_indices”]: (numpy.Array[int]) indices of the events in the InfinityMarker .fcs file to pool into the combined reference dataset

Type

dict{InfinityMarker: dict}

use_isotype_controls

If True, pipeline functions will require Isotype controls

Type

bool

isotype_control_names

Array of the InfinityMarker names that serve as Isotype controls

Type

numpy.Array[str]

ordered_reference_backbone

Array of backbone channel names

Type

numpy.Array[str]

add_handle(name, file_name, directory, reference_backbone_channels, backbone_channels, prediction_channel, train_indices, test_indices, pool_indices)

Add a new InfinityMarker handle to the InfinityFlowFileHandler

Parameters
  • name (str) – The name of the InfinityMarker (Required)

  • file_name (str) – The .fcs file name for the InfinityMarker (Required)

  • directory (str) – The path to the directory where the .fcs file is saved (Required)

  • reference_backbone_channels (list[str]) – list of the channel names to use for the backbone in the reference .fcs file (the events used for prediction)

  • backbone_channels (list[str]) – list of the channel names to use for the backbone in the InfinityMarker .fcs file (the events used for XGBoost regression model fitting)

  • prediction_channel (str) – The channel name of the InfinityMarker, the channel name to predict

  • train_indices (numpy.Array[int]) – The indices of the InfinityMarker .fcs file to use for fitting

  • test_indices (numpy.Array[int]) – The indices of the InfinityMarker .fcs file to use for validation

  • pool_indices (numpy.Array[int]) – The indices of the events in the InfinityMarker .fcs file to pool into the combined reference dataset

Returns

None. The new handle is stored in the handles dictionary under the InfinityMarker name, as a dictionary with the following keys:
  • [“name”]

  • [“file_name”]

  • [“directory”]

  • [“reference_backbone_channels”]

  • [“backbone_channels”]

  • [“prediction_channel”]

  • [“train_indices”]

  • [“test_indices”]

  • [“pool_indices”]

Return type

None

class pyInfinityFlow.InfinityFlow_Utilities.CombinedRegressionModels(ordered_training_channels, var_annotations, infinity_markers, regression_models, parameter_annotations, infinity_channels)

Class to store XGBoost regression models, the settings used to fit the model, and the validation metrics from testing.

ordered_training_channels

The features used to train each of the regression models

Type

numpy.Array[str]

var_annotations

The feature parameters of the backbone features used for training

Type

pandas.DataFrame

infinity_markers

The response variables the regression models can predict

Type

numpy.Array[str]

regression_models

Dictionary mapping each response variable (InfinityMarker) to the regression model used to predict it

Type

dict{InfinityMarker: xgboost.XGBRegressor}

parameter_annotations

Dictionary of Series specifying the feature parameters of each response variable (e.g. whether Logicle was applied to it)

Type

dict{InfinityMarker: Series}

infinity_channels

The channel name for the InfinityMarker (“Response Variable”)

Type

dict{InfinityMarker: str}

validation_metrics

Validation metrics for each regression model, keyed by InfinityMarker (filled in by single_chunk_testing)

Type

dict{InfinityMarker: dict}


InfinityFlow_Utilities: General Tools

pyInfinityFlow.InfinityFlow_Utilities.read_annotation_table(input_file)

Read in an annotation file. Annotation files dictate how the regression models are set up.

Parameters
  • input_file (str) – The path to the file containing the annotation information. The file should be either comma-separated (.csv) or tab-separated (.tsv or .txt). (Required)

Returns

DataFrame of the annotation table

Return type

pandas.DataFrame

pyInfinityFlow.InfinityFlow_Utilities.anndata_to_df(input_anndata, use_raw_feature_names=True, add_index_names=True)

Function to quickly convert an AnnData object containing pyInfinityFlow formatted flow cytometry data to a pandas DataFrame object

Parameters
  • input_anndata (anndata.AnnData) – AnnData object for which to generate a DataFrame (Required)

  • use_raw_feature_names (bool) – Optional argument. If True, only use the raw feature names from input_anndata.var.index. If False, add the “name” values for the features in input_anndata.var.index, formatted as <index>:<name>. (Default: True)

  • add_index_names (bool) – Optional argument. If True, will add the input_anndata.obs.index as the index of the returned DataFrame. If False, the index will simply be the integers from range(input_anndata.obs.shape[0]). (Default: True)

Returns

DataFrame of the AnnData object’s X attribute

Return type

pandas.DataFrame

pyInfinityFlow.InfinityFlow_Utilities.marker_finder(input_df, groups)

Function to find which features in input_df correspond best to which groups annotating the observations in input_df. The function will perform a Pearson correlation of the input_df feature values to an “idealized” group specific expression vector, where each observation in a given group is set to a value of 1, and the observations in other groups are set to 0.

Parameters
  • input_df (pandas.DataFrame) – DataFrame with observations as index and features as columns (Required)

  • groups (list[str]) – List-like of specified groups corresponding to observations from the input_df. The order of groups should match the order in input_df.index (Required)

Returns

DataFrame of the Pearson correlation test results. Each feature is assigned the cluster for which the test resulted in the highest Pearson correlation coefficient. The columns of the DataFrame will be [“marker”, “top_cluster”, “pearson_r”, “p_value”]

Return type

pandas.DataFrame
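The correlation step described above can be sketched in NumPy as follows. This is a minimal illustration, not pyInfinityFlow's code: the helper name marker_finder_sketch is hypothetical, it returns plain tuples instead of a DataFrame, and it omits the p-value column.

```python
import numpy as np


def marker_finder_sketch(values, feature_names, groups):
    """Hypothetical helper: assign each feature to the group whose idealized
    0/1 expression vector it correlates with most strongly (Pearson r).

    values: 2D array-like, observations (rows) x features (columns)
    groups: one group label per observation, in the same order as the rows
    Returns a list of (feature, top_group, pearson_r) tuples.
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    results = []
    for j, feature in enumerate(feature_names):
        best_group, best_r = None, -np.inf
        for group in np.unique(groups):
            # 1 for observations in this group, 0 for all other observations
            ideal = (groups == group).astype(float)
            r = np.corrcoef(values[:, j], ideal)[0, 1]
            if r > best_r:
                best_group, best_r = group, r
        results.append((feature, best_group, best_r))
    return results
```

For example, marker_finder_sketch([[5, 0], [6, 1], [0, 7], [1, 8]], ["m1", "m2"], ["a", "a", "b", "b"]) assigns m1 to group "a" and m2 to group "b". A constant feature, or a single group, makes the correlation undefined (NaN, with a NumPy warning); in this sketch such a feature comes back with top_group None.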

pyInfinityFlow.InfinityFlow_Utilities.read_fcs_into_anndata(fcs_file_path, obs_prefix='', batch_key='')

Reads an .fcs file into an AnnData object.

Parameters
  • fcs_file_path (str) – Path to the .fcs file (Required)

  • obs_prefix (str) – String to prepend to the index values of the output anndata.AnnData.obs.index (Default=“”)

  • batch_key (str) – If batch_key is not empty, this string is stored as the value of a “batch” column in the returned AnnData.obs DataFrame (Default=“”)

Returns

An AnnData object with the DATA segment of the .fcs file saved to the X attribute.

Return type

anndata.AnnData

pyInfinityFlow.InfinityFlow_Utilities.write_anndata_to_fcs(input_anndata, fcs_file_path, add_umap=False, verbosity=0)

Writes a given pyInfinityFlow structured AnnData object to an .fcs file according to the FCS3.1 file standard.

Parameters
  • input_anndata (anndata.AnnData) – The pyInfinityFlow formatted AnnData object to save to an .fcs file (Required)

  • fcs_file_path (str) – The path to which the .fcs file should be written. (Required)

  • add_umap (bool) – Specifies whether the 2D-UMAP coordinates should be written to the DATA segment of the .fcs file. This expects the features “umap-x” and “umap-y” to be in input_anndata.obs.columns. (Default=False)

  • verbosity (int (0|1|2|3)) – The level of verbosity with which to print debug statements. (Default=0)

Returns

The file will be saved to fcs_file_path.

Return type

None

pyInfinityFlow.InfinityFlow_Utilities.apply_logicle_to_anndata(input_anndata, in_place=True)

Applies the Logicle transformation function to the given input_anndata object.

Note

The T, W, M, and A parameters are specified in the input_anndata.var.

Parameters
  • input_anndata (anndata.AnnData) – The pyInfinityFlow formatted AnnData object on which to carry out Logicle normalization (Required)

  • in_place (bool) – Specifies whether the function should act on the input_anndata in-place (Default=True)

Returns

The AnnData object with logicle normalization applied or None if in_place=True

Return type

anndata.AnnData or None

pyInfinityFlow.InfinityFlow_Utilities.apply_inverse_logicle_to_anndata(input_anndata, in_place=True)

Applies the inverse Logicle transformation function to the given input_anndata object.

Parameters
  • input_anndata (anndata.AnnData) – The pyInfinityFlow formatted AnnData object on which to carry out the inverse Logicle transformation (Required)

  • in_place (bool) – Specifies whether the function should act on the input_anndata in-place (Default=True)

Returns

The AnnData object with the inverse Logicle transformation applied, or None if in_place=True

Return type

anndata.AnnData or None

pyInfinityFlow.InfinityFlow_Utilities.move_features_to_silent(input_anndata, features)

This function will “silence” a set of features by moving their values out of the AnnData.X array and into a DataFrame stored under AnnData.obsm[“silent”]. The rows of AnnData.var corresponding to those features are moved to AnnData.uns[“silent_var”]. This is useful for keeping some features out of downstream analyses. For example, the “Time” parameter stored in .fcs files is not meaningful to cell state.

Parameters
  • input_anndata (anndata.AnnData) – A pyInfinityFlow formatted AnnData object. (Required)

  • features (list[str]) – The features (must be present in AnnData.var.index) to move to ‘silent’. (Required)

Returns

A pyInfinityFlow formatted AnnData object. The ‘silent’ feature values are moved to AnnData.obsm[“silent”], and the ‘silent’ feature var DataFrame values are moved to AnnData.uns[“silent_var”]

Return type

anndata.AnnData

pyInfinityFlow.InfinityFlow_Utilities.move_features_out_of_silent(input_anndata, features)

This function moves the features that were “silenced” by pyInfinityFlow.InfinityFlow_Utilities.move_features_to_silent back into AnnData.X and AnnData.var.

It is required that AnnData.obsm[“silent”] and AnnData.uns[“silent_var”] exist.

Parameters
  • input_anndata (anndata.AnnData) – A pyInfinityFlow formatted AnnData object. (Required)

  • features (list[str]) – The features to move out of ‘silent’. They must currently be silenced, i.e. stored in AnnData.obsm[“silent”]. (Required)

Returns

A pyInfinityFlow formatted AnnData object. The features values are moved out of silent and back into AnnData.X and AnnData.var.

Return type

anndata.AnnData

pyInfinityFlow.InfinityFlow_Utilities.make_pca_elbo_plot(sub_p_adata, output_paths)

This function makes a PCA elbow plot showing the variance explained by each principal component. It requires that scanpy.tl.pca has been run on the sub_p_adata object.

Parameters
  • sub_p_adata (anndata.AnnData) – A pyInfinityFlow formatted AnnData object that has the sub_p_adata.uns[‘pca’][‘variance’] entry. (Required)

  • output_paths (dict) – The output_paths dictionary created by the pyInfinityFlow.InfinityFlow_Utilities.setup_output_directories function (Required)

Return type

None


InfinityFlow_Utilities: Analysis Pipeline Functions

pyInfinityFlow.InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes(backbone_annotation, infinity_marker_annotation, n_events_train=0, n_events_validate=0, n_events_combine=0, ratio_for_validation=0.2, separate_backbone_reference=None, random_state=None, input_fcs_dir=None, verbosity=0)

This function prepares the InfinityFlowFileHandler object that controls how the pipeline handles each .fcs file for the indicated regression model. Both the backbone_annotation table and the infinity_marker_annotation table are checked for validity.

Parameters
  • backbone_annotation (pandas.DataFrame) – The first column is the backbone channel names as they appear in the reference .fcs file. The second column is the backbone channel names as they appear in the query files, which are used to build the regression models. The last column is the final name to give to the user-defined channel parameter in the output .fcs file. (Required)

  • infinity_marker_annotation (pandas.DataFrame) – The first column is the .fcs file name. The second column is the channel name in the .fcs file to use as the response variable in the regression model. The third column is the desired name for the final channel in the output. The optional fourth column is the name of the isotype background control antibody, as it appears in the third column. (Required)

  • n_events_train (int) – The number of events from each .fcs file to use for training (Default=0)

  • n_events_validate (int) – The number of events from each .fcs file to use to validate each regression model (Default=0)

  • n_events_combine (int or None) – If events from each file are pooled into a final dataset, the number of events to take from each file to combine into the final object used as the reference for regression. (Default=0)

  • ratio_for_validation (float from 0 to 1) – If n_events_train and n_events_validate are both set to 0, all events from the .fcs file are used, and this parameter specifies the fraction of events used for validation. The remainder are used for training. (Default=0.2)

  • separate_backbone_reference (str or None) – If not None, the path to a separate .fcs file onto which predictions for the InfinityMarker signals will be made. (Default=None)

  • random_state (int or None) – Seed used when sampling event indices from the .fcs files, so that the sampling can be reproduced. (Default=None)

  • input_fcs_dir (str) – The path to the directory that holds all of the .fcs files listed in the first column of the infinity_marker_annotation DataFrame (Default=None)

  • exclusive_train_and_validate (bool) – If True, the program will be forced to use separate events for training and validation; n_events_combine events may then be taken from the validation events but not from the training events.

  • verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)

Returns

An instance of InfinityFlowFileHandler, which is an object to specify how input .fcs files should be treated during the regression pipeline.

Return type

pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler
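The interplay of n_events_train, n_events_validate, ratio_for_validation and random_state can be illustrated with the hypothetical helper below. It is a sketch of the behaviour described above, assuming one seeded random permutation per file; it is not the function's actual sampling code, and it does not model n_events_combine or exclusive_train_and_validate.

```python
import numpy as np


def split_event_indices(n_events, n_events_train=0, n_events_validate=0,
                        ratio_for_validation=0.2, random_state=None):
    """Hypothetical helper: choose disjoint train/validation event indices
    for a single .fcs file containing n_events events."""
    rng = np.random.default_rng(random_state)  # same seed -> same split
    shuffled = rng.permutation(n_events)
    if n_events_train == 0 and n_events_validate == 0:
        # Use every event and hold out a fraction for validation
        n_validate = int(round(n_events * ratio_for_validation))
        n_train = n_events - n_validate
    else:
        n_train, n_validate = n_events_train, n_events_validate
        if n_train + n_validate > n_events:
            raise ValueError("Requested more events than the file contains")
    train_indices = np.sort(shuffled[:n_train])
    test_indices = np.sort(shuffled[n_train:n_train + n_validate])
    return train_indices, test_indices
```

With the defaults, split_event_indices(100, random_state=0) returns 80 training and 20 validation indices that never overlap, and calling it again with the same random_state reproduces the same split.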

pyInfinityFlow.InfinityFlow_Utilities.setup_output_directories(output_dir, file_handler, verbosity=0)

Set up the output directories for the InfinityFlow Regression workflow

Parameters
  • output_dir (str) – The directory to which the pipeline outputs should be saved. (Required)

  • file_handler (pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler) – The InfinityFlowFileHandler that is returned by pyInfinityFlow.InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes. (Required)

  • verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)

Returns

A dictionary that stores the output directories as strings:
  • [“output_regression_path”]

  • [“output_umap_feature_plot_path”]

  • [“clustering”]

  • [“qc”]

  • [“output_umap_bc_feature_plot_path”]

The function will check if each of the output directory paths can be created and make them if they don’t exist.

Return type

dict

pyInfinityFlow.InfinityFlow_Utilities.single_chunk_training(file_handler, cores_to_use=1, random_state=None, xgb_params={}, use_logicle_scaling=True, normalization_method=None, verbosity=0)

This function carries out fitting of XGBoost regression models. It will read the data using the file_handler object to specify which events will be used for fitting. It will then carry out optional Logicle data normalization and batch normalization before fitting the model. It will then save the settings of the XGBoost regression models to the output.

Parameters
  • file_handler (pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler) – The InfinityFlowFileHandler that is returned by pyInfinityFlow.InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes. (Required)

  • cores_to_use (int) – The number of cores to use for XGBoost model fitting. (Default=1)

  • random_state (int or None) – Integer to specify the random state for XGBoost model fitting in an attempt to make the regression more reproducible, or None to not use a random seed. (Default=None)

  • xgb_params (dict) – Dictionary of keyword-argument value pairs to pass to the XGBoost model instantiation. (Default={})

  • use_logicle_scaling (bool) – Whether or not to use Logicle scaling before model fitting. (Default=True)

  • normalization_method (None or "zscore") – The method for normalizing the backbone of different samples in an attempt to remove batch effects. (Default=None)

  • verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)

Returns

pyInfinityFlow.InfinityFlow_Utilities.CombinedRegressionModels

An object to track the state of XGBoost Regression models as well as the models themselves.

timings_dict

A dictionary that records how much time each step of the function takes.

Return type

tuple (CombinedRegressionModels, timings_dict)
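The "zscore" option is not spelled out above. One plausible reading, assumed here and not confirmed by the source, is that each backbone channel is standardized to mean 0 and standard deviation 1 separately within each sample (.fcs file), which removes per-sample shifts and scale differences. The helper zscore_by_batch below is hypothetical and uses the population standard deviation (ddof=0).

```python
import numpy as np


def zscore_by_batch(backbone, batch_labels):
    """Hypothetical helper: standardize each backbone column within each batch.

    backbone: 2D array-like, events (rows) x backbone channels (columns)
    batch_labels: one sample/batch label per event
    """
    backbone = np.asarray(backbone, dtype=float)
    batch_labels = np.asarray(batch_labels)
    out = np.empty_like(backbone)
    for batch in np.unique(batch_labels):
        rows = batch_labels == batch
        block = backbone[rows]
        mean = block.mean(axis=0)
        std = block.std(axis=0)
        std[std == 0] = 1.0  # constant channels are centred but not rescaled
        out[rows] = (block - mean) / std
    return out
```

Under this reading, a sample whose backbone is a shifted copy of another sample's backbone ends up with identical normalized values.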

pyInfinityFlow.InfinityFlow_Utilities.single_chunk_testing(file_handler, regression_models, use_logicle_scaling=True, normalization_method=None, verbosity=0)

This function carries out validation of XGBoost regression models. It will read the data using the file_handler object to specify which events will be used for validation. It will then predict the InfinityMarker signal on held out data from its .fcs file. Then it will save metrics to the regression_models object and return it, along with a dictionary to track timings for steps of the function.

Parameters
  • file_handler (pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler) – The InfinityFlowFileHandler that is returned by pyInfinityFlow.InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes. (Required)

  • regression_models (pyInfinityFlow.InfinityFlow_Utilities.CombinedRegressionModels) – The CombinedRegressionModels that is returned by the pyInfinityFlow.InfinityFlow_Utilities.single_chunk_training function. (Required)

  • use_logicle_scaling (bool) – Whether or not to use Logicle scaling before validating the models. (Default=True)

  • normalization_method (None or "zscore") – The method for normalizing the backbone of different samples in an attempt to remove batch effects. (Default=None)

  • verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)

Returns

pyInfinityFlow.InfinityFlow_Utilities.CombinedRegressionModels

An object to track the state of XGBoost Regression models as well as the models themselves. The .validation_metrics attribute will be filled with a dictionary that provides the following validation data:

  • [“pred”] - predicted values

  • [“true”] - real values

  • [“r2_score”] - r2_score provided by sklearn.metrics.r2_score

  • [“mean_squared_error”] - provided by sklearn.metrics.mean_squared_error

timings_dict

A dictionary that records how much time each step of the function takes.

Return type

tuple (CombinedRegressionModels, timings_dict)
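Both validation metrics reduce to short formulas: R² = 1 − SS_res / SS_tot and MSE = the mean of the squared residuals. The NumPy version below is only meant to make those definitions concrete (the function itself relies on sklearn.metrics); the helper name is hypothetical, and unlike sklearn.metrics.r2_score it does not special-case a constant y_true, where SS_tot is zero.

```python
import numpy as np


def r2_and_mse(y_true, y_pred):
    """Hypothetical helper: coefficient of determination and mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    mse = np.mean(residual ** 2)
    ss_res = np.sum(residual ** 2)                   # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot
    return r2, mse


print(r2_and_mse([1, 2, 3], [1, 2, 4]))  # R² = 0.5, MSE = 1/3
```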

pyInfinityFlow.InfinityFlow_Utilities.make_flow_regression_predictions(file_handler, regression_models, separate_backbone_reference=None, use_logicle_scaling=True, normalization_method=None, verbosity=0)

This function carries out prediction using XGBoost regression models. It will use either a separate_backbone_reference .fcs file onto which to make predictions of the InfinityMarker signals, or it will use a subset of the validation cells from the InfinityMarker .fcs files themselves. The output will be an AnnData object containing the backbone features and the predicted signals from the InfinityMarker regression models.

Parameters
  • file_handler (pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler) – The InfinityFlowFileHandler that is returned by pyInfinityFlow.InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes. (Required)

  • regression_models (pyInfinityFlow.InfinityFlow_Utilities.CombinedRegressionModels) – The CombinedRegressionModels that is returned by the pyInfinityFlow.InfinityFlow_Utilities.single_chunk_training function. (Required)

  • separate_backbone_reference (str or None) – If not None, this defines the path to the .fcs file onto which to make predictions for the InfinityMarker signals. (Default=None)

  • use_logicle_scaling (bool) – Whether or not to use Logicle scaling before making predictions. (Default=True)

  • normalization_method (None or "zscore") – The method for normalizing the backbone of different samples in an attempt to remove batch effects. (Default=None)

  • verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)

Returns

AnnData

A pyInfinityFlow formatted AnnData object with the original parameter values as well as the imputed InfinityMarker values.

timings_dict

A dictionary that records how much time each step of the function takes.

Return type

tuple (AnnData, timings_dict)

pyInfinityFlow.InfinityFlow_Utilities.perform_background_correction(sub_p_adata, file_handler, infinity_marker_annotation, cores_to_use=1, verbosity=0)

This function carries out background correction on the signal of a given InfinityMarker if that InfinityMarker has a corresponding Isotype control InfinityMarker. A linear model is used to regress out the background (isotype) antibody binding, leaving an estimate of the true signal of the InfinityMarker.

Parameters
  • sub_p_adata (anndata.AnnData) – A pyInfinityFlow formatted AnnData object with the original parameter values as well as the imputed InfinityMarker values. The Isotype controls must be included as InfinityMarkers and annotated in the infinity_marker_annotation DataFrame. (Required)

  • file_handler (pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler) – The InfinityFlowFileHandler that is returned by pyInfinityFlow.InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes. (Required)

  • infinity_marker_annotation (pandas.DataFrame) – The annotation DataFrame that specifies the File, Channel to predict, Name of final InfinityMarker, and Isotype InfinityMarker Name for each InfinityMarker. This DataFrame must have 4 columns if background correction is to be done. Each of the values in the last column (Isotype) must be present in the third column (Name of InfinityMarker) as InfinityMarkers. (Required)

  • cores_to_use (int) – The number of cores to use for fitting the sklearn.linear_model.LinearRegression model. (Default=1)

  • verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)

Returns

background_corrected_data

A DataFrame specifying the background corrected data, with event names as the index and channel names as columns.

background_corrected_var

A DataFrame of the .var field that corresponds to the features in the background_corrected_data.

timings_dict

A dictionary that records how much time each step of the function takes.

Return type

tuple (background_corrected_data, background_corrected_var, timings_dict)
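Regressing out an isotype signal can be sketched with ordinary least squares, here using NumPy in place of sklearn.linear_model.LinearRegression. The helper name is hypothetical, and whether pyInfinityFlow re-centres the residuals or adds the intercept back is not stated above; this sketch returns plain residuals, which by construction have mean zero and no linear dependence on the isotype signal.

```python
import numpy as np


def regress_out_isotype(marker_signal, isotype_signal):
    """Hypothetical helper: remove the part of marker_signal that is
    linearly explained by isotype_signal (fit with an intercept)."""
    y = np.asarray(marker_signal, dtype=float)
    x = np.asarray(isotype_signal, dtype=float)
    design = np.column_stack([np.ones_like(x), x])  # intercept + isotype term
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    return y - fitted  # residual = estimated background-corrected signal
```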

pyInfinityFlow.InfinityFlow_Utilities.find_markers_from_anndata(sub_p_adata, output_paths, groups_to_colors, cluster_key='leiden', verbosity=0)

Attempts to associate each of the clusters present in the AnnData object with the Backbone and InfinityMarkers in the dataset. It applies MarkerFinder to these clusters, generates a marker table, and plots a heatmap with the clustered events as columns and Markers as rows.

Parameters
  • sub_p_adata (anndata.AnnData) – A pyInfinityFlow formatted AnnData object with the original parameter values as well as the imputed InfinityMarker values. Clusters must be defined in the sub_p_adata.obs DataFrame. (Required)

  • output_paths (dict) – The output_paths dictionary created by the pyInfinityFlow.InfinityFlow_Utilities.setup_output_directories function (Required)

  • groups_to_colors (dict) – Dictionary to specify what color should be used for each cluster in sub_p_adata.obs[cluster_key] (e.g. {‘c1’: ‘red’, ‘c2’: ‘blue’, …}). (Required)

  • cluster_key (str) – The key in sub_p_adata.obs to use for cluster assignments. By default, it will look for “leiden”. (Default=”leiden”)

  • verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)

Returns

markers_df

A DataFrame giving, for each feature, the cluster for which it is the best marker by Pearson correlation, as determined by MarkerFinder. The columns of the DataFrame will be [“marker”, “top_cluster”, “pearson_r”, “p_value”]

cell_assignments

A DataFrame listing the top 50 events (or fewer, if the cluster is smaller) for each cluster, ranked by the Pearson correlation of each event to its cluster's centroid. Contains the following features:

  • [“cell”] - the event name

  • [“top_cluster”] - the cluster to which the event best correlates

  • [“top_corr”] - the Pearson correlation coefficient

  • [“original”] - the original cluster identity provided

Return type

tuple (markers_df, cell_assignments)
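The cell_assignments ranking can be sketched as below, under the assumption that each event is assigned to the cluster whose centroid (mean feature vector) it correlates with best, and that the top_n best-correlated events are then kept per assigned cluster. The helper is hypothetical; it returns tuples mirroring the [“cell”, “top_cluster”, “top_corr”, “original”] columns, with each event identified by its row position.

```python
import numpy as np


def rank_events_by_centroid(values, clusters, top_n=50):
    """Hypothetical helper: correlate each event with every cluster centroid,
    assign it to the best one, and keep the top_n events per cluster.

    values: 2D array-like, events (rows) x features (columns)
    clusters: original cluster label for each event
    """
    values = np.asarray(values, dtype=float)
    clusters = np.asarray(clusters)
    labels = np.unique(clusters)
    centroids = np.vstack([values[clusters == c].mean(axis=0) for c in labels])
    n_events = values.shape[0]
    # corrcoef treats every row as a variable; keep the event-vs-centroid block
    corr = np.corrcoef(values, centroids)[:n_events, n_events:]
    top_idx = corr.argmax(axis=1)
    top_cluster = labels[top_idx]
    top_corr = corr[np.arange(n_events), top_idx]
    selected = []
    for c in labels:
        members = np.flatnonzero(top_cluster == c)
        # Highest correlation first, then truncate to top_n events
        order = members[np.argsort(-top_corr[members])][:top_n]
        selected.extend((int(i), c, float(top_corr[i]), clusters[i]) for i in order)
    return selected
```

Events with identical values across all features have an undefined correlation, so real data would need such events filtered out first.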

pyInfinityFlow.InfinityFlow_Utilities.save_umap_figures_all_features(sub_p_adata, file_handler, output_paths, background_corrected_data=None, verbosity=0)

Plots the 2D-UMAP stored in sub_p_adata and colors it by each of the features in sub_p_adata.var. A .png file will be saved for each feature in the directory specified by output_paths[“output_umap_bc_feature_plot_path”] and/or output_paths[“output_umap_feature_plot_path”].

Parameters
  • sub_p_adata (anndata.AnnData) – A pyInfinityFlow formatted AnnData object. Must have ‘umap-x’ and ‘umap-y’ in sub_p_adata.obs.columns (Required)

  • file_handler (pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler) – The InfinityFlowFileHandler that is returned by pyInfinityFlow.InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes. (Required)

  • output_paths (dict) – The output_paths dictionary created by the pyInfinityFlow.InfinityFlow_Utilities.setup_output_directories function (Required)

  • background_corrected_data (pandas.DataFrame or None) – The background corrected data generated by pyInfinityFlow.InfinityFlow_Utilities.perform_background_correction. (Default=None)

  • verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)

Returns

A dictionary that records how much time each step of the function takes.

Return type

dict

pyInfinityFlow.InfinityFlow_Utilities.save_fcs_flow_anndata(sub_p_adata, file_handler, output_paths, background_corrected_data=None, background_corrected_var=None, add_umap=False, use_logicle=True, verbosity=0)

Save the pyInfinityFlow structured AnnData object to an .fcs file.

Parameters
  • sub_p_adata (anndata.AnnData) – A pyInfinityFlow formatted AnnData object with the original parameter values as well as the imputed InfinityMarker values. Clusters must be defined in the sub_p_adata.obs DataFrame. (Required)

  • file_handler (pyInfinityFlow.InfinityFlow_Utilities.InfinityFlowFileHandler) – The InfinityFlowFileHandler that is returned by pyInfinityFlow.InfinityFlow_Utilities.check_infinity_flow_annotation_dataframes. (Required)

  • output_paths (dict) – The output_paths dictionary created by the pyInfinityFlow.InfinityFlow_Utilities.setup_output_directories function (Required)

  • background_corrected_data (pandas.DataFrame or None) – The background corrected data generated by pyInfinityFlow.InfinityFlow_Utilities.perform_background_correction. (Default=None)

  • background_corrected_var (pandas.DataFrame or None) – The background_corrected_var DataFrame generated by pyInfinityFlow.InfinityFlow_Utilities.perform_background_correction. (Default=None)

  • add_umap (bool) – If True, will add the ‘umap-x’ and ‘umap-y’ features from sub_p_adata.obs to sub_p_adata.X. Requires that the 2D-UMAP has been generated for sub_p_adata and is specified in the ‘umap-x’ and ‘umap-y’ features of sub_p_adata.obs (Default=False)

  • use_logicle (bool) – If True, the function will attempt to invert the Logicle normalization before the data is saved. (Default=True)

  • verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)

Returns

A dictionary that records how much time each step of the function takes.

Return type

dict


Transformations

pyInfinityFlow.Transformations.apply_logicle(x, T=3000000, W=0, M=3, A=1)

The logicle scale is the inverse of a modified biexponential function and has the same relation to the modified biexponential function that a logarithmic scale has to its corresponding exponential function. [1]

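The logicle function is defined through the modified biexponential function B: logicle(x, T, W, M, A) is the value y that solves

$$ B(y, T, W, M, A) - x = 0 $$

B is the modified biexponential function:

$$ B(y, T, W, M, A) = a e^{b y} - c e^{-d y} - f $$

where:

$$ w = \frac{W}{M + A} $$

$$ x_2 = \frac{A}{M + A} $$

$$ x_1 = x_2 + w $$

$$ x_0 = x_2 + 2w $$

$$ b = (M + A) \ln(10) $$

d is the constant that, given b and w, satisfies:

$$ 2(\ln(d) - \ln(b)) + w(d + b) = 0 $$

The remaining constants follow from these:

$$ c_a = e^{x_0 (b + d)} $$

$$ f_a = e^{b x_1} - \frac{c_a}{e^{d x_1}} $$

$$ a = \frac{T}{e^{b} - f_a - \frac{c_a}{e^{d}}} $$

$$ c = c_a \, a $$

$$ f = f_a \, a $$

With these constants, B(x1) = 0 and B(1) = T, so the data value 0 maps to the scale position x1 and the top of scale T maps to 1.
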
Parameters
  • x (list-like numeric vector) – The input vector to normalize with logicle transformation

  • T (numeric) – The formal “Top of scale” value (Default=3000000)

  • W (numeric) – (Width parameter) The number of decades in the approximately linear region. The choice of W = 0 gives essentially the hyperbolic sine function (sinh x). (Default=0)

  • M (numeric) – The number of decades that the true logarithmic scale approached at the high end of the logicle scale would cover in the plot range (Default=3)

  • A (numeric) – The number of Additional decades of negative data values to be included (Default=1)

Note

Parameters should be chosen so that:
  • T > 0

  • M > 0

  • 0 <= W <= M/2

Returns

The input x after applying the logicle function

Return type

list-like numeric vector

References

[1] Moore, Wayne A., and David R. Parks. “Update for the logicle data scale including operational code implementations,” Cytometry. Part A: the journal of the International Society for Analytical Cytology 81.4 (2012): 273.
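The logicle formulas can be checked numerically with the NumPy sketch below. It is not pyInfinityFlow's implementation: the function names are hypothetical, d and the logicle value are found by plain bisection (simple but slow; a faster root finder would be preferable in practice), and, following the symmetric treatment of negative values in Moore and Parks' definition, B is mirrored about x1 for y < x1. For W = 0 the mirroring changes nothing, because B is then a hyperbolic sine centred on x1. Since logicle(x) is defined as the root of B(y) − x, the inverse transformation (the counterpart of apply_inverse_logicle) is simply B itself.

```python
import math

import numpy as np


def _solve_d(b, w, n_iter=200):
    """Solve 2*(ln d - ln b) + w*(d + b) = 0 for d in (0, b] by bisection."""
    if w == 0:
        return b  # the equation reduces to ln d = ln b
    lo, hi = 1e-300, b  # left side is negative at lo and equals 2*w*b > 0 at b
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if 2 * (math.log(mid) - math.log(b)) + w * (mid + b) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def logicle_constants(T=3000000, W=0, M=3, A=1):
    """Hypothetical helper: a, b, c, d, f and x1 from T, W, M, A."""
    w = W / (M + A)
    x2 = A / (M + A)
    x1 = x2 + w
    x0 = x2 + 2 * w
    b = (M + A) * math.log(10)
    d = _solve_d(b, w)
    c_a = math.exp(x0 * (b + d))
    f_a = math.exp(b * x1) - c_a / math.exp(d * x1)
    a = T / (math.exp(b) - f_a - c_a / math.exp(d))
    return {"a": a, "b": b, "c": c_a * a, "d": d, "f": f_a * a, "x1": x1}


def biexponential(y, k):
    """B(y) = a*e^(b*y) - c*e^(-d*y) - f, mirrored so that B(x1 - t) = -B(x1 + t)."""
    y = np.asarray(y, dtype=float)
    below = y < k["x1"]
    y_ref = np.where(below, 2 * k["x1"] - y, y)
    val = k["a"] * np.exp(k["b"] * y_ref) - k["c"] * np.exp(-k["d"] * y_ref) - k["f"]
    return np.where(below, -val, val)


def logicle(x, T=3000000, W=0, M=3, A=1, n_iter=200):
    """Hypothetical helper: find y with B(y) = x by vectorized bisection."""
    k = logicle_constants(T, W, M, A)
    x = np.asarray(x, dtype=float)
    target = np.abs(x)                   # solve for |x|, then mirror negatives
    lo = np.full(target.shape, k["x1"])  # B(x1) = 0
    hi = np.ones(target.shape)           # B(1) = T
    # Widen the upper bracket for values above the top of scale
    while np.any(biexponential(hi, k) < target):
        hi = np.where(biexponential(hi, k) < target, hi + 1.0, hi)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        too_low = biexponential(mid, k) < target
        lo = np.where(too_low, mid, lo)
        hi = np.where(too_low, hi, mid)
    y = 0.5 * (lo + hi)
    return np.where(x < 0, 2 * k["x1"] - y, y)


def inverse_logicle(y, T=3000000, W=0, M=3, A=1):
    """Hypothetical helper: the inverse of the logicle scale is B itself."""
    return biexponential(y, logicle_constants(T, W, M, A))
```

With the default parameters, logicle(0.0) is x1 = A / (M + A) = 0.25 and logicle(3000000) is 1, matching B(x1) = 0 and B(1) = T.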

pyInfinityFlow.Transformations.apply_inverse_logicle(x, T=3000000, W=0, M=3, A=1)

This function inverts pyInfinityFlow.Transformations.apply_logicle.

Parameters
  • x (list-like numeric vector) – The logicle-scaled values to transform back to the original data scale

  • T (numeric) – The formal “Top of scale” value (Default=3000000)

  • W (numeric) – (Width parameter) The number of decades in the approximately linear region. The choice of W = 0 gives essentially the hyperbolic sine function (sinh x)

  • M (numeric) – The number of decades that the true logarithmic scale approached at the high end of the logicle scale would cover in the plot range

  • A (numeric) – The number of additional decades of negative data values to include

Returns

The input x after applying the inverse logicle function

Return type

list-like numeric vector
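As a numerical check of the logicle formulas given for apply_logicle, here is a minimal pure-Python sketch of both directions. It is not pyInfinityFlow's implementation: the helper names (logicle_params, biexponential, logicle_sketch, inverse_logicle_sketch) are hypothetical, d and the root of B(y) - x are found by plain bisection for simplicity, and inputs are assumed to stay within a few decades of the display range so the exponentials do not overflow. The inverse needs no root finding, because applying the inverse logicle is simply evaluating B(y).

```python
import math
from typing import Dict, Iterable, List


def _solve_d(b: float, w: float) -> float:
    """Solve 2*(ln(d) - ln(b)) + w*(d + b) = 0 for d > 0 by bisection."""
    if w == 0:
        return b  # with w = 0 the equation reduces to ln(d) = ln(b)

    def g(d: float) -> float:
        return 2 * (math.log(d) - math.log(b)) + w * (d + b)

    lo, hi = 1e-12, b  # g is increasing in d; g(lo) < 0 and g(b) = 2*w*b > 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def logicle_params(T=3000000, W=0, M=3, A=1) -> Dict[str, float]:
    """Compute the coefficients a, b, c, d, f of B(y) from T, W, M, A."""
    w = W / (M + A)
    x2 = A / (M + A)
    x1 = x2 + w
    x0 = x2 + 2 * w
    b = (M + A) * math.log(10)
    d = _solve_d(b, w)
    ca = math.exp(x0 * (b + d))
    fa = math.exp(b * x1) - ca / math.exp(d * x1)
    a = T / (math.exp(b) - fa - ca / math.exp(d))
    return {"a": a, "b": b, "c": ca * a, "d": d, "f": fa * a}


def biexponential(y: float, p: Dict[str, float]) -> float:
    """B(y) = a*e^(b*y) - c*e^(-d*y) - f"""
    return p["a"] * math.exp(p["b"] * y) - p["c"] * math.exp(-p["d"] * y) - p["f"]


def logicle_sketch(x: Iterable[float], T=3000000, W=0, M=3, A=1) -> List[float]:
    """For each value v, find the y with B(y) = v (the root of B(y) - v)."""
    p = logicle_params(T, W, M, A)
    out = []
    for v in x:
        lo, hi = -1.0, 2.0
        # B is strictly increasing, so widen the bracket until it contains v
        while biexponential(lo, p) > v:
            lo -= 1.0
        while biexponential(hi, p) < v:
            hi += 1.0
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if biexponential(mid, p) < v:
                lo = mid
            else:
                hi = mid
        out.append(0.5 * (lo + hi))
    return out


def inverse_logicle_sketch(y: Iterable[float], T=3000000, W=0, M=3, A=1) -> List[float]:
    """The inverse of the logicle is B itself, evaluated at each y."""
    p = logicle_params(T, W, M, A)
    return [biexponential(v, p) for v in y]
```

Two properties make convenient sanity checks: x = T maps to 1 because B(1) = T, and x = 0 maps to x1 = (A + W) / (M + A) because the coefficients are chosen so that B(x1) = 0.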

pyInfinityFlow.Transformations.scale_feature(input_array, min_threshold_percentile, max_threshold_percentile)

Removes outliers and applies MinMaxScaler

This function is designed to remove outliers and rescale the values into the range [0, 1]

Parameters
  • input_array (list-like numeric vector) – The feature values to scale

  • min_threshold_percentile (number between 0 and 100 inclusive) – Percentile that sets the lower bound of the accepted input domain; values below this percentile take on the percentile value

  • max_threshold_percentile (number between 0 and 100 inclusive) – Percentile that sets the upper bound of the accepted input domain; values above this percentile take on the percentile value

Returns

The input_array after applying the thresholding and min-max scaling

Return type

list-like numeric vector
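A minimal NumPy sketch of the clip-then-scale behavior described above, assuming numpy.percentile's default linear interpolation; scale_feature_sketch is a hypothetical name and this is not the library's code. Clipping to the two percentile values and dividing by their span is equivalent to applying a min-max scaler to the clipped values, because after clipping the minimum and maximum are exactly those two percentile values.

```python
import numpy as np


def scale_feature_sketch(input_array, min_threshold_percentile, max_threshold_percentile):
    """Clip values to the given percentiles, then min-max scale them into [0, 1]."""
    values = np.asarray(input_array, dtype=float)
    low = np.percentile(values, min_threshold_percentile)
    high = np.percentile(values, max_threshold_percentile)
    # outliers take on the threshold values instead of being dropped
    clipped = np.clip(values, low, high)
    span = high - low
    if span == 0:
        return np.zeros_like(clipped)  # constant after clipping: no spread to scale
    return (clipped - low) / span
```

For example, with min_threshold_percentile=1 and max_threshold_percentile=99, events beyond the 1st and 99th percentiles map to exactly 0 and 1.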


fcs_io

FCSFileObject Class

class pyInfinityFlow.fcs_io.FCSFileObject(fcs_file_path='', mode='r', read_data_segment=True)

Primary class for working with FCS files.

This class is used to read and write FCS files. A mode is specified to either read from or write to the given fcs_file_path. Reading of FCS files can be done without including the DATA segment, so that the HEADER and TEXT segments can be read quickly.

Warning

Currently only FCS3.1 files are supported.

Parameters
  • fcs_file_path (str) – The path to the FCS file. (Required)

  • mode (str (Expects 'r'|'w')) – The mode in which to treat the FCS file. If ‘r’, the class instance will read from the FCS file immediately after it is created. (Default=’r’)

  • read_data_segment (bool) – Whether or not to read in the DATA segment of the FCS file. If False, only the HEADER and TEXT segment values are read into the class, which is enough to learn important properties of the FCS file (e.g. the number of events captured and the channel names). (Default=True)

file_path

The path to the FCS file. Set by fcs_file_path

Type

str

byte_locations
The byte offsets in the file that mark the start and end of each segment. These are filled in upon instantiation when mode=’r’, with the following keys:
  • [“text_start”]

  • [“text_end”]

  • [“data_start”]

  • [“data_end”]

  • [“analysis_start”]

  • [“analysis_end”]

Type

dict{KEY: int}

version

The version of the .fcs file (e.g. ‘FCS3.1’)

Type

str
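The version attribute and the byte_locations keys correspond to fields of the fixed-layout FCS HEADER segment: the version string in bytes 0 to 5, four spaces, then six right-justified 8-byte ASCII integers giving the first and last byte of the TEXT, DATA, and ANALYSIS segments. The sketch below parses those 58 bytes. parse_fcs_header is a hypothetical helper, not part of pyInfinityFlow, and it does not handle the case where a DATA or ANALYSIS offset too large for 8 digits is written as 0 and given instead by TEXT keywords such as $BEGINDATA and $ENDDATA.

```python
from typing import Dict, Tuple

SEGMENT_KEYS = ("text_start", "text_end", "data_start", "data_end",
                "analysis_start", "analysis_end")


def parse_fcs_header(header: bytes) -> Tuple[str, Dict[str, int]]:
    """Read the version and segment offsets from the first 58 bytes of an FCS file."""
    version = header[0:6].decode("ascii")  # e.g. "FCS3.1"; bytes 6 to 9 are spaces
    byte_locations = {}
    for i, key in enumerate(SEGMENT_KEYS):
        start = 10 + 8 * i  # each offset is an 8-byte, right-justified ASCII integer
        field = header[start:start + 8].decode("ascii").strip()
        byte_locations[key] = int(field) if field else 0
    return version, byte_locations
```

In practice the input would come from something like reading the first 58 bytes of a file opened in "rb" mode.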

text_segment

The TEXT segment as a string.

Type

str

delimiter

The character used as the delimiter between keys and values in the TEXT segment

Type

str

byteord_format

The byte order format to use to read the DATA segment

Type

str

text_segment_values

A dictionary that stores the FCS file TEXT segment key-value entries. These are important for defining properties about the channels, file positions, experiment annotations, etc.

Type

dict{KEY: str}
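In an FCS TEXT segment the first character is the delimiter, keys and values alternate between delimiters, and a delimiter that occurs inside a key or value is written twice. A minimal sketch of turning a text_segment string into a text_segment_values-style dictionary, assuming FCS 3.1 rules (no empty keys or values, so a doubled delimiter is always an escape); parse_text_segment is a hypothetical helper, not the class's own parser.

```python
from typing import Dict, List


def parse_text_segment(text: str) -> Dict[str, str]:
    """Split an FCS TEXT segment into a keyword -> value dictionary."""
    delimiter = text[0]  # the first character of the segment is the delimiter
    tokens: List[str] = []
    buffer: List[str] = []
    i = 1
    while i < len(text):
        char = text[i]
        if char == delimiter:
            if i + 1 < len(text) and text[i + 1] == delimiter:
                buffer.append(delimiter)  # doubled delimiter = literal character
                i += 2
                continue
            tokens.append("".join(buffer))  # a single delimiter ends a key or value
            buffer = []
        else:
            buffer.append(char)
        i += 1
    if buffer:  # tolerate a segment that lacks the trailing delimiter
        tokens.append("".join(buffer))
    # tokens alternate key, value, key, value, ...
    return dict(zip(tokens[0::2], tokens[1::2]))
```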

spillover

The spillover matrix to use for compensation

Type

pandas.DataFrame

data

The data from the DATA segment of the FCS file

Type

pandas.DataFrame

struct_format_string

Struct format string to pack and unpack the DATA segment as binary

Type

str
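With $DATATYPE 'F', the DATA segment is a flat run of 32-bit floats, one value per parameter per event, and byteord_format and struct_format_string describe how to unpack it. A sketch using Python's struct module, assuming $DATATYPE 'F' and a $BYTEORD of either '1,2,3,4' (little-endian) or '4,3,2,1' (big-endian); unpack_float_data is hypothetical, and the format string it builds is only analogous to the class's own struct_format_string.

```python
import struct
from typing import List


def unpack_float_data(data: bytes, byteord: str, par_count: int) -> List[List[float]]:
    """Decode a $DATATYPE 'F' DATA segment into one row of par_count values per event."""
    # only the two plain byte orders are handled; anything else raises KeyError
    byte_order = {"1,2,3,4": "<", "4,3,2,1": ">"}[byteord]
    n_values = len(data) // 4  # 4 bytes per 32-bit float
    struct_format_string = f"{byte_order}{n_values}f"
    values = struct.unpack(struct_format_string, data[: 4 * n_values])
    # list-mode data is stored event by event, par_count values per event
    return [list(values[i:i + par_count]) for i in range(0, n_values, par_count)]
```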

par_count

The number of parameters in the FCSFileObject

Type

int

list_par_n

Ordered list of parameters by number

Type

list[int]

named_par

The $PnS names, usually defined by the user when the FCS data is captured

Type

list[str]

named_par_channel

The $PnN channel names. These must be unique and are generally defined by the software used to capture the FCS events (e.g. “GFP-A”)

Type

list[str]
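named_par_channel and named_par can be read off text_segment_values by looping over the parameter numbers 1 to $PAR. A sketch, assuming upper-case keywords (FCS keywords are case-insensitive, so real files may need normalizing) and filling a missing optional $PnS with an empty string, which may differ from what pyInfinityFlow does; channel_names is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def channel_names(text_segment_values: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return ($PnN names, $PnS names) ordered by parameter number n = 1..$PAR."""
    par_count = int(text_segment_values["$PAR"])
    list_par_n = range(1, par_count + 1)
    # $PnN is required and unique per channel
    named_par_channel = [text_segment_values[f"$P{n}N"] for n in list_par_n]
    # $PnS is optional; this sketch uses "" when it is absent
    named_par = [text_segment_values.get(f"$P{n}S", "") for n in list_par_n]
    return named_par_channel, named_par
```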

load_data_from_pd_df(input_pd_df, input_channel_names=[], input_spillover_matrix=None, additional_text_segment_values={})

Load data from a pandas.DataFrame into an FCSFileObject

This method allows the FCSFileObject to add data to the DATA segment from a pandas.DataFrame. The resulting data can then be written to an FCS file using the FCSFileObject.to_fcs() method

Parameters
  • input_pd_df (pandas.DataFrame) – The DATA to add to the DATA segment. (Required)

  • input_channel_names (list[str]) – These are the names to give to the $PnS TEXT segment key-value. The list must be the same size as input_pd_df.shape[1] (The columns of the input_pd_df) or be empty (Default=[])

  • input_spillover_matrix (None or pandas.DataFrame) – The spillover matrix for fluorescence compensation (Default=None)

  • additional_text_segment_values (dict{'str':'str'}) – Key-value pairs to add to the TEXT segment of the FCS file. (Default={})

Returns

Adds the given information to the FCSFileObject

Return type

None

read_fcs()

Read data from an FCS file into the FCSFileObject instance

This is useful if a shallow read of the FCS file was initially performed without reading in the DATA segment. This method reads in the DATA segment from the FCS file.

Returns

Adds the data from the DATA segment of the fcs_file to the FCSFileObject

Return type

None

to_fcs(fcs_file_path, add_spillover_matrix=False)

Write FCSFileObject to an FCS file

Parameters
  • fcs_file_path (str) – The path to where the FCS file should be written. (Required)

  • add_spillover_matrix (bool) – Specifies whether or not to add the Spillover matrix defined in the FCSFileObject.spillover attribute to the TEXT segment of the FCS file. (Default=False)

Returns

Attempts to write the FCS file.

Return type

None

fcs_io: Functions

pyInfinityFlow.fcs_io.list_fcs_channels(fcs_file_path, add_user_defined_names=False)

List the channel names defined in a given FCS file

This is useful for getting the channel names exactly as they are written in the FCS file. When defining the Reference and Query channels in the backbone_annotation file, or the Target channel name in the InfinityMarker annotation file, the names must match the channel names in the specified FCS files.

Note

  • “PnS” is the key used in an FCS file to specify how the user wanted to annotate the channel.

  • “PnN” is the key used in an FCS file to specify the name for the channel and each must be unique in a given FCS file

Parameters
  • fcs_file_path (str) – Path to the .fcs file (Required)

  • add_user_defined_names (bool) – If True, the function will put the user-defined (“PnS”) names as a second column with the unique channel (“PnN”) names as the first column. (Default=False)

Returns

Prints out the channel names as a table to stdout.

Return type

None


Plotting_Utilities

pyInfinityFlow.Plotting_Utilities.assign_rainbow_colors_to_groups(groups)

Creates a dictionary of cluster names to hexadecimal color strings

This function takes a list of groups and assigns each unique item in the groups a color (using the matplotlib.cm.rainbow color-map) as a hexadecimal string value. This is useful for storing a single color scheme for clusters to be used with downstream visualizations.

Parameters

groups (numpy.Array[str]) – List of cluster names. (Required)

Returns

Dictionary of cluster-names to assigned colors (hexadecimal value)

Return type

dict{str: str}
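A sketch of the color assignment described above, using matplotlib's rainbow colormap and matplotlib.colors.to_hex. The cluster ordering (sorted by numpy.unique) and the even spacing along the colormap are assumptions, so the exact colors may differ from assign_rainbow_colors_to_groups; assign_rainbow_colors_sketch is a hypothetical name.

```python
import numpy as np
from matplotlib import cm
from matplotlib.colors import to_hex


def assign_rainbow_colors_sketch(groups):
    """Map each unique group name to an evenly spaced rainbow color as '#rrggbb'."""
    unique_groups = np.unique(groups)  # sorted unique cluster names
    rgba = cm.rainbow(np.linspace(0, 1, len(unique_groups)))  # one RGBA row per group
    return {str(g): to_hex(c) for g, c in zip(unique_groups, rgba)}
```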

pyInfinityFlow.Plotting_Utilities.plot_feature_over_x_y_coordinates_and_save_fig(feature_vector, x, y, feature_name, file_path)

Plots a 2D-scatter plot of numeric vector over x and y coordinates

This function takes a feature_vector, x and y coordinates, a feature_name, and a file_path and plots a scatterplot of all points, coloring the points using the “jet” colormap in matplotlib following the feature_vector scale.

Warning

It is expected that feature_vector, x, and y correspond to the same events, in the same order.

Note

The colormap will start at the 20th percentile (~blue) and end at the 80th percentile (~red) of the feature vector.

Parameters
  • feature_vector (numpy.Array[numeric]) – Numeric values to map the ‘jet’ colormap onto in the scatter plot. (Required)

  • x (numpy.Array[numeric]) – Numeric values for the x-coordinate of the scatter plot (Required)

  • y (numpy.Array[numeric]) – Numeric values for the y-coordinate of the scatter plot (Required)

  • feature_name (str) – Label to give the colorbar and plt.title of the scatter plot. (Required)

  • file_path (str) – The path to save the figure. (Required)

Returns

Saves the scatterplot to the file specified by file_path

Return type

None
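The 20th/80th percentile color limits in the note above can be expressed with the vmin and vmax arguments of matplotlib's scatter. A minimal sketch, assuming the off-screen Agg backend, a point size of 2, and the function name plot_feature_sketch, none of which come from pyInfinityFlow.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np


def plot_feature_sketch(feature_vector, x, y, feature_name, file_path):
    """Scatter x/y colored by feature_vector with 'jet' clipped to its 20th-80th percentiles."""
    vmin, vmax = np.percentile(feature_vector, [20, 80])
    fig, ax = plt.subplots()
    points = ax.scatter(x, y, c=feature_vector, cmap="jet", vmin=vmin, vmax=vmax, s=2)
    fig.colorbar(points, ax=ax, label=feature_name)
    ax.set_title(feature_name)
    fig.savefig(file_path)
    plt.close(fig)  # free the figure when plotting many features in a loop
```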

pyInfinityFlow.Plotting_Utilities.plot_markers_df(input_df, ordered_markers_df, ordered_cells_df, groups_to_colors, path_to_save_figure)

Plots a heatmap of the MarkerFinder results

This function takes a pandas.DataFrame of values, together with the markers_df and cell_assignments outputs of pyInfinityFlow.InfinityFlow_Utilities.find_markers_from_anndata, and plots a heatmap of the markers.

Note

This function expects pyInfinityFlow.InfinityFlow_Utilities.find_markers_from_anndata to have already been run.

Parameters
  • input_df (pandas.DataFrame) – Data to plot. The columns must intersect with the features in ordered_markers_df and the rows must intersect with the cells in ordered_cells_df (Required)

  • ordered_markers_df (pandas.DataFrame) – The markers_df output from pyInfinityFlow.InfinityFlow_Utilities.find_markers_from_anndata (Required)

  • ordered_cells_df (pandas.DataFrame) – The cell_assignments output from pyInfinityFlow.InfinityFlow_Utilities.find_markers_from_anndata (Required)

  • groups_to_colors (dict {str:str}) – Dictionary of cluster-names to assigned colors (hexadecimal value). The pyInfinityFlow.Plotting_Utilities.assign_rainbow_colors_to_groups function can be used to generate this dictionary from a list of clusters. (Required)

  • path_to_save_figure (str) – The path to save the figure. (Required)

Returns

Saves the heatmap to the file specified by path_to_save_figure

Return type

None

pyInfinityFlow.Plotting_Utilities.plot_leiden_clusters_over_umap(sub_p_adata, output_paths, verbosity)

Plots a 2D-UMAP colored by the values in the “leiden” field

This function draws a scatter plot of the UMAP coordinates in sub_p_adata.obs[‘umap-x’] and sub_p_adata.obs[‘umap-y’], coloring each cell by its sub_p_adata.obs[‘leiden’] cluster using the colors in sub_p_adata.uns[‘groups_to_color’].

Note

It is expected that scanpy.pp.neighbors, scanpy.tl.umap, and scanpy.tl.leiden have been run on sub_p_adata. The x and y coordinates of the UMAP must also have been added to the sub_p_adata.obs pandas.DataFrame.

Parameters
  • sub_p_adata (anndata.AnnData) –

    pyInfinityFlow formatted AnnData object. It is expected that the object have the following attributes present:

    • sub_p_adata.obs[‘umap-x’] : the x-coordinates of the UMAP plot are required to be in the sub_p_adata.obs pandas.DataFrame

    • sub_p_adata.obs[‘umap-y’] : the y-coordinates of the UMAP plot are required to be in the sub_p_adata.obs pandas.DataFrame

    • sub_p_adata.obs[‘leiden’] : leiden cluster assignments are required to be in the sub_p_adata.obs pandas.DataFrame

    • sub_p_adata.uns[‘groups_to_color’] : (dict{str:str}) Dictionary of cluster-names to assigned colors (hexadecimal value) (Required)

  • output_paths (dict) – The output_paths dictionary created by the pyInfinityFlow.InfinityFlow_Utilities.setup_output_directories function (Required)

  • verbosity (int (0|1|2|3)) – Specifies to what verbosity level the function will output progress and debugging statements. (Default=0)

Returns

Saves the scatterplot to the file specified by output_paths[“clustering”]

Return type

None