networkcommons.utils.handle_missing_values

networkcommons.utils.handle_missing_values(df, threshold=0.1, fill=<function mean>)

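Handles missing values in a DataFrame row by row. Rows whose share of missing values reaches the threshold are dropped, and the missing values in the remaining rows are filled with a specified function or value.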

Parameters:
  - df (pandas.DataFrame): The DataFrame containing the data.
  - threshold (float): The threshold for the share (0 < n < 1) of missing values in a row. Rows with a share of missing values greater than or equal to the threshold are dropped.
  - fill (callable, int, float, or None): If callable, the function is applied to each row to fill its missing values. If an integer or float, that value is used to fill missing values. If None, no filling is done.
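The parameter rules above can be expressed as a short pandas routine. The following is a minimal sketch, not the networkcommons implementation: the function name `handle_missing_values_sketch` is hypothetical, it assumes every column is numeric, and it assumes a callable `fill` receives the observed (non-missing) values of each row.

```python
import numpy as np
import pandas as pd


def handle_missing_values_sketch(df, threshold=0.1, fill=np.mean):
    """Hypothetical sketch of the documented drop/fill behaviour.

    Assumes every column of ``df`` is numeric.
    """
    # Share of missing values in each row, between 0.0 and 1.0.
    missing_share = df.isna().mean(axis=1)

    # Rows at or above the threshold are dropped.
    to_drop = missing_share >= threshold
    kept = df.loc[~to_drop]

    # Surviving rows that still contain a NaN are candidates for filling.
    to_fill = kept.isna().any(axis=1)

    if fill is None:
        n_filled = 0  # no filling requested; NaNs stay in place
    elif callable(fill):
        # Assumption: the callable receives the row's observed values and its
        # result replaces that row's missing entries.
        kept = kept.apply(lambda row: row.fillna(fill(row.dropna())), axis=1)
        n_filled = int(to_fill.sum())
    elif isinstance(fill, (int, float)):
        kept = kept.fillna(fill)
        n_filled = int(to_fill.sum())
    else:
        raise TypeError("fill must be a callable, an int, a float, or None")

    # Message format mirrors the example output in this documentation.
    print(f"Number of genes filled: {n_filled}")
    print(f"Number of genes removed: {int(to_drop.sum())}")
    return kept
```

Applying the callable per row rather than per column follows the docstring's wording ("applied to each row"), so each row is filled from its own values.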

Returns:
  - df (pandas.DataFrame): The DataFrame with high-missingness rows dropped and the remaining missing values filled according to fill.

Raises:
  - ValueError: If more than one non-numeric column is found in the DataFrame.
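The ValueError condition suggests that at most one non-numeric column is tolerated. The page does not say what happens to that single column, so the sketch below rests on an assumption: it treats the column as row identifiers (for example gene names) and moves it into the index. The helper name `split_identifier_column` is hypothetical.

```python
import pandas as pd


def split_identifier_column(df):
    """Hypothetical helper: allow at most one non-numeric column."""
    non_numeric = df.select_dtypes(exclude="number").columns
    if len(non_numeric) > 1:
        # Mirrors the documented error condition.
        raise ValueError(
            f"More than one non-numeric column found: {list(non_numeric)}"
        )
    if len(non_numeric) == 1:
        # Assumption: the single non-numeric column holds row identifiers
        # and is moved into the index so only numeric data remains.
        return df.set_index(non_numeric[0])
    return df
```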

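Example:
    >>> import numpy as np
    >>> import pandas as pd
    >>> from networkcommons.utils import handle_missing_values
    >>> df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [3, 2, np.nan], 'C': [np.nan, 7, 8]})
    >>> handle_missing_values(df, 0.5, fill=np.mean)
    Number of genes filled: 1
    Number of genes removed: 1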
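The counts in the example follow from the per-row share of missing values. The snippet below reproduces that arithmetic with plain pandas; it assumes, as a hedge, that `np.mean` is applied to the observed values of the filled row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [3, 2, np.nan], 'C': [np.nan, 7, 8]})

# Share of missing values per row: row 0 -> 1/3, row 1 -> 0, row 2 -> 2/3.
missing_share = df.isna().mean(axis=1)

threshold = 0.5
removed = missing_share >= threshold                        # only row 2
filled = (missing_share > 0) & (missing_share < threshold)  # only row 0

# Assumed fill value for row 0: the mean of its observed values, 1 and 3.
row0_fill = np.mean(df.loc[0].dropna())
```

Row 2 (two of three values missing) crosses the 0.5 threshold and is removed, while row 0 (one of three missing) stays and is filled, which matches the reported counts of 1 and 1.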