noctis.data_transformation.preprocessing.data_preprocessing.CSVPreprocessor

class noctis.data_transformation.preprocessing.data_preprocessing.CSVPreprocessor(schema: GraphSchema, config: PreprocessorConfig)

Preprocessor for CSV files. Data can be processed either in parallel or serially, using Dask or pandas.

Attributes:

input_file (str): Path to the input CSV file.

max_partitions (int): Maximum number of partitions for processing.

__init__(schema: GraphSchema, config: PreprocessorConfig)

Initialize the preprocessor with a graph schema and configuration.

Args:

schema (GraphSchema): The schema defining the nodes and relationships.

config (PreprocessorConfig): Configuration settings for preprocessing.

Methods

__init__(schema, config)

Initialize the preprocessor with a graph schema and configuration.

run(input_file, parallel[, dask_client])

Execute the preprocessing of a CSV file.
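As a rough illustration of the calling convention listed above, the sketch below wraps a call to run(input_file, parallel[, dask_client]). Only the method name and its three parameter names come from this page. The helper run_preprocessing and the stand-in class RecordingPreprocessor are hypothetical, added so the example runs without noctis installed. Treating parallel as a boolean and dask_client as an optional client object are assumptions, not documented behaviour.

```python
from typing import Any, List, Optional, Tuple


def run_preprocessing(
    preprocessor: Any,
    input_file: str,
    parallel: bool,
    dask_client: Optional[Any] = None,
) -> Any:
    """Hypothetical helper: call run() using the documented parameter names.

    Assumption: a Dask client is only relevant when parallel is truthy,
    so it is forwarded only in that case.
    """
    if parallel and dask_client is not None:
        return preprocessor.run(input_file, parallel, dask_client=dask_client)
    return preprocessor.run(input_file, parallel)


class RecordingPreprocessor:
    """Hypothetical stand-in for CSVPreprocessor; it only records calls."""

    def __init__(self, schema: Any, config: Any) -> None:
        self.schema = schema
        self.config = config
        self.calls: List[Tuple[str, bool, Optional[Any]]] = []

    def run(self, input_file: str, parallel: bool, dask_client: Any = None) -> None:
        # Record the arguments instead of reading any file.
        self.calls.append((input_file, parallel, dask_client))


# Usage with the stand-in; with noctis installed, a real CSVPreprocessor
# built from a GraphSchema and a PreprocessorConfig would take its place.
stub = RecordingPreprocessor(schema=None, config=None)
run_preprocessing(stub, "reactions.csv", parallel=False)
run_preprocessing(stub, "reactions.csv", parallel=True, dask_client="client-placeholder")
```

The file name reactions.csv and the string used in place of a Dask client are placeholders only.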

Attributes

input_file

max_partitions