noctis.data_transformation.preprocessing.data_preprocessing.CSVPreprocessor
- class noctis.data_transformation.preprocessing.data_preprocessing.CSVPreprocessor(schema: GraphSchema, config: PreprocessorConfig)
Preprocessor for CSV files. It can process the data either in parallel with Dask or serially with pandas.
- Attributes:
input_file (str): Path to the input CSV file.
max_partitions (int): Maximum number of partitions for processing.
- __init__(schema: GraphSchema, config: PreprocessorConfig)
Initialize the preprocessor with a graph schema and configuration.
- Args:
schema (GraphSchema): The schema defining the nodes and relationships.
config (PreprocessorConfig): Configuration settings for preprocessing.
Methods
__init__(schema, config): Initialize the preprocessor with a graph schema and configuration.
run(input_file, parallel[, dask_client]): Execute the preprocessing of a CSV file.
Attributes
input_file
max_partitions