noctis.data_transformation.preprocessing.data_preprocessing.Preprocessor

class noctis.data_transformation.preprocessing.data_preprocessing.Preprocessor(schema: GraphSchema | None = GraphSchema(base_nodes={'chemical_equation': 'ChemicalEquation', 'molecule': 'Molecule'}, base_relationships={'product': {'type': 'PRODUCT', 'start_node': 'chemical_equation', 'end_node': 'molecule'}, 'reactant': {'type': 'REACTANT', 'start_node': 'molecule', 'end_node': 'chemical_equation'}}, extra_nodes={}, extra_relationships={}))

Handles preprocessing of data in several formats, including CSV files and Python objects, for Neo4j integration.

Attributes:

preprocessor (Optional[PythonObjectPreprocessorInterface]): The preprocessor instance.

schema (Optional[GraphSchema]): The graph schema for processing.

config (PreprocessorConfig): Configuration settings for preprocessing.

__init__(schema: GraphSchema | None = GraphSchema(base_nodes={'chemical_equation': 'ChemicalEquation', 'molecule': 'Molecule'}, base_relationships={'product': {'type': 'PRODUCT', 'start_node': 'chemical_equation', 'end_node': 'molecule'}, 'reactant': {'type': 'REACTANT', 'start_node': 'molecule', 'end_node': 'chemical_equation'}}, extra_nodes={}, extra_relationships={}))

Initialize the Preprocessor with a graph schema and default configuration.

Args:

schema (Optional[GraphSchema]): The graph schema for processing.
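The default schema in the signature is easier to read once its relationships are spelled out as graph patterns. The sketch below copies the printed field values into a plain dictionary and renders each base relationship in a Cypher-style notation. It does not use the GraphSchema class itself; the names DEFAULT_SCHEMA and describe_relationships are hypothetical and exist only for this illustration.

```python
# Plain-dict copy of the default schema values printed in the signature.
# Assumption: this is NOT noctis's GraphSchema class, only its field values.
DEFAULT_SCHEMA = {
    "base_nodes": {
        "chemical_equation": "ChemicalEquation",
        "molecule": "Molecule",
    },
    "base_relationships": {
        "product": {
            "type": "PRODUCT",
            "start_node": "chemical_equation",
            "end_node": "molecule",
        },
        "reactant": {
            "type": "REACTANT",
            "start_node": "molecule",
            "end_node": "chemical_equation",
        },
    },
    "extra_nodes": {},
    "extra_relationships": {},
}


def describe_relationships(schema):
    """Render each base relationship as a Cypher-style pattern string."""
    nodes = schema["base_nodes"]
    patterns = []
    for rel in schema["base_relationships"].values():
        start = nodes[rel["start_node"]]  # resolve node key to its label
        end = nodes[rel["end_node"]]
        patterns.append(f"(:{start})-[:{rel['type']}]->(:{end})")
    return patterns


for pattern in describe_relationships(DEFAULT_SCHEMA):
    print(pattern)
# (:ChemicalEquation)-[:PRODUCT]->(:Molecule)
# (:Molecule)-[:REACTANT]->(:ChemicalEquation)
```

Read this way, the default is a two-label graph in which molecules point into a chemical equation as reactants and the equation points out to molecules as products.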

Methods

__init__([schema])

Initialize the Preprocessor with a graph schema and default configuration.

get_failed_strings()

Retrieve strings that failed during preprocessing.

info()

Display information about the Preprocessor capabilities, including the types of objects it can transform and the reaction string formats it supports.

preprocess_csv_for_neo4j_parallel(input_file)

Preprocess a CSV file for Neo4j integration using parallel processing.

preprocess_csv_for_neo4j_serial(input_file)

Preprocess a CSV file for Neo4j integration using serial processing.

preprocess_object_for_neo4j(data, data_type)

Preprocess Python objects for Neo4j integration.

set_config_from_yaml([file_path])

Set the preprocessor configuration from a YAML file.
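One way the methods listed above might be combined is sketched here. It relies only on the method names and parameters shown in the table. The helper preprocess_and_report, the order of the calls, and the file names are assumptions made for illustration, and the return types of the noctis methods are not documented on this page, so the helper simply passes through whatever get_failed_strings() returns.

```python
def preprocess_and_report(preprocessor, csv_path, config_path=None):
    """Hypothetical helper: run the serial CSV path and return the failures.

    `preprocessor` is expected to expose the methods listed in the table
    above (for example, a noctis Preprocessor instance).
    """
    if config_path is not None:
        # Assumption: configuration is loaded before any preprocessing call.
        preprocessor.set_config_from_yaml(config_path)

    # Serial variant; preprocess_csv_for_neo4j_parallel takes the same argument.
    preprocessor.preprocess_csv_for_neo4j_serial(csv_path)

    # Whatever could not be processed; the return type is not documented here.
    return preprocessor.get_failed_strings()


# With noctis installed, a call might look like this (file name is made up):
#   from noctis.data_transformation.preprocessing.data_preprocessing import Preprocessor
#   failed = preprocess_and_report(Preprocessor(), "reactions.csv")
```

Calling info() on the instance beforehand should show which object types and reaction string formats are supported, which helps when choosing between the CSV methods and preprocess_object_for_neo4j.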