noctis.data_transformation.preprocessing.data_preprocessing.Preprocessor
- class noctis.data_transformation.preprocessing.data_preprocessing.Preprocessor(schema: GraphSchema | None = GraphSchema(base_nodes={'chemical_equation': 'ChemicalEquation', 'molecule': 'Molecule'}, base_relationships={'product': {'type': 'PRODUCT', 'start_node': 'chemical_equation', 'end_node': 'molecule'}, 'reactant': {'type': 'REACTANT', 'start_node': 'molecule', 'end_node': 'chemical_equation'}}, extra_nodes={}, extra_relationships={}))
A class to handle preprocessing tasks for various data formats, including CSV files and Python objects, for Neo4j integration.
- Attributes:
preprocessor (Optional[PythonObjectPreprocessorInterface]): The preprocessor instance.
schema (Optional[GraphSchema]): The graph schema for processing.
config (PreprocessorConfig): Configuration settings for preprocessing.
- __init__(schema: GraphSchema | None = GraphSchema(base_nodes={'chemical_equation': 'ChemicalEquation', 'molecule': 'Molecule'}, base_relationships={'product': {'type': 'PRODUCT', 'start_node': 'chemical_equation', 'end_node': 'molecule'}, 'reactant': {'type': 'REACTANT', 'start_node': 'molecule', 'end_node': 'chemical_equation'}}, extra_nodes={}, extra_relationships={}))
Initialize the Preprocessor with a graph schema and default configuration.
- Args:
schema (Optional[GraphSchema]): The graph schema for processing.
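The default value of schema in the signature above is dense, so it may help to read it as plain data. The following is a minimal sketch that restates those defaults as an ordinary dictionary and prints each base relationship with its direction. It does not use the GraphSchema class; default_schema and describe_relationships are hypothetical names introduced only for illustration.

```python
# Plain-dict restatement of the default schema values from the signature.
# This is NOT the GraphSchema class; it only mirrors the documented defaults.
default_schema = {
    "base_nodes": {
        "chemical_equation": "ChemicalEquation",
        "molecule": "Molecule",
    },
    "base_relationships": {
        "product": {
            "type": "PRODUCT",
            "start_node": "chemical_equation",
            "end_node": "molecule",
        },
        "reactant": {
            "type": "REACTANT",
            "start_node": "molecule",
            "end_node": "chemical_equation",
        },
    },
    "extra_nodes": {},
    "extra_relationships": {},
}


def describe_relationships(schema):
    """Return one '(Start)-[:TYPE]->(End)' string per base relationship."""
    nodes = schema["base_nodes"]
    lines = []
    for rel in schema["base_relationships"].values():
        start = nodes[rel["start_node"]]  # resolve node key to its label
        end = nodes[rel["end_node"]]
        lines.append(f"({start})-[:{rel['type']}]->({end})")
    return lines


for line in describe_relationships(default_schema):
    print(line)
```

Read this way, the defaults describe a graph in which a ChemicalEquation node points to its product Molecule nodes, and reactant Molecule nodes point to the ChemicalEquation.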
Methods
__init__([schema]): Initialize the Preprocessor with a graph schema and default configuration.
get_failed_strings(): Retrieve strings that failed during preprocessing.
info(): Display information about the Preprocessor capabilities, including the types of objects it can transform and the reaction string formats it supports.
preprocess_csv_for_neo4j_parallel(input_file): Preprocess a CSV file for Neo4j integration using parallel processing.
preprocess_csv_for_neo4j_serial(input_file): Preprocess a CSV file for Neo4j integration using serial processing.
preprocess_object_for_neo4j(data, data_type): Preprocess Python objects for Neo4j integration.
set_config_from_yaml([file_path]): Set the preprocessor configuration from a YAML file.
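One way the CSV methods and get_failed_strings() might be combined is sketched below. The helper preprocess_csv is hypothetical and not part of noctis. It only relies on the method names listed above and passes their results through unchanged, because return types are not documented on this page.

```python
def preprocess_csv(preprocessor, input_file, parallel=False):
    """Run one documented CSV method, then collect the failed strings.

    `preprocessor` is assumed to expose the method names listed in the
    Methods summary; nothing is assumed about what those methods return.
    """
    if parallel:
        result = preprocessor.preprocess_csv_for_neo4j_parallel(input_file)
    else:
        result = preprocessor.preprocess_csv_for_neo4j_serial(input_file)

    # Query failures after the run so the caller can inspect them.
    failed = preprocessor.get_failed_strings()
    return result, failed
```

With the library installed, the helper could be given a Preprocessor() imported from noctis.data_transformation.preprocessing.data_preprocessing together with the path to a CSV file. Any further keyword arguments the CSV methods may accept are not shown in the summary and are left out here.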