Data Transformation

The Data Transformation module is a key component of NOCTIS. It covers data manipulation, formatting, and processing, and provides tools for styling data, formatting Neo4j results, and preprocessing and postprocessing chemical data.

Data Styles

Data Frame Stylers

NodesRelationshipsStyle()

A class to manage the style and export of nodes and relationships data to pandas DataFrames.

PandasExportStyle()

A subclass of NodesRelationshipsStyle with default settings for exporting nodes and relationships to DataFrames.

Neo4j

Neo4j Formatter

Neo4jResultFormatter()

A class to format Neo4j query results into nodes and relationships.

format_result(result, ce_label)

Format a Neo4j result into a DataContainer of GraphRecords.

Postprocessing

Chem Data Generators

ChemDataGeneratorInterface()

Abstract base class for chemical data generators.

ChemDataGeneratorFactory()

Factory class for registering and retrieving chemical data generators.
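
The register-and-retrieve pattern described for the factory can be illustrated with a small, generic sketch. This is not the NOCTIS implementation: the names `GeneratorInterface`, `GeneratorFactory`, `register`, `get`, and `ReactionStringListGenerator` are hypothetical and chosen only to show how a registry-based factory typically works.

```python
from abc import ABC, abstractmethod


class GeneratorInterface(ABC):
    """Hypothetical stand-in for an abstract generator base class."""

    @abstractmethod
    def generate(self, records: list[dict]) -> object:
        """Convert records into the target output type."""


class GeneratorFactory:
    """Hypothetical registry mapping an output name to a generator class."""

    _registry: dict[str, type[GeneratorInterface]] = {}

    @classmethod
    def register(cls, name: str):
        """Return a class decorator that stores the class under `name`."""
        def decorator(generator_cls: type[GeneratorInterface]):
            cls._registry[name] = generator_cls
            return generator_cls
        return decorator

    @classmethod
    def get(cls, name: str) -> GeneratorInterface:
        """Instantiate the generator registered under `name`."""
        try:
            return cls._registry[name]()
        except KeyError:
            raise ValueError(f"Unknown generator: {name!r}") from None


@GeneratorFactory.register("reaction_string")
class ReactionStringListGenerator(GeneratorInterface):
    """Illustrative generator that collects one string per record."""

    def generate(self, records: list[dict]) -> list[str]:
        return [record["smiles"] for record in records]


generator = GeneratorFactory.get("reaction_string")
strings = generator.generate([{"smiles": "CCO>>CC=O"}])
```

With this layout, a new output format is supported by registering one more class; the code that requests a generator by name does not change.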

PandasGenerator()

Generator class for exporting chemical data to pandas DataFrames.

NetworkXGenerator()

Generator class for exporting chemical data to NetworkX graphs.

ReactionStringGenerator()

Generator class for exporting chemical data to reaction strings.

SyngraphGenerator()

Generator class for exporting chemical data to synthetic graphs.

Preprocessing

Core Graph Builder

FormatValidator()

Class to validate and map reaction formats to corresponding molecular string computation functions.

CoreGraphBuilder()

ValidatedStringBuilder(input_format, ...)

Processor to handle reaction strings with cheminformatics validation.

UnvalidatedStringBuilder(input_format)

Processor to handle reaction strings without cheminformatics validation.

Data Preprocessing

PreprocessorConfig([inp_chem_format, ...])

Configuration for preprocessing chemical data.

PandasRowPreprocessorBase(schema, config)

Abstract base class for preprocessing rows of a pandas DataFrame according to a specified graph schema and configuration.

CSVPreprocessor(schema, config)

Preprocessor for handling CSV files, capable of processing data either in parallel or serially using Dask or pandas.

PythonObjectPreprocessorInterface()

Abstract base class for preprocessors that handle Python objects.

ChemicalStringPreprocessorBase(schema, config)

Base class for preprocessors that handle chemical strings.

PythonObjectPreprocessorFactory()

Factory class for creating preprocessors based on data type.

DataFramePreprocessor(schema, config)

Preprocessor for handling pandas DataFrames, extracting nodes and relationships based on a predefined schema and configuration.

ReactionStringsPreprocessor(schema, config)

Preprocessor for handling lists of chemical reaction strings, extracting nodes and relationships based on a predefined schema and configuration.

SynGraphPreprocessor(schema, config)

Preprocessor for handling synthetic graph objects, extracting nodes and relationships based on a predefined schema and configuration.

Preprocessor([schema])

A class to handle preprocessing tasks for various data formats, including CSV files and Python objects, for Neo4j integration.

GraphExpander

GraphExpander(schema)

Class to expand graph data based on a given schema, including nodes and relationships.

Utilities

_update_partition_dict_with_row(target_dict, ...)

Update the target dictionary with values from the source dictionary.
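
As an illustration only, an in-place update of this kind could look like the sketch below. It assumes both dictionaries map a label to a list of records and that values are appended rather than overwritten; the function name `update_dict_of_lists` and the labels used in the example are hypothetical, and the actual private helper may behave differently.

```python
def update_dict_of_lists(target: dict[str, list], source: dict[str, list]) -> None:
    """Extend each list in `target` with the matching list from `source`, in place."""
    for key, values in source.items():
        # Create the key on first sight, then append the new values.
        target.setdefault(key, []).extend(values)


# Illustrative labels and records, not taken from NOCTIS.
partition = {"Molecule": [{"uid": "M1"}]}
row = {"Molecule": [{"uid": "M2"}], "ChemicalEquation": [{"uid": "CE1"}]}
update_dict_of_lists(partition, row)
```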

_save_dataframes_to_partition_csv(...)

Save node and relationship DataFrames to partitioned CSV files.

_save_list_to_partition_csv(my_list, header, ...)

Save a list to a CSV file with a header in a partition-specific directory.

_merge_partition_files(filename, tmp_dir, ...)

Merge partition files into a single CSV file.
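
A minimal sketch of merging partition files, using only the standard library. It assumes every partition file starts with the same header row and that the header should appear once in the merged output; the function name `merge_csv_partitions`, the file names, and the column names are hypothetical and do not reflect the signature of the private helper.

```python
import csv
import tempfile
from pathlib import Path


def merge_csv_partitions(partition_paths, output_path) -> int:
    """Concatenate CSV partitions into one file, keeping a single header.

    Returns the number of data rows written.
    """
    header_written = False
    n_rows = 0
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in partition_paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                if header is None:  # skip empty partition files
                    continue
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return n_rows


# Demo: write two small partitions to a temporary folder, then merge them.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    parts = []
    for i, rows in enumerate([[["M1", "CCO"]], [["M2", "CC=O"], ["M3", "O"]]]):
        part = tmp_dir / f"part_{i}.csv"
        with open(part, "w", newline="") as f:
            w = csv.writer(f)
            w.writerow(["uid", "smiles"])
            w.writerows(rows)
        parts.append(part)
    merged_path = tmp_dir / "merged.csv"
    n_rows = merge_csv_partitions(parts, merged_path)
    merged_lines = merged_path.read_text().splitlines()
```

The temporary folder is removed when the `with` block exits, which mirrors the cleanup step that a helper such as `_delete_tmp_folder` would perform.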

create_noctis_relationship(mol_node, ...)

Create a noctis relationship based on its type.

_delete_tmp_folder(tmp_folder)

Delete a temporary folder and log the outcome.

create_noctis_node(node_uid, node_label, ...)

Create a noctis node.

explode_smiles_like_reaction_string(...)

Explode a SMILES-like reaction string into reactants and products.
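
For orientation, a reaction SMILES has the form `reactants>agents>products`, with individual molecules separated by `.`. The sketch below shows one way to split such a string. It is an assumption-laden illustration, not the NOCTIS function: the name `explode_reaction_smiles` is hypothetical, agents are discarded, and no cheminformatics validation is performed.

```python
def explode_reaction_smiles(reaction_string: str) -> tuple[list[str], list[str]]:
    """Split 'reactants>agents>products' into reactant and product lists."""
    parts = reaction_string.split(">")
    if len(parts) != 3:
        raise ValueError(
            f"Expected 'reactants>agents>products', got {reaction_string!r}"
        )
    reactants_part, _agents_part, products_part = parts
    # Molecules on each side are dot-separated; drop empty fragments.
    reactants = [s for s in reactants_part.split(".") if s]
    products = [s for s in products_part.split(".") if s]
    return reactants, products


reactants, products = explode_reaction_smiles("CC(=O)O.OCC>>CC(=O)OCC.O")
```

Splitting on `.` treats every dot-separated fragment as a separate molecule, so salts and other multi-fragment species written with `.` would be split apart by this sketch.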

explode_v3000_reaction_string(reaction_string)

Placeholder function for exploding V3000 reaction strings.

dict_to_list(d)

Convert a dictionary of lists into a single list.
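
A minimal sketch of this kind of flattening, assuming the lists are concatenated in the dictionary's key order and the keys themselves are dropped. The name `flatten_dict_of_lists` is hypothetical; the behavior of `dict_to_list` itself may differ in detail.

```python
from itertools import chain


def flatten_dict_of_lists(d: dict[str, list]) -> list:
    """Concatenate all list values of `d` into a single list."""
    return list(chain.from_iterable(d.values()))


flat = flatten_dict_of_lists({"a": [1, 2], "b": [], "c": [3]})
```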

create_data_container(nodes, relationships, ...)

Create a DataContainer object from nodes and relationships.