Gratools

class gratools.Gratools.Gratools(gfa_path: pathlib.Path, threads: int = 1, logger: logging.Logger = <factory>, gfa_name: str | None = None, bam_segments_file: pathlib.Path | None = None, dict_samples_chrom: collections.defaultdict = <factory>, works_path: pathlib.Path | None = None, bed_path: pathlib.Path | None = None, bam_path: pathlib.Path | None = None, samples_chrom_path: pathlib.Path | None = None, dict_gfa_graph_object: dict = <factory>, sample_name: str | None = None, chromosome: str | None = None, start: int = 0, stop: int | None = None, suffix: str | None = None, build_fasta: bool = False, gzip_gfa: bool = False)

Bases: object

A class that parses and processes GFA (Graphical Fragment Assembly) files, including handling the associated BAM files and extracting segments and links.

gfa_path

The path to the GFA file.

Type:: Path

gfa_name

The name of the GFA file.

Type:: str

bam_segments_file

The path to the BAM segments file.

Type:: Path

dict_samples_chrom

A dictionary mapping sample names to chromosomes.

Type:: defaultdict

works_path

The path to the working directory.

Type:: Path

bed_path

The path to the BED files directory.

Type:: Path

bam_path

The path to the BAM files directory.

Type:: Path

samples_chrom_path

The path to the file containing sample chromosome data.

Type:: Path
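The `<factory>` defaults in the class signature are how Python's `dataclasses` module prints fields declared with `field(default_factory=...)`, so `Gratools` appears to be a dataclass. The trimmed-down sketch below uses a hypothetical `GratoolsSketch` class (not the real `Gratools`) to show what that buys: every instance gets its own `defaultdict` and `dict` instead of sharing one mutable object. The `defaultdict(list)` value type and the logger name are assumptions.

```python
import logging
from collections import defaultdict
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class GratoolsSketch:
    """Hypothetical stand-in mirroring a few Gratools fields."""

    gfa_path: Path
    threads: int = 1
    # default_factory is called once per instance, so no mutable state is shared.
    logger: logging.Logger = field(default_factory=lambda: logging.getLogger("gratools"))
    dict_samples_chrom: defaultdict = field(default_factory=lambda: defaultdict(list))
    dict_gfa_graph_object: dict = field(default_factory=dict)
    start: int = 0
    stop: Optional[int] = None


# Usage: two instances get independent containers.
a = GratoolsSketch(Path("graph.gfa"), threads=4)
b = GratoolsSketch(Path("graph.gfa"))
a.dict_samples_chrom["CG14"].append("Chr01")  # hypothetical chromosome name
```

Writing `dict_samples_chrom: defaultdict = defaultdict(list)` directly would raise a `ValueError` when the class is defined, because dataclasses reject mutable defaults such as dicts.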

Attributes Summary

`bam_path`
`bam_segments_file`
`bed_path`
`build_fasta`
`chromosome`
`get_chromosome_data`	Load the chromosome data from a CSV file and return a DataFrame.
`get_chromosomes_by_sample`	Group the chromosomes by SAMPLES and return a grouped DataFrame.
`get_sample_names`	Retrieve the unique sample names from the dataset.
`gfa_name`
`gzip_gfa`
`sample_name`
`samples_chrom_path`
`start`
`stop`
`suffix`
`threads`
`works_path`

Methods Summary

`concat_generate_sub_graph`(suffix)
`display_chromosome_names`()	Display the SAMPLES and CHROMOSOMES_LIST using rich.
`display_full_chromosome_data`()	Display the full chromosome data (SAMPLES, CHROMOSOMES_LIST, START, END) using rich.
`display_sample_names`()	Display a list of sample names in a formatted table using rich.
`extract_sub_graph`(sample_name, chromosome[, ...])	Extract a subgraph from the GFA file.
`generate_fasta`(suffix)
`get_chromosome_size`(sample, chrom)
`parallelize_samples`(samples_list)
`process_sample`(sample, shared_progress, task_id)	Process a sample in several steps, updating progress through a shared dictionary.
`segments_info`([shared_min, specific_max, ...])
`specific_and_shared_segments`([...])

Attributes Documentation

bam_path: Path = None

bam_segments_file: Path = None

bed_path: Path = None

build_fasta: bool = False

chromosome: str = None

get_chromosome_data

Load the chromosome data from a CSV file and return a DataFrame.

Returns:: DataFrame containing the chromosome data.
Return type:: pandas.DataFrame
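A minimal sketch of what loading this CSV could look like with `pandas.read_csv`. The column layout (SAMPLES, CHROMOSOMES_LIST, START, END, one row per chromosome) is an assumption borrowed from `display_full_chromosome_data`, the source path is presumably `samples_chrom_path`, the chromosome names and coordinates are toy values, and `load_chromosome_data` is a hypothetical free-function stand-in for the property.

```python
import io

import pandas as pd

# Toy CSV; column names follow display_full_chromosome_data, values are made up.
CSV_TEXT = """SAMPLES,CHROMOSOMES_LIST,START,END
CG14,Chr01,0,1000
CG14,Chr02,0,800
Og103,Chr01,0,950
"""


def load_chromosome_data(source) -> pd.DataFrame:
    """Hypothetical stand-in for Gratools.get_chromosome_data."""
    # read_csv accepts a path or any file-like object; keep IDs as strings.
    return pd.read_csv(source, dtype={"SAMPLES": str, "CHROMOSOMES_LIST": str})


df = load_chromosome_data(io.StringIO(CSV_TEXT))
```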

get_chromosomes_by_sample

Group the chromosomes by SAMPLES and return a grouped DataFrame.

Returns:: A DataFrame with chromosomes grouped by SAMPLES.
Return type:: pandas.DataFrame
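Grouping by SAMPLES can be sketched with `DataFrame.groupby` followed by `agg(list)`, which collapses each sample's rows into one list of chromosomes. This assumes one chromosome per row in a CHROMOSOMES_LIST column, matching the names used by `display_chromosome_names`; `chromosomes_by_sample` and the data are hypothetical.

```python
import pandas as pd

# Toy per-chromosome table (hypothetical values), one row per chromosome.
df = pd.DataFrame(
    {
        "SAMPLES": ["CG14", "Og103", "CG14"],
        "CHROMOSOMES_LIST": ["Chr01", "Chr01", "Chr02"],
    }
)


def chromosomes_by_sample(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for Gratools.get_chromosomes_by_sample."""
    # groupby sorts the sample keys and keeps row order inside each group;
    # agg(list) turns each group's chromosomes into a Python list.
    return data.groupby("SAMPLES")["CHROMOSOMES_LIST"].agg(list).reset_index()


grouped = chromosomes_by_sample(df)
```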

get_sample_names

Retrieve the unique sample names from the dataset.

This function extracts the unique sample names from the dict_samples_chrom attribute, which is assumed to be a dictionary where keys represent the sample names, and values contain chromosome data.

Returns:: A list of unique sample names extracted from the dataset.
Return type:: list

Example

>>> sample_names = self.get_sample_names
>>> print(sample_names)
['CG14', 'Og103', 'Og182', 'Og20', 'Tog5681']
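Because dictionary keys are already unique, the behavior described above reduces to reading the keys of `dict_samples_chrom`. A minimal sketch, assuming the names are returned sorted (the example output happens to be in sorted order, but the source does not state that the real property sorts); `sample_names_from` is a hypothetical helper.

```python
from collections import defaultdict


def sample_names_from(dict_samples_chrom: dict) -> list:
    """Hypothetical helper: unique sample names in sorted order."""
    # Keys are unique by construction; sorted() returns a new list of them.
    return sorted(dict_samples_chrom)


samples = defaultdict(list)
for name in ["Og20", "CG14", "Tog5681", "Og182", "Og103"]:
    samples[name].append("Chr01")  # hypothetical chromosome entry

names = sample_names_from(samples)
```

Plain string sorting is lexicographic, which is why 'Og182' comes before 'Og20'.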

gfa_name: str = None

gzip_gfa: bool = False

sample_name: str = None

samples_chrom_path: Path = None

start: int = 0

stop: int = None

suffix: str = None

threads: int = 1

works_path: Path = None

Methods Documentation

concat_generate_sub_graph(suffix)

display_chromosome_names(): Display the SAMPLES and CHROMOSOMES_LIST using rich.

display_full_chromosome_data(): Display the full chromosome data (SAMPLES, CHROMOSOMES_LIST, START, END) using rich.

display_sample_names()

Display a list of sample names in a formatted table using rich.

This function shows the sample names in a visually appealing table built with the rich library. The table has a single column labeled ‘SAMPLES’, with one row per sample name.

Returns:: This function only prints the table to the console; it does not return any value.
Return type:: None
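A sketch of the one-column table described above, using rich's `Table` and `Console`. The helper name `build_sample_table` is hypothetical, and recording the console output is only done here so the rendered text can be inspected.

```python
from rich.console import Console
from rich.table import Table


def build_sample_table(sample_names):
    """Hypothetical helper: one 'SAMPLES' column, one row per sample."""
    table = Table()
    table.add_column("SAMPLES")
    for name in sample_names:
        table.add_row(name)
    return table


table = build_sample_table(["CG14", "Og103", "Og182"])
console = Console(record=True)
console.print(table)  # prints to the terminal and returns None
text = console.export_text()
```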

extract_sub_graph(sample_name, chromosome, start=0, stop=None, samples_list_path=None, build_fasta=False)

Extract a subgraph from the GFA file.

Parameters:

sample_name (str) – The name of the sample.
chromosome (str) – The chromosome identifier.
start (int) – The starting position (default: 0).
stop (int, optional) – The stopping position (default: None).
samples_list_path (Path, optional) – A file listing the samples to include in the subgraph.
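The `start`/`stop` parameters select a window on the sample's chromosome. The sketch below illustrates one plausible reading of that window; it is an assumption, not documented behavior. Segment coordinates are taken to be BED-style (0-based, half-open, in line with the class's `bed_path`), and `stop=None` is taken to mean "up to the end of the chromosome". `segments_in_window` and the tuple layout are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical record: (segment_id, seg_start, seg_end), BED-style half-open.
Segment = Tuple[str, int, int]


def segments_in_window(
    segments: Iterable[Segment], start: int = 0, stop: Optional[int] = None
) -> List[str]:
    """Return IDs of segments overlapping the half-open window [start, stop)."""
    hits = []
    for seg_id, seg_start, seg_end in segments:
        # Two half-open intervals overlap iff each starts before the other ends;
        # stop=None leaves the window open-ended on the right.
        if seg_end > start and (stop is None or seg_start < stop):
            hits.append(seg_id)
    return hits


toy = [("s1", 0, 100), ("s2", 100, 250), ("s3", 250, 400)]
```

With half-open coordinates, a segment that ends exactly at `start` (such as s1 for a window starting at 100) does not overlap the window.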

generate_fasta(suffix)

get_chromosome_size(sample, chrom)

parallelize_samples(samples_list)

process_sample(sample, shared_progress, task_id): Process a sample in several steps, updating progress through a shared dictionary.
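A sketch of the progress pattern that `process_sample` and `parallelize_samples` describe: each worker writes its step count into a shared dictionary keyed by `task_id`, and the caller can poll that dictionary to drive a progress display. For simplicity this version uses threads and a plain `dict`; with worker processes you would need a `multiprocessing.Manager().dict()` instead. The functions below are hypothetical free-function stand-ins, and the step names, the `threads` default, and the `{"completed", "total"}` record layout are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

STEPS = ("extract", "filter", "write")  # hypothetical processing steps


def process_sample(sample: str, shared_progress: dict, task_id: int) -> str:
    """Hypothetical stand-in: run each step and record progress under task_id."""
    for done, _step in enumerate(STEPS, start=1):
        # ... real per-step work for `sample` would go here ...
        shared_progress[task_id] = {"completed": done, "total": len(STEPS)}
    return sample


def parallelize_samples(samples_list, threads: int = 2):
    """Hypothetical stand-in: process samples concurrently, return results and progress."""
    shared_progress: dict = {}  # each worker writes only its own key
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [
            pool.submit(process_sample, sample, shared_progress, task_id)
            for task_id, sample in enumerate(samples_list)
        ]
        # result() preserves submission order and re-raises worker errors.
        results = [f.result() for f in futures]
    return results, shared_progress


results, progress = parallelize_samples(["CG14", "Og103", "Og182"])
```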

segments_info(shared_min=None, specific_max=None, filter_len=None)

specific_and_shared_segments(sample_list_a_path=None, sample_list_b_path=None, filter_len=None)