Gratools

class gratools.Gratools.Gratools(gfa_path: pathlib.Path, threads: int = 1, logger: logging.Logger = <factory>, gfa_name: str | None = None, bam_segments_file: pathlib.Path | None = None, dict_samples_chrom: collections.defaultdict = <factory>, works_path: pathlib.Path | None = None, bed_path: pathlib.Path | None = None, bam_path: pathlib.Path | None = None, samples_chrom_path: pathlib.Path | None = None, dict_gfa_graph_object: dict = <factory>, sample_name: str | None = None, chromosome: str | None = None, start: int = 0, stop: int | None = None, suffix: str | None = None, build_fasta: bool = False, gzip_gfa: bool = False)

Bases: object

A class to parse and process GFA files, including handling BAM files and extracting segments and links.
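To make "extracting segments and links" concrete, here is a minimal sketch of reading the two relevant GFA 1 record types: `S` (segment) and `L` (link) lines, which are tab-separated. The function name `parse_gfa_lines` and the returned data layout are assumptions made for illustration; they are not part of the Gratools API and do not describe how Gratools stores the graph internally.

```python
from collections import defaultdict


def parse_gfa_lines(lines):
    """Collect segments (S records) and links (L records) from GFA 1 text lines.

    Hypothetical helper for illustration only; not a Gratools function.
    """
    segments = {}              # segment name -> sequence
    links = defaultdict(list)  # source segment -> [(src_orient, target, dst_orient, overlap)]
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            # S <name> <sequence> [optional tags...]
            segments[fields[1]] = fields[2]
        elif fields[0] == "L" and len(fields) >= 6:
            # L <from> <from_orient> <to> <to_orient> <overlap CIGAR>
            _, src, src_orient, dst, dst_orient, overlap = fields[:6]
            links[src].append((src_orient, dst, dst_orient, overlap))
        # Other record types (H, P, W, ...) are ignored in this sketch.
    return segments, dict(links)


# Tiny in-memory example instead of a real file.
example = [
    "H\tVN:Z:1.0",
    "S\ts1\tACGT",
    "S\ts2\tGGA",
    "L\ts1\t+\ts2\t-\t0M",
]
segments, links = parse_gfa_lines(example)
```

For a real graph, the same loop could be fed an open file handle in place of the list, since both iterate line by line.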

gfa_path

The path to the GFA file.

Type:

Path

gfa_name

The name of the GFA file.

Type:

str

bam_segments_file

The path to the BAM segments file.

Type:

Path

dict_samples_chrom

A dictionary mapping sample names to chromosomes.

Type:

defaultdict

works_path

The path to the working directory.

Type:

Path

bed_path

The path to the BED files directory.

Type:

Path

bam_path

The path to the BAM files directory.

Type:

Path

samples_chrom_path

The path to the file containing sample chromosome data.

Type:

Path

Attributes Summary

bam_path

bam_segments_file

bed_path

build_fasta

chromosome

get_chromosome_data

Load the chromosome data from a CSV file and return a DataFrame.

get_chromosomes_by_sample

Group the chromosomes by SAMPLES and return a grouped DataFrame.

get_sample_names

Retrieve the unique sample names from the dataset.

gfa_name

gzip_gfa

sample_name

samples_chrom_path

start

stop

suffix

threads

works_path

Methods Summary

concat_generate_sub_graph(suffix)

display_chromosome_names()

Display the SAMPLES and CHROMOSOMES_LIST using rich.

display_full_chromosome_data()

Display the full chromosome data (SAMPLES, CHROMOSOMES_LIST, START, END) using rich.

display_sample_names()

Display a list of sample names in a formatted table using rich.

extract_sub_graph(sample_name, chromosome[, ...])

Extract a subgraph from the GFA file.

generate_fasta(suffix)

get_chromosome_size(sample, chrom)

parallelize_samples(samples_list)

process_sample(sample, shared_progress, task_id)

Process a sample in several steps, updating progress through a shared dictionary.

segments_info([shared_min, specific_max, ...])

specific_and_shared_segments([...])

Attributes Documentation

bam_path: Path = None
bam_segments_file: Path = None
bed_path: Path = None
build_fasta: bool = False
chromosome: str = None

get_chromosome_data

Load the chromosome data from a CSV file and return a DataFrame.

Returns:

DataFrame containing the chromosome data.

Return type:

pandas.DataFrame

get_chromosomes_by_sample

Group the chromosomes by SAMPLES and return a grouped DataFrame.

Returns:

A DataFrame with chromosomes grouped by SAMPLES.

Return type:

pandas.DataFrame
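The grouping step can be sketched with the standard library alone. The documented attributes return pandas DataFrames; the version below swaps in `csv.DictReader` and a `defaultdict` purely to show the idea without extra dependencies. The column names `SAMPLES` and `CHROMOSOMES_LIST` are taken from the display methods on this page, but the comma-separated layout, the `Chr01`/`Chr02` values, and the helper name `group_chromosomes_by_sample` are assumptions.

```python
import csv
import io
from collections import defaultdict

# Hypothetical file content: the real layout of the samples/chromosomes
# file is an assumption made for this sketch.
SAMPLES_CHROM_TEXT = """SAMPLES,CHROMOSOMES_LIST,START,END
CG14,Chr01,0,1000
CG14,Chr02,0,800
Og20,Chr01,0,950
"""


def group_chromosomes_by_sample(text):
    """Return {sample name: [chromosome, ...]} from CSV text (illustrative helper)."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        grouped[row["SAMPLES"]].append(row["CHROMOSOMES_LIST"])
    return dict(grouped)


chromosomes_by_sample = group_chromosomes_by_sample(SAMPLES_CHROM_TEXT)
```

The keys of the resulting dictionary play the same role as the unique sample names described for `get_sample_names`.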

get_sample_names

Retrieve the unique sample names from the dataset.

This function extracts the unique sample names from the dict_samples_chrom attribute, which is assumed to be a dictionary where keys represent the sample names, and values contain chromosome data.

Returns:

A list of unique sample names extracted from the dataset.

Return type:

list

Example

>>> sample_names = self.get_sample_names()
>>> print(sample_names)
['CG14', 'Og103', 'Og182', 'Og20', 'Tog5681']

gfa_name: str = None
gzip_gfa: bool = False
sample_name: str = None
samples_chrom_path: Path = None
start: int = 0
stop: int = None
suffix: str = None
threads: int = 1
works_path: Path = None

Methods Documentation

concat_generate_sub_graph(suffix)

display_chromosome_names()

Display the SAMPLES and CHROMOSOMES_LIST using rich.

display_full_chromosome_data()

Display the full chromosome data (SAMPLES, CHROMOSOMES_LIST, START, END) using rich.

display_sample_names()

Display a list of sample names in a formatted table using rich.

This function displays the sample names in a formatted table using the rich library. The table contains a single column labeled ‘SAMPLES’, with one row per sample name.

Returns:

This function only prints the table to the console; it does not return any value.

Return type:

None

extract_sub_graph(sample_name, chromosome, start=0, stop=None, samples_list_path=None, build_fasta=False)

Extract a subgraph from the GFA file.

Parameters:
  • sample_name (str) – The name of the sample.

  • chromosome (str) – The chromosome identifier.

  • start (int) – The starting position. Defaults to 0.

  • stop (int, optional) – The stopping position. Defaults to None.

  • samples_list_path (Path, optional) – File with list of samples to include in the subgraph.

  • build_fasta (bool, optional) – Whether to also build a FASTA file for the extracted subgraph. Defaults to False.

generate_fasta(suffix)

get_chromosome_size(sample, chrom)

parallelize_samples(samples_list)

process_sample(sample, shared_progress, task_id)

Process a sample in several steps, updating progress through a shared dictionary.
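The pattern of reporting per-sample progress through a shared dictionary can be sketched as follows. This is not the Gratools implementation: it uses threads and a plain `dict` guarded by a `Lock`, whereas a process-based version would need a shared mapping such as a `multiprocessing.Manager().dict()`. The step names, the extra `lock` parameter, and the `run_all` helper are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

STEPS = ("extract", "convert", "summarize")  # hypothetical step names


def process_sample(sample, shared_progress, task_id, lock):
    """Run each step for one sample and record progress under task_id."""
    for done, _step in enumerate(STEPS, start=1):
        # ... the real per-step work would go here ...
        with lock:
            shared_progress[task_id] = {
                "sample": sample,
                "completed": done,
                "total": len(STEPS),
            }
    return sample


def run_all(samples, workers=2):
    """Process all samples concurrently; return results and the final progress map."""
    shared_progress = {}
    lock = Lock()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(process_sample, sample, shared_progress, task_id, lock)
            for task_id, sample in enumerate(samples)
        ]
        finished = [future.result() for future in futures]
    return finished, shared_progress


finished, progress = run_all(["CG14", "Og20"])
```

A separate display loop could poll `shared_progress` while the workers run and redraw one progress bar per `task_id`.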

segments_info(shared_min=None, specific_max=None, filter_len=None)

specific_and_shared_segments(sample_list_a_path=None, sample_list_b_path=None, filter_len=None)