Gratools
- class gratools.Gratools.Gratools(gfa_path: ~pathlib.Path, threads: int = 1, logger: ~logging.Logger = <factory>, gfa_name: str | None = None, bam_segments_file: ~pathlib.Path | None = None, dict_samples_chrom: ~collections.defaultdict = <factory>, works_path: ~pathlib.Path | None = None, bed_path: ~pathlib.Path | None = None, bam_path: ~pathlib.Path | None = None, samples_chrom_path: ~pathlib.Path | None = None, dict_gfa_graph_object: dict = <factory>, sample_name: str | None = None, chromosome: str | None = None, start: int = 0, stop: int | None = None, suffix: str | None = None, build_fasta: bool = False, gzip_gfa: bool = False)
Bases: object
A class to parse and process GFA files, including handling BAM files and extracting segments and links.
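To make "extracting segments and links" concrete, here is a minimal standard-library sketch for GFA 1 text. It assumes tab-separated S lines (`S  name  sequence  [tags]`) and L lines (`L  from  from_orient  to  to_orient  overlap`) as defined by the GFA 1 format. This is not Gratools' own parser: the helper name `parse_gfa_lines` is hypothetical, and every other record type (H, P, W, ...) is simply skipped.

```python
from typing import Dict, Iterable, List, Tuple

# A link is (from_segment, from_orient, to_segment, to_orient, overlap)
Link = Tuple[str, str, str, str, str]


def parse_gfa_lines(lines: Iterable[str]) -> Tuple[Dict[str, str], List[Link]]:
    """Collect segments and links from GFA 1 text (hypothetical helper, not the Gratools API)."""
    segments: Dict[str, str] = {}
    links: List[Link] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFA comment lines
        fields = line.split("\t")
        record_type = fields[0]
        if record_type == "S":
            # S <name> <sequence> [optional tags...]
            segments[fields[1]] = fields[2]
        elif record_type == "L":
            # L <from> <from_orient> <to> <to_orient> <overlap CIGAR>
            links.append((fields[1], fields[2], fields[3], fields[4], fields[5]))
        # H, P, W and any other record types are ignored in this sketch
    return segments, links


gfa_text = "H\tVN:Z:1.0\nS\ts1\tACGT\nS\ts2\tGG\nL\ts1\t+\ts2\t-\t0M\n"
segments, links = parse_gfa_lines(gfa_text.splitlines())
print(segments)  # {'s1': 'ACGT', 's2': 'GG'}
print(links)     # [('s1', '+', 's2', '-', '0M')]
```

A gzip-compressed GFA file could be streamed through the same function by passing it the handle returned by `gzip.open(path, "rt")`.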
- gfa_path
The path to the GFA file.
- Type:
Path
- gfa_name
The name of the GFA file.
- Type:
str
- bam_segments_file
The path to the BAM segments file.
- Type:
Path
- dict_samples_chrom
A dictionary mapping sample names to chromosomes.
- Type:
defaultdict
- works_path
The path to the working directory.
- Type:
Path
- bed_path
The path to the BED files directory.
- Type:
Path
- bam_path
The path to the BAM files directory.
- Type:
Path
- samples_chrom_path
The path to the file containing sample chromosome data.
- Type:
Path
Attributes Summary
- get_chromosome_data: Load the chromosome data from a CSV file and return a DataFrame.
- get_chromosomes_by_sample: Group the chromosomes by SAMPLES and return a grouped DataFrame.
- get_sample_names: Retrieve the unique sample names from the dataset.
Methods Summary
- Display the SAMPLES and CHROMOSOMES_LIST using rich.
- display_full_chromosome_data(): Display the full chromosome data (SAMPLES, CHROMOSOMES_LIST, START, END) using rich.
- display_sample_names(): Display a list of sample names in a formatted table using rich.
- extract_sub_graph(sample_name, chromosome[, ...]): Extract a subgraph from the GFA file.
- get_chromosome_size(sample, chrom)
- parallelize_samples(samples_list)
- process_sample(sample, shared_progress, task_id): Process a sample in several steps, updating progress through a shared dictionary.
- segments_info([shared_min, specific_max, ...])
- specific_and_shared_segments([...])
Attributes Documentation
- bam_path: Path = None
- bam_segments_file: Path = None
- bed_path: Path = None
- build_fasta: bool = False
- chromosome: str = None
- get_chromosome_data
Load the chromosome data from a CSV file and return a DataFrame.
- Returns:
DataFrame containing the chromosome data.
- Return type:
pandas.DataFrame
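The loading step can be sketched with the standard library alone. The sketch assumes a comma-separated file with a header row naming the columns SAMPLES, CHROMOSOMES, START and END; these column names are assumptions modelled on the fields shown by display_full_chromosome_data, the helper `load_chromosome_rows` is hypothetical, and the rows are toy values rather than real chromosome sizes. It returns plain dictionaries instead of a pandas.DataFrame.

```python
import csv
import io
from typing import Dict, List, TextIO, Union

Row = Dict[str, Union[str, int]]


def load_chromosome_rows(handle: TextIO) -> List[Row]:
    """Read sample/chromosome rows from CSV text (hypothetical helper and column names)."""
    rows: List[Row] = []
    for record in csv.DictReader(handle):
        rows.append({
            "SAMPLES": record["SAMPLES"],
            "CHROMOSOMES": record["CHROMOSOMES"],
            # CSV fields are read as text, so convert the coordinates to integers
            "START": int(record["START"]),
            "END": int(record["END"]),
        })
    return rows


# Toy data for illustration only
csv_text = "SAMPLES,CHROMOSOMES,START,END\nCG14,Chr01,0,1000\nOg20,Chr01,0,900\n"
rows = load_chromosome_rows(io.StringIO(csv_text))
print(rows[0])  # {'SAMPLES': 'CG14', 'CHROMOSOMES': 'Chr01', 'START': 0, 'END': 1000}
```

With pandas, `pandas.read_csv(path)` yields the equivalent DataFrame directly and infers integer dtypes for fully numeric columns such as START and END.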
- get_chromosomes_by_sample
Group the chromosomes by SAMPLES and return a grouped DataFrame.
- Returns:
A DataFrame with chromosomes grouped by SAMPLES.
- Return type:
pandas.DataFrame
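Grouping chromosomes by sample amounts to collecting every chromosome listed for each SAMPLES value. The sketch below is a standard-library analogue of that grouping. The input column names (SAMPLES, CHROMOSOMES) and the helper name are hypothetical, and the result is a dict rather than the DataFrame the property returns.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def group_chromosomes_by_sample(rows: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Map each SAMPLES value to the chromosomes listed for it, in input order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        grouped[row["SAMPLES"]].append(row["CHROMOSOMES"])
    return dict(grouped)


rows = [
    {"SAMPLES": "CG14", "CHROMOSOMES": "Chr01"},
    {"SAMPLES": "CG14", "CHROMOSOMES": "Chr02"},
    {"SAMPLES": "Og20", "CHROMOSOMES": "Chr01"},
]
print(group_chromosomes_by_sample(rows))
# {'CG14': ['Chr01', 'Chr02'], 'Og20': ['Chr01']}
```

In pandas, a comparable shape comes from `df.groupby("SAMPLES")["CHROMOSOMES"].apply(list).reset_index(name="CHROMOSOMES_LIST")`, which also sorts the sample keys by default.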
- get_sample_names
Retrieve the unique sample names from the dataset.
This function extracts the unique sample names from the dict_samples_chrom attribute, which is assumed to be a dictionary where keys represent the sample names, and values contain chromosome data.
- Returns:
A list of unique sample names extracted from the dataset.
- Return type:
list
Example
>>> sample_names = self.get_sample_names
>>> print(sample_names)
['CG14', 'Og103', 'Og182', 'Og20', 'Tog5681']
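Because the docstring treats the keys of dict_samples_chrom as the sample names, and the example output is in lexicographic order, the extraction can be sketched as sorting the keys. The helper name and the value layout (a list of chromosomes per sample) are assumptions for illustration only.

```python
from collections import defaultdict
from typing import List, Mapping


def unique_sample_names(dict_samples_chrom: Mapping[str, object]) -> List[str]:
    """Return the sample names (the mapping's keys) in lexicographic order."""
    # Dictionary keys are already unique, so sorting them is all that is needed
    return sorted(dict_samples_chrom)


# Hypothetical layout: sample name -> list of chromosomes
dict_samples_chrom = defaultdict(list)
for sample, chrom in [("Og20", "Chr01"), ("CG14", "Chr01"), ("Og182", "Chr02"), ("CG14", "Chr02")]:
    dict_samples_chrom[sample].append(chrom)

print(unique_sample_names(dict_samples_chrom))  # ['CG14', 'Og182', 'Og20']
```

Plain string sorting places 'Og182' before 'Og20', which matches the order shown in the example above.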
- gfa_name: str = None
- gzip_gfa: bool = False
- sample_name: str = None
- samples_chrom_path: Path = None
- start: int = 0
- stop: int = None
- suffix: str = None
- threads: int = 1
- works_path: Path = None
Methods Documentation
- display_full_chromosome_data()
Display the full chromosome data (SAMPLES, CHROMOSOMES_LIST, START, END) using rich.
- display_sample_names()
Display a list of sample names in a formatted table using rich.
This method shows the sample names in a visually appealing table rendered with the rich library. The table contains a single column labeled ‘SAMPLES’, with one row per sample name.
- Returns:
This method only prints the table to the console; it does not return any value.
- Return type:
None
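The table layout described above can be illustrated without rich. The sketch below is a plain-text stand-in (the helper `format_sample_table` is hypothetical) that builds the same single ‘SAMPLES’ column with one row per sample. Unlike the documented method, it returns the string instead of printing it, so the result can be inspected. With rich, the equivalent would be a `rich.table.Table` with one `add_column("SAMPLES")` call and one `add_row(name)` per sample, printed through `rich.console.Console().print`.

```python
from typing import Sequence


def format_sample_table(sample_names: Sequence[str]) -> str:
    """Render sample names as a one-column ASCII table headed 'SAMPLES'."""
    header = "SAMPLES"
    # The column is as wide as the longest entry, header included
    width = max([len(header), *(len(name) for name in sample_names)])
    border = "+" + "-" * (width + 2) + "+"
    lines = [border, f"| {header:<{width}} |", border]
    lines.extend(f"| {name:<{width}} |" for name in sample_names)
    lines.append(border)
    return "\n".join(lines)


print(format_sample_table(["CG14", "Og103", "Tog5681"]))
```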
- extract_sub_graph(sample_name, chromosome, start=0, stop=None, samples_list_path=None, build_fasta=False)
Extract a subgraph from the GFA file.
- Parameters:
sample_name (str) – The name of the sample.
chromosome (str) – The chromosome identifier.
start (int) – The starting position (default: 0).
stop (int, optional) – The stopping position (default: None).
samples_list_path (Path, optional) – File listing the samples to include in the subgraph.
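The start and stop parameters select a region of one sample's chromosome. One way such a region could be mapped onto graph segments is an overlap filter, sketched below under several assumptions that this documentation does not confirm: each segment on the sample's path carries 0-based, half-open coordinates [seg_start, seg_end), and stop=None means "no upper bound". The helper name and the tuple layout are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# (segment_id, seg_start, seg_end) along one sample's chromosome path; layout is hypothetical
PathSegment = Tuple[str, int, int]


def segments_in_region(
    path_segments: Iterable[PathSegment], start: int = 0, stop: Optional[int] = None
) -> List[str]:
    """Return IDs of path segments overlapping the half-open region [start, stop)."""
    if start < 0:
        raise ValueError("start must be non-negative")
    if stop is not None and stop <= start:
        raise ValueError("stop must be greater than start")
    selected: List[str] = []
    for seg_id, seg_start, seg_end in path_segments:
        ends_before_region = seg_end <= start
        # stop=None is treated here as "no upper bound" (an assumption)
        starts_after_region = stop is not None and seg_start >= stop
        if not (ends_before_region or starts_after_region):
            selected.append(seg_id)
    return selected


path = [("s1", 0, 100), ("s2", 100, 150), ("s3", 150, 400)]
print(segments_in_region(path, start=120, stop=200))  # ['s2', 's3']
print(segments_in_region(path))                       # ['s1', 's2', 's3']
```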
- process_sample(sample, shared_progress, task_id)
Process a sample in several steps, updating progress through a shared dictionary.
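The shared-progress pattern can be sketched with a thread pool. In that setting a plain dict can be shared because each task writes only under its own key. If the work ran in separate processes instead (this documentation does not say which Gratools uses), the dictionary would need to be a proxy such as `multiprocessing.Manager().dict()`. Only the `(sample, shared_progress, task_id)` signature comes from the documentation above. The step names, the layout of each progress entry, and the body of `parallelize_samples` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical step names; the real steps of process_sample are not documented here
STEPS = ("extract", "convert", "write")


def process_sample(sample: str, shared_progress: Dict[int, dict], task_id: int) -> str:
    """Run each step for one sample and record progress under this task's own key."""
    for done, step in enumerate(STEPS, start=1):
        # ... the real work for `step` would happen here ...
        shared_progress[task_id] = {"sample": sample, "step": step, "completed": done, "total": len(STEPS)}
    return sample


def parallelize_samples(samples_list: List[str], threads: int = 2) -> Tuple[List[str], Dict[int, dict]]:
    """Run process_sample for every sample in a thread pool; return results and final progress."""
    shared_progress: Dict[int, dict] = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [
            pool.submit(process_sample, sample, shared_progress, task_id)
            for task_id, sample in enumerate(samples_list)
        ]
        # Collect results in submission order; result() re-raises any worker exception
        results = [future.result() for future in futures]
    return results, shared_progress


results, progress = parallelize_samples(["CG14", "Og103", "Og182"])
print(results)  # ['CG14', 'Og103', 'Og182']
print(progress[2]["completed"], progress[2]["total"])  # 3 3
```

The task_id parameter name suggests each entry may feed a rich progress bar, for example through `Progress.update(task_id, completed=..., total=...)`, but that is an inference rather than documented behaviour.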