"Transform Your Overwhelming Transcripts Into Sleek Markdown Files in Minutes! 📄✨"

"Drowning in pages of raw transcripts? Unveil the seamless method top content creators use to convert, clarify, and captivate with markdown magic."

Oct 20, 2023

A professional trying to extract and make sense out of long-winded video transcripts... Credits: DALL-E

The markdown_maker.py module is dedicated to transforming plain text, specifically video transcripts, into a structured and readable markdown format.

This quest will guide you through the functions provided in this module and how to effectively use them.

Prerequisites
Overview
Functions
- 1. generate_markdown_file
- 2. markdownify_intro
Usage

Prerequisites

Python Environment: Ensure you have Python installed and set up.
spaCy Library: This module uses the spaCy library for natural language processing.
Install it using:

pip install spacy

spaCy English Model: After installing spaCy, download the English model:

python -m spacy download en_core_web_sm
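
Before running the module, it can help to confirm that both prerequisites are importable. The sketch below uses only the standard library. The helper name check_prerequisites is hypothetical and not part of markdown_maker.py; it relies on the fact that a spaCy model installed with the command above is a regular Python package.

```python
import importlib.util


def check_prerequisites(packages=("spacy", "en_core_web_sm")):
    """Return a dict mapping each package name to True if it can be imported."""
    # find_spec returns None for a top-level package that is not installed
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, available in check_prerequisites().items():
        print(f"{name}: {'installed' if available else 'missing'}")
```

If either entry is reported as missing, rerun the matching install command from this section.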

Overview

The markdown_maker.py module contains two functions: markdownify_intro, which converts a transcript into a markdown-formatted string, and generate_markdown_file, which calls it and saves the result as a .md file.

Functions

1. `generate_markdown_file`

import os

def generate_markdown_file(transcript_text, video_title):
    # Convert the raw transcript into a markdown string
    markdown_content = markdownify_intro(transcript_text)

    # Ensure the "yt-markdown" directory exists
    os.makedirs("yt-markdown", exist_ok=True)

    # Create the markdown file, named after the video title
    file_name = f"{video_title}.md"
    file_path = os.path.join("yt-markdown", file_name)

    # Write as UTF-8 so non-ASCII characters in the transcript are preserved
    with open(file_path, "w", encoding="utf-8") as md_file:
        md_file.write(markdown_content)

    return file_path

Purpose: Converts a transcript into markdown format and saves it as a markdown file.

Inputs:

transcript_text: A string containing the video transcript.
video_title: A string representing the title of the video.

Output:

The path of the generated markdown file, relative to the current working directory.

How It Works:

Converts the transcript into markdown using markdownify_intro.
Creates (or ensures the existence of) the yt-markdown directory.
Saves the markdown content to a file named after the video title.
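
Because the file name comes straight from the video title, a title containing characters such as / or : can produce a path that cannot be written. One possible way to handle this, which is not part of the original module, is to sanitize the title first. The helper name safe_file_name and the exact set of replaced characters are assumptions made for this sketch.

```python
import re


def safe_file_name(video_title, fallback="untitled"):
    """Turn a video title into a .md file name that avoids problematic characters."""
    # Replace path separators, control characters, and characters Windows rejects
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", video_title)
    # Collapse whitespace runs and trim leading/trailing spaces and dots
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Fall back to a placeholder if nothing usable is left
    return f"{cleaned or fallback}.md"


print(safe_file_name("Intro: What is AI/ML?"))  # Intro_ What is AI_ML_.md
```

Inside generate_markdown_file, the line that builds file_name could call this helper instead of formatting the title directly.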

2. `markdownify_intro`

import spacy

# Load the English model once at import time (see Prerequisites)
nlp = spacy.load("en_core_web_sm")

def markdownify_intro(text):
    if not isinstance(text, str):
        raise ValueError(f"Expected a string, but got: {type(text)}")

    # Process the text using spaCy
    doc = nlp(text)

    markdown_content = []

    # Extract sentences and format them
    for sentence in doc.sents:
        formatted_sentence = sentence.text

        # Bold named entities, once per distinct entity text, so an entity
        # that appears twice in a sentence is not wrapped in "**" twice
        bolded = set()
        for entity in sentence.ents:
            if entity.label_ in ("PERSON", "ORG", "EVENT", "PRODUCT") and entity.text not in bolded:
                formatted_sentence = formatted_sentence.replace(entity.text, f"**{entity.text}**")
                bolded.add(entity.text)

        # Questions: render sentences ending in "?" as blockquotes
        if formatted_sentence.strip().endswith("?"):
            formatted_sentence = f"> {formatted_sentence}"

        # Headers: sentences containing certain keywords become level-two headers
        if any(keyword in formatted_sentence.lower() for keyword in ["introduce", "welcome", "thank you"]):
            formatted_sentence = f"## {formatted_sentence}"

        markdown_content.append(formatted_sentence)

    # Separate sentences with blank lines so each renders as its own paragraph
    return "\n\n".join(markdown_content)

Purpose: Converts a plain transcript into a markdown-formatted string.

Input:

text: A string containing the transcript.

Output:

A string formatted in markdown.

How It Works:

Verifies the input type.
Processes the text with spaCy.
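Bolds named entities labelled PERSON, ORG, EVENT, or PRODUCT; prefixes sentences ending in "?" with "> " (a blockquote); and prefixes sentences containing "introduce", "welcome", or "thank you" with "## " (a level-two header).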
Returns the markdown-formatted text.
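
To see the three formatting rules in isolation, the sketch below applies them to a single sentence without spaCy. It is an illustration, not the module's code: the function name format_sentence is hypothetical, and the entity names are passed in by hand instead of coming from sentence.ents. It bolds entities in a single regex pass so a repeated name is wrapped only once, and it treats the blockquote and header rules as mutually exclusive, which is one way to avoid output that starts with "## > ".

```python
import re

HEADER_KEYWORDS = ("introduce", "welcome", "thank you")


def format_sentence(sentence, entities=()):
    """Apply the bold, blockquote, and header rules to one sentence."""
    formatted = sentence
    # Longest names first so a longer name wins over a shorter one it contains
    names = sorted({e for e in entities if e}, key=len, reverse=True)
    if names:
        pattern = "|".join(re.escape(name) for name in names)
        formatted = re.sub(pattern, lambda m: f"**{m.group(0)}**", formatted)

    # Questions become blockquotes
    if formatted.strip().endswith("?"):
        return f"> {formatted}"
    # Sentences containing a keyword become level-two headers
    if any(keyword in formatted.lower() for keyword in HEADER_KEYWORDS):
        return f"## {formatted}"
    return formatted


print(format_sentence("Welcome to the show with Ada Lovelace.", ["Ada Lovelace"]))
# ## Welcome to the show with **Ada Lovelace**.
print(format_sentence("What does Ada think about Ada?", ["Ada"]))
# > What does **Ada** think about **Ada**?
```

Note that the keyword match is a plain substring test, so any sentence that happens to contain "welcome" or "thank you" becomes a header, whether or not it opens a new section.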

Usage

Step 1: Import the necessary functions:

from markdown_maker import generate_markdown_file, markdownify_intro

Step 2: Convert a transcript into markdown:

transcript = "Your transcript content here."
formatted_text = markdownify_intro(transcript)

Step 3: Generate a markdown file from the transcript:

video_title = "Your Video Title"
markdown_file_path = generate_markdown_file(transcript, video_title)

The markdown file is written to the yt-markdown directory, which is created in the current working directory if it does not exist, and is named after the video title with a .md extension.

This quest has walked through both functions in the markdown_maker.py module, so you can now use them on your own transcripts.

Andreas’ Substack


