YouTube Video Downloader in Python Basic Version – Beginner Project with Code

🎯 Python Project: YouTube Video Downloader (Basic)



In this post, I’ll show you how to build a basic YouTube video downloader in Python using the pytubefix library. It's a perfect mini-project for beginners who want to practice working with third-party libraries, user input, and downloading web content.

✅ Project Overview

This script will:

  • Take a YouTube Video URL
  • Display video details (title, author, duration)
  • Download the highest-resolution progressive stream, which holds video and audio in one file (usually 360p or 720p)

⚠️ Note: For truly high-resolution (1080p, 4K) downloads, you usually need to download the video and audio streams separately and merge them. That is a more advanced project, which we will cover in an upcoming post!
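If you want to try the merge step before then, here is a minimal sketch. It assumes ffmpeg is installed and on your PATH, and that you already have a video-only file and an audio-only file (pytubefix lists such adaptive streams via `yt.streams.filter(adaptive=True)`). The helper names `build_merge_command` and `merge_streams` and the file names are illustrative, not part of pytubefix.

```python
import shutil
import subprocess


def build_merge_command(video_path, audio_path, output_path):
    """Return an ffmpeg command that muxes a video-only and an audio-only file.

    "-c copy" copies both tracks without re-encoding, so the merge is fast and
    lossless. This assumes the codecs fit the output container
    (for example H.264 video plus AAC audio into .mp4).
    """
    return [
        "ffmpeg",
        "-y",              # overwrite output_path if it already exists
        "-i", video_path,  # input 0: video-only file
        "-i", audio_path,  # input 1: audio-only file
        "-map", "0:v:0",   # take the first video track from input 0
        "-map", "1:a:0",   # take the first audio track from input 1
        "-c", "copy",      # copy the tracks instead of re-encoding
        output_path,
    ]


def merge_streams(video_path, audio_path, output_path):
    """Run the merge, failing early if ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_merge_command(video_path, audio_path, output_path), check=True)
```

For example, `merge_streams("video.mp4", "audio.m4a", "final.mp4")` would write one file containing both tracks. Copying instead of re-encoding keeps the merge quick, at the cost of requiring codecs the container supports.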

🛠️ Prerequisites

First, install the pytubefix library using pip:

pip install pytubefix

💻 The Python Code


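from urllib.parse import parse_qs, urlencode, urlparse

from pytubefix import YouTube
from pytubefix.cli import on_progress

# You can also use input() to take the URL from the user
# For example:
# url = input("Enter YouTube URL: ")

# Hardcoded URL example
url = "https://youtu.be/pnZ0lAjfFCQ?si=w5GrvuKd6XNLOrIY"

# Clean the URL: drop extra query parameters such as the "si" tracking
# parameter, but keep "v", which holds the video ID in "watch?v=..." links
parts = urlparse(url)
video_id = parse_qs(parts.query).get("v")
query = urlencode({"v": video_id[0]}) if video_id else ""
url = parts._replace(query=query).geturl()

# Create YouTube object; on_progress prints a progress bar while downloading
yt = YouTube(url, on_progress_callback=on_progress)

# Print video details (yt.length is the duration in seconds)
print(f"Title   : {yt.title}")
print(f"Author  : {yt.author}")
print(f"Length  : {yt.length // 60} minutes {yt.length % 60} seconds")

# Get the highest resolution stream available in progressive format
# (video and audio in one file); this returns None if there is none
stream = yt.streams.get_highest_resolution()
if stream is None:
    raise SystemExit("No progressive stream is available for this video.")
print(f"Downloading in: {stream.resolution}")

# Download the video to the current directory
stream.download()
print("\n✅ Download complete!")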

🧭 How It Works

  1. Import the Libraries: We use pytubefix to interact with YouTube and on_progress to show a progress bar during download.
  2. Define the URL: You can hardcode it or get it from input(). We clean it by removing extra query parameters to ensure compatibility.
  3. Create a YouTube Object: This object fetches all the video metadata.
  4. Print Video Info: The script displays the video's title, author, and duration in minutes and seconds.
  5. Select the Stream: We get the highest-resolution *progressive* stream available. Progressive streams include both video and audio in one file, but they top out at 720p, and many videos now offer only 360p in this format.
  6. Download: The video downloads to the current directory with progress displayed in the terminal.
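To make step 5 concrete, here is a library-free sketch of the selection rule: keep only progressive streams, then pick the one with the highest resolution. The `StreamInfo` class and the sample list are hypothetical stand-ins, not pytubefix objects, and this is not pytubefix's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamInfo:
    """Hypothetical, simplified stand-in for one available stream."""
    itag: int
    resolution: Optional[str]  # e.g. "720p"; None for audio-only streams
    progressive: bool          # True when video and audio share one file


def pick_highest_progressive(streams):
    """Return the progressive stream with the largest resolution, or None."""
    candidates = [s for s in streams if s.progressive and s.resolution]
    if not candidates:
        return None
    # Convert "720p" -> 720 so "1080p" does not sort below "720p" as a string
    return max(candidates, key=lambda s: int(s.resolution.rstrip("p")))


# Hypothetical sample data: a 1080p stream exists, but it is video-only
sample = [
    StreamInfo(itag=18, resolution="360p", progressive=True),
    StreamInfo(itag=137, resolution="1080p", progressive=False),  # video only
    StreamInfo(itag=140, resolution=None, progressive=False),     # audio only
]

best = pick_highest_progressive(sample)
print(best.resolution)  # 360p
```

This is why the basic script can report 360p even when the video page offers 1080p: the higher resolutions live only in the separate, adaptive streams.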

🖥️ Expected Output Example

When you run this script, you might see something like:


Title   : Relaxing Music for Stress Relief
Author  : Calmed By Nature
Length  : 2 minutes 45 seconds
Downloading in: 720p

[####################] 100% Complete

✅ Download complete!

✨ Tips to Extend This Project

  • Take the URL from the user using input().
  • Specify a custom output folder with stream.download(output_path="your_folder").
  • Let the user choose between audio-only and different resolutions.
  • For full HD and 4K videos, download the video and audio streams separately and merge them (coming in an upcoming post!).
  • Wrap it in a simple command-line interface or GUI with Tkinter.
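Several of these tips can be combined into a small command-line tool. The sketch below uses the standard-library `argparse` module for a URL argument, an `--output` folder, and an `--audio-only` flag, and validates the URL before calling pytubefix. The helper names (`extract_video_id`, `build_parser`, `download`, `main`) are hypothetical, the URL check covers only the common `youtu.be`, `watch?v=`, and `/shorts/` forms, and the pytubefix calls (`get_audio_only`, `get_highest_resolution`, `download(output_path=...)`) are assumed to behave as in the script above.

```python
import argparse
from urllib.parse import parse_qs, urlparse


def extract_video_id(url):
    """Return the video ID from common YouTube URL forms, or None."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
    if host == "youtu.be":
        return parts.path.lstrip("/") or None
    if host == "youtube.com":
        if parts.path == "/watch":
            return parse_qs(parts.query).get("v", [None])[0]
        if parts.path.startswith("/shorts/"):
            return parts.path.split("/")[2] or None
    return None


def build_parser():
    """Define the command-line options."""
    parser = argparse.ArgumentParser(description="Download a YouTube video with pytubefix.")
    parser.add_argument("url", help="YouTube video URL")
    parser.add_argument("-o", "--output", default=".", help="folder to save into")
    parser.add_argument("--audio-only", action="store_true", help="download only the audio")
    return parser


def download(url, output=".", audio_only=False):
    """Download one stream and return the saved file path."""
    # Imported here so the helpers above work even without pytubefix installed
    from pytubefix import YouTube
    from pytubefix.cli import on_progress

    yt = YouTube(url, on_progress_callback=on_progress)
    if audio_only:
        stream = yt.streams.get_audio_only()
    else:
        stream = yt.streams.get_highest_resolution()
    if stream is None:
        raise RuntimeError("No matching stream found for this video.")
    return stream.download(output_path=output)


def main(argv=None):
    """Parse arguments, validate the URL, and download."""
    args = build_parser().parse_args(argv)
    video_id = extract_video_id(args.url)
    if video_id is None:
        raise SystemExit(f"Not a recognized YouTube video URL: {args.url}")
    path = download(f"https://www.youtube.com/watch?v={video_id}", args.output, args.audio_only)
    print(f"Saved to {path}")
```

To run it as a script, add the usual `if __name__ == "__main__": main()` guard at the bottom, then call it like `python downloader.py "https://youtu.be/..." --audio-only -o music`. Rebuilding the URL from the extracted ID also takes care of stray tracking parameters.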

This basic project is a great starting point for learning how to automate downloads from YouTube with Python. You can expand it into more advanced features or integrate it into your own apps.


Happy Coding! Feel free to share questions or improvements in the comments!

“In the world of code, Python is the language of simplicity, where logic meets creativity, and every line brings us closer to our goals.”— Only Python


📖 More Resources

📚 Python Crash Course Chapter-wise Exercises
📚 AI and Machine Learning Roadmap: From Basic to Advanced
    Stage 1: Python & Programming Fundamentals
    
    ----------------------------------------
    1. Python & Programming Fundamentals
    ----------------------------------------
    1.1 Environment Setup
        • Install Python 3.x, VS Code / PyCharm
        • Configure linting, formatters (e.g., Pylint, Black)
        • Jupyter Notebook / Google Colab basics
    
    1.2 Core Python Syntax
        • Variables, Data Types (int, float, str, bool)
        • Operators: arithmetic, comparison, logical, bitwise
        • Control Flow: if / else / elif
        • Loops: for, while, break/continue
    
    1.3 Functions & Modules
        • Defining functions, return values
        • Parameters: positional, keyword, default args
        • *args, **kwargs
        • Organizing code: modules and packages
        • Standard library exploration (os, sys, datetime, random, math)
    
    1.4 Data Structures
        • Lists, Tuples, Sets, Dictionaries
        • List/dict comprehensions
        • Built-in functions: map, filter, zip, enumerate
        • When to use which structure
    
    1.5 File Handling & Exceptions
        • Reading/Writing text and binary files
        • Context managers (`with` statement)
        • Exception handling: try/except/finally
        • Custom exceptions
    
    1.6 Object-Oriented Programming (OOP)
        • Classes, Instances, Attributes, Methods
        • __init__, self, class vs instance attributes
        • Inheritance, Polymorphism, Encapsulation
        • Magic methods: __str__, __repr__, __add__, etc.
        • Use-cases in structuring larger projects
    
    1.7 Virtual Environments & Package Management
        • venv / pipenv / poetry basics
        • Installing and managing dependencies
        • requirements.txt and environment.yml
    
    🛠 Tools: VS Code, Git for version control, Jupyter/Colab
        
    Stage 2: Mathematics for Machine Learning
    
    ----------------------------------------
    2. Mathematics for Machine Learning
    ----------------------------------------
    2.1 Linear Algebra
        • Scalars, Vectors, Matrices, Tensors
        • Operations: addition, multiplication, dot product
        • Matrix properties: transpose, inverse, rank
        • Eigenvalues & Eigenvectors (intuition)
        • Applications: data transformations, PCA
    
    2.2 Calculus
        • Functions and limits (intuitive overview)
        • Derivatives: gradient of single-variable and multi-variable functions
        • Chain rule (key for backpropagation in neural networks)
        • Partial derivatives
        • Basic integration (overview; less often used directly)
    
    2.3 Probability & Statistics
        • Basic probability theory: events, conditional probability, Bayes’ theorem
        • Random variables, distributions (normal, binomial, Poisson, etc.)
        • Descriptive statistics: mean, median, mode, variance, standard deviation
        • Inferential statistics: hypothesis testing, p-values, confidence intervals
        • Sampling methods, bias, variance concepts
    
    2.4 Optimization Basics
        • Concept of optimization in ML (finding minima of loss functions)
        • Gradient descent: batch, stochastic, mini-batch
        • Learning rate intuition
    
    🛠 Tools / References:
        • Interactive calculators: Desmos, GeoGebra
        • Python libraries: NumPy for experimentation
        
    Stage 3: Data Handling & Preprocessing
    
    ----------------------------------------
    3. Data Handling & Preprocessing
    ----------------------------------------
    3.1 NumPy Essentials
        • ndarrays: creation, indexing, slicing
        • Vectorized operations vs Python loops
        • Broadcasting rules
        • Random number generation
    
    3.2 Pandas for Tabular Data
        • Series & DataFrame: creation and basic ops
        • Reading data: CSV, Excel, JSON
        • Indexing, selection (loc/iloc), filtering rows
        • Handling missing values: dropna, fillna
        • Detecting/removing duplicates
        • Combining datasets: merge, join, concat
        • GroupBy operations, aggregation, pivot tables
    
    3.3 Feature Engineering
        • Feature scaling: normalization (Min-Max), standardization (Z-score)
        • Encoding categorical variables: one-hot, ordinal encoding
        • Date/time feature extraction (if applicable)
        • Creating new features via domain knowledge
        • Feature selection: variance threshold, correlation analysis
    
    3.4 Data Visualization
        • Matplotlib basics: line plot, scatter plot, histograms, bar charts
        • Seaborn overview: higher-level plots (heatmap, pairplot)
        • Visualizing distributions, relationships, outliers
        • Plot customization: titles, labels, legends
    
    3.5 Handling Real-World Data Challenges
        • Imbalanced datasets: oversampling (SMOTE), undersampling, class weights
        • Outlier detection and treatment
        • Data leakage awareness
        • Pipeline creation in scikit-learn
    
    🛠 Tools: NumPy, Pandas, Matplotlib, Seaborn, scikit-learn utilities
        
    Stage 4: Core Machine Learning
    
    ----------------------------------------
    4. Core Machine Learning
    ----------------------------------------
    4.1 ML Concepts & Workflow
        • What is ML? Supervised vs Unsupervised vs Semi-supervised vs Reinforcement
        • Training, Validation, Testing splits
        • Overfitting vs Underfitting, bias-variance trade-off
        • Cross-validation techniques: k-fold, stratified
    
    4.2 Supervised Learning: Regression
        • Linear Regression: assumptions, cost function, normal equation
        • Regularized Regression: Ridge, Lasso, Elastic Net
        • Polynomial Regression
        • Evaluation metrics: MSE, RMSE, MAE, R²
    
    4.3 Supervised Learning: Classification
        • Logistic Regression: sigmoid, decision boundary, loss
        • k-Nearest Neighbors (KNN)
        • Decision Trees: entropy/gini, pruning
        • Ensemble Methods:
            - Bagging: Random Forest
            - Boosting: AdaBoost, Gradient Boosting, XGBoost (intro)
        • Support Vector Machines (SVM): kernel trick overview
        • Naive Bayes: Gaussian, Multinomial
        • Evaluation: accuracy, precision, recall, F1-score, ROC-AUC
        • Confusion matrix analysis
    
    4.4 Unsupervised Learning
        • Clustering:
            - K-Means: elbow method, silhouette score
            - Hierarchical clustering: dendrograms
            - DBSCAN
        • Dimensionality Reduction:
            - PCA: variance explained
            - t-SNE / UMAP (visualization-focused)
        • Anomaly Detection overview
    
    4.5 Model Selection & Tuning
        • Hyperparameter tuning: grid search, random search, Bayesian optimization (overview)
        • Automated tuning libraries (e.g., scikit-learn’s GridSearchCV, RandomizedSearchCV)
        • Pipeline building to avoid leakage
        • Feature importance and model interpretability basics
    
    🛠 Tools: scikit-learn, pandas, NumPy
        
    Stage 5: Deep Learning Foundations
    
    ----------------------------------------
    5. Deep Learning Foundations
    ----------------------------------------
    5.1 Neural Network Basics
        • Artificial neuron model, activation functions (ReLU, Sigmoid, Tanh)
        • Architecture: input, hidden, output layers
        • Forward propagation, loss functions (Cross-entropy, MSE)
        • Backpropagation: gradient computation, chain rule
    
    5.2 Deep Learning Frameworks
        • TensorFlow & Keras: Sequential and Functional APIs
        • PyTorch basics: tensors, autograd, nn.Module
        • Comparing TF/Keras vs PyTorch (choose one to start)
    
    5.3 Training Deep Models
        • Optimizers: SGD, Adam, RMSprop
        • Learning rate scheduling
        • Regularization: Dropout, Batch Normalization, Weight Decay
        • Handling overfitting: early stopping, data augmentation
    
    5.4 Basic DL Projects
        • MNIST digit classification
        • CIFAR-10 image classification (small CNN)
        • Simple feedforward network on tabular data
    
    🛠 Tools: TensorFlow/Keras or PyTorch, GPU if available (Colab/GPU runtime)
        
    Stage 6: Advanced Deep Learning & Architectures
    
    ----------------------------------------
    6. Advanced Deep Learning & Architectures
    ----------------------------------------
    6.1 Convolutional Neural Networks (CNNs)
        • Convolution operations, filters, feature maps
        • Pooling layers, padding, stride
        • Famous architectures overview: LeNet, AlexNet, VGG, ResNet (intuition)
        • Transfer Learning: fine-tuning pre-trained models
    
    6.2 Recurrent Neural Networks (RNNs) & Sequence Models
        • RNN basics: hidden states, vanishing gradients
        • LSTM, GRU: gating mechanisms
        • Sequence-to-sequence models (intro)
        • Attention mechanism: intuition
    
    6.3 Transformers & Attention
        • Self-attention mechanism
        • Transformer architecture: encoder, decoder overview
        • Pre-trained transformer models: BERT, GPT family (conceptual)
        • Fine-tuning transformers for tasks
    
    6.4 Generative Models
        • Autoencoders: basic, variational autoencoders (VAE) overview
        • Generative Adversarial Networks (GANs): generator/discriminator intuition
        • Applications and basic experiments
    
    6.5 Advanced Techniques
        • Multi-task learning, meta-learning (intro)
        • Few-shot learning, transfer learning deeper dive
        • Neural architecture search (overview)
        • Model compression, pruning, quantization (deployment considerations)
    
    🛠 Tools: TensorFlow / PyTorch, Hugging Face Transformers library
        
    Stage 7: Natural Language Processing (NLP) Advanced
    
    ----------------------------------------
    7. Natural Language Processing (NLP)
    ----------------------------------------
    7.1 Text Preprocessing & Representation
        • Tokenization (word, subword/BPE)
        • Stopwords removal, lemmatization vs stemming
        • Word embeddings: Word2Vec, GloVe, FastText
        • Contextual embeddings: ELMo, BERT embeddings
    
    7.2 Transformer-based NLP
        • Pre-trained models: BERT, RoBERTa, GPT, T5
        • Fine-tuning for classification, QA, summarization
        • Sequence generation tasks using GPT-like models
    
    7.3 Specialized NLP Tasks
        • Named Entity Recognition (NER)
        • Machine Translation overview
        • Question Answering pipelines
        • Text Summarization (extractive vs abstractive)
        • Sentiment Analysis deep dive
    
    7.4 Evaluation Metrics in NLP
        • BLEU, ROUGE, METEOR (for generation)
        • Accuracy, F1 for classification tasks
    
    🛠 Tools: Hugging Face Transformers, spaCy, NLTK
        
    Stage 8: Computer Vision Advanced
    
    ----------------------------------------
    8. Computer Vision (CV)
    ----------------------------------------
    8.1 Image Preprocessing & Augmentation
        • OpenCV basics: reading, resizing, color conversions
        • Data augmentation techniques: flips, rotations, crops, color jitter
    
    8.2 Advanced CNN Architectures
        • Inception, ResNet, DenseNet, EfficientNet (conceptual)
        • Transfer learning and fine-tuning advanced models
        • Object detection frameworks: YOLOvX, SSD, Faster R-CNN (overview)
        • Semantic segmentation: U-Net, Mask R-CNN
        • Instance segmentation concepts
    
    8.3 Vision Transformers (ViT)
        • Applying transformer concepts to images
        • Fine-tuning ViT for classification
    
    8.4 Specialized CV Tasks
        • Face recognition pipelines
        • Video analysis basics: action recognition, object tracking
        • 3D vision intro (depth estimation)
    
    🛠 Tools: OpenCV, TensorFlow/PyTorch, libraries like Detectron2 or YOLO implementations
        
    Stage 9: Reinforcement Learning & Advanced Topics
    
    ----------------------------------------
    9. Reinforcement Learning & Advanced Topics
    ----------------------------------------
    9.1 Reinforcement Learning Foundations
        • Markov Decision Process (MDP)
        • Value functions, policy functions
        • Q-Learning, SARSA (tabular methods)
        • Exploration vs Exploitation
    
    9.2 Deep Reinforcement Learning
        • Deep Q-Networks (DQN)
        • Policy Gradient Methods: REINFORCE, Actor-Critic
        • Advanced: A3C, PPO, DDPG overview
    
    9.3 Other Advanced AI Topics
        • Graph Neural Networks (GNNs): node/graph embeddings (overview)
        • Time Series Forecasting with ML/DL: RNN/LSTM, Prophet intro
        • Bayesian Methods overview
        • AutoML and neural architecture search concepts
        • Federated Learning basics (privacy-aware training)
        • MLOps fundamentals:
            - Model versioning
            - Continuous integration/continuous deployment (CI/CD) for ML
            - Monitoring models in production
            - Tools: MLflow, Kubeflow (intro)
        • Edge AI / TinyML overview (deploying models on devices)
    
    🛠 Tools: RL libraries (Stable Baselines3), MLflow, Kubernetes intro, Docker
        
    Stage 10: Deployment, Production & MLOps
    
    ----------------------------------------
    10. Deployment, Production & MLOps
    ----------------------------------------
    10.1 Model Serving & APIs
        • REST API with Flask / FastAPI
        • gRPC basics (overview)
        • Dockerizing ML applications
        • Serving with TensorFlow Serving or TorchServe
    
    10.2 Cloud Deployment
        • Deploy on AWS SageMaker / GCP AI Platform / Azure ML (basic workflow)
        • Serverless deployments (AWS Lambda, Cloud Functions) for small models
        • CI/CD pipelines for ML: GitHub Actions or Jenkins integration
    
    10.3 Monitoring & Maintenance
        • Logging model inputs/outputs
        • Drift detection (data/model drift)
        • Retraining pipelines (automated or scheduled)
        • Scaling considerations
    
    10.4 MLOps Tools & Practices
        • Experiment tracking (MLflow, Weights & Biases)
        • Data versioning (DVC)
        • Model registry concepts
        • Infrastructure as Code (Terraform intro)
    
    🛠 Tools: Docker, Kubernetes basics, CI/CD tools, cloud consoles
        
    Stage 11: Real-World Projects & Portfolio
    
    ----------------------------------------
    11. Real-World Projects & Portfolio
    ----------------------------------------
    11.1 Project Ideas by Domain
        • Tabular Data: Predictive analytics (e.g., churn prediction)
        • NLP: Chatbot, summarizer, translation prototype
        • CV: Image classifier, object detector, image segmentation app
        • Time Series: Forecasting stock or weather data
        • RL: Simple game-playing agent
        • Generative: GAN art generation or style transfer demo
    
    11.2 End-to-End Pipeline
        • Data collection & preprocessing
        • Model training & validation
        • Deployment as API or web app (Streamlit/Flask)
        • Monitoring & iteration
        • Documentation & README
    
    11.3 Collaboration & Open Source
        • Participate in Kaggle competitions (beginner → intermediate)
        • Contribute to open-source ML projects
        • Write blog posts/tutorials documenting your projects
    
    11.4 Soft Skills & Communication
        • Clear README, code comments
        • Presentation slides or videos of project demos
        • Networking: sharing work on LinkedIn, GitHub
    
    🛠 Tools: GitHub Pages, Streamlit, Heroku/Netlify, Docker
        
    Stage 12: Ethics, Explainability & Continuous Learning
    
    ----------------------------------------
    12. Ethics, Explainability & Continuous Learning
    ----------------------------------------
    12.1 AI Ethics & Responsible AI
        • Bias & Fairness: identifying and mitigating bias
        • Privacy concerns: GDPR, data protection best practices
        • Transparency: documenting data sources and model decisions
    
    12.2 Explainable AI (XAI)
        • Model interpretability: SHAP, LIME (basic usage)
        • Interpreting black-box models vs inherently interpretable models
        • Communicating explanations to stakeholders
    
    12.3 Continuous Learning & Staying Updated
        • Following research: arXiv alerts, ML conferences (NeurIPS, ICML, CVPR summaries)
        • Blogs, podcasts, newsletters (e.g., “The Batch” by deeplearning.ai)
        • Reading codebases of popular libraries, exploring new architectures
        • Community involvement: forums, study groups
    
    12.4 Advanced Research Topics (Optional/For Aspirants)
        • Research paper reading workflow
        • Experimentation frameworks
        • Contributing to academic research or advanced industrial research
    
    🛠 Tools: arXiv, Google Scholar alerts, RSS readers, community forums
        
