# How Colab Works
Google Colab is an incredibly powerful tool for data science, machine learning, and Python development, largely because it removes the headache of local setup. However, one area that often confuses beginners, and sometimes even intermediate users, is file management.
Where do files live? Why do they disappear? How do you upload, download, or permanently store data? This article answers all of that, step by step.
Let’s clear up the biggest misunderstanding right away: Google Colab does not work like your laptop. Every time you open a notebook, Colab gives you a temporary virtual machine (VM). Once the session ends, everything inside it is cleared. This means:
- Files saved locally are temporary
- When the runtime resets, files are gone
Your default working directory is /content.
Anything you save inside /content will vanish once the runtime resets.
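A quick way to confirm this for yourself is to check the working directory and write a throwaway file. A minimal sketch (the scratch.txt name is just an example):

import os

print(os.getcwd())  # Colab notebooks start in /content on the VM's temporary disk

# Example only: anything written here disappears when the runtime resets
with open('/content/scratch.txt', 'w') as f:
    f.write('This file will not survive a runtime reset.')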
# Viewing Files In Colab
You have two easy ways to view your files.
// Method 1: The Visual Way
This is the recommended approach for beginners:
- Look at the left sidebar
- Click the folder icon
- Browse inside /content
This is great when you just want to see what is going on.
// Method 2: The Python Way
This is handy when you are scripting or debugging paths.
import os

# List everything currently sitting in the temporary /content directory
os.listdir('/content')
# Uploading & Downloading Files
Suppose you have a dataset or a comma-separated values (CSV) file on your laptop. The first method is to upload it with code.
from google.colab import files
files.upload()
A file picker opens, you select your file, and it appears in /content. This file is temporary unless moved elsewhere.
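files.upload() also returns a dictionary mapping each uploaded filename to its raw bytes, so you can start working with the data immediately. A minimal sketch, assuming the uploaded file is a CSV named train.csv (a hypothetical name; use whichever file you actually selected):

import io
import pandas as pd
from google.colab import files

uploaded = files.upload()  # dict of {filename: raw bytes}

# Read the uploaded CSV straight into a DataFrame without touching disk
df = pd.read_csv(io.BytesIO(uploaded['train.csv']))
df.head()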
The second method is drag and drop. It is simple, but files added this way are still temporary.
- Open the file explorer (left panel)
- Drag files directly into /content
To download a file from Colab to your local machine:
from google.colab import files
files.download('model.pkl')
Your browser will download the file instantly. This works for CSVs, models, logs, and images.
If you want your files to survive runtime resets, you must use Google Drive. To mount Google Drive:
from google.colab import drive
drive.mount('/content/drive')
Once you authorize access, your Drive is mounted at /content/drive, and your own files appear under /content/drive/MyDrive. Anything saved there persists across runtime resets.
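For example, a file created on the temporary VM disk can be copied into Drive so it survives a reset. A minimal sketch reusing the model.pkl example from earlier:

import shutil

# Copy a temporary file from the VM into Drive so it outlives the runtime
shutil.copy('/content/model.pkl', '/content/drive/MyDrive/model.pkl')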
# Recommended Project Folder Structure
A messy Drive becomes painful very fast. A clean structure that you can reuse is:
MyDrive/
└── ColabProjects/
    └── My_Project/
        ├── data/
        ├── notebooks/
        ├── models/
        ├── outputs/
        └── README.md
To save time, you can use paths like:
BASE_PATH = '/content/drive/MyDrive/ColabProjects/My_Project'
DATA_PATH = f'{BASE_PATH}/data/train.csv'
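If these folders do not exist yet, you can create the whole structure from the notebook rather than clicking through the Drive UI. A minimal sketch using the BASE_PATH defined above:

import os

BASE_PATH = '/content/drive/MyDrive/ColabProjects/My_Project'

# Create each project folder if it does not exist yet
for folder in ['data', 'notebooks', 'models', 'outputs']:
    os.makedirs(f'{BASE_PATH}/{folder}', exist_ok=True)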
To save a file permanently using Pandas:
import pandas as pd

# Assuming df is an existing DataFrame
df.to_csv('/content/drive/MyDrive/data.csv', index=False)
To load a file later:
df = pd.read_csv('/content/drive/MyDrive/data.csv')
# File Management in Colab
// Working With ZIP Files
To extract a ZIP file:
import zipfile

# Extract the archive into a folder on the temporary VM disk
with zipfile.ZipFile('dataset.zip', 'r') as zip_ref:
    zip_ref.extractall('/content/data')
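The reverse direction is just as useful: bundling results into a single archive before downloading them. A minimal sketch using Python's built-in shutil (the outputs folder and results.zip name are just examples):

import shutil
from google.colab import files

# Zip up an example outputs folder into /content/results.zip
shutil.make_archive('/content/results', 'zip', '/content/outputs')

# Download the single archive instead of many individual files
files.download('/content/results.zip')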
// Using Shell Commands For File Management
Colab also supports Linux shell commands, prefixed with an exclamation mark (!).

!pwd    # print the current working directory
!ls    # list the files in it
!mkdir data    # create a folder
!rm file.txt    # delete a file
!cp source.txt destination.txt    # copy a file

This is very useful for automation, and once you get used to it, you will reach for it frequently.
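One detail worth knowing: shell commands can read Python variables when you wrap them in curly braces, which keeps paths consistent between your code and your commands. A small sketch with example paths:

src = '/content/data.csv'    # example source path
dst = '/content/drive/MyDrive/data.csv'    # example destination in Drive

!cp {src} {dst}    # the {braces} substitute the Python variables
!ls -lh {dst}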
// Downloading Files Directly From The Internet
Instead of uploading manually, you can use wget:
!wget https://example.com/data.csv
Or using the Requests library in Python:
import requests

url = 'https://example.com/data.csv'    # the example URL from above
r = requests.get(url)
open('data.csv', 'wb').write(r.content)
This is highly effective for datasets and pretrained models.
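For files too large to hold in memory at once, requests can also stream the download in chunks. A minimal sketch with a placeholder URL:

import requests

url = 'https://example.com/large_dataset.zip'    # placeholder; substitute the real location

# Stream the response in chunks so the whole file never sits in memory
with requests.get(url, stream=True) as r:
    r.raise_for_status()
    with open('/content/large_dataset.zip', 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)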
# Additional Considerations
// Storage Limits
You should be aware of the following limits:
- Colab VM disk space is approximately 100 GB (temporary)
- Google Drive storage is limited by your personal quota
- Browser-based uploads are capped at approximately 5 GB
For large datasets, always plan ahead.
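A quick way to check how much of the VM's temporary disk is left before copying data around, using only the standard library:

import shutil

# Disk usage of the Colab VM's temporary storage, in bytes
total, used, free = shutil.disk_usage('/content')
print(f'{free / 1e9:.1f} GB free of {total / 1e9:.1f} GB')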
// Best Practices
- Mount Drive at the start of the notebook
- Use variables for paths (see the setup sketch after this list)
- Keep raw data as read-only
- Separate data, models, and outputs into distinct folders
- Add a README file for your future self
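Putting the first two practices together, a typical setup cell at the top of a notebook might look like this (the ColabProjects/My_Project path is the example structure from earlier):

import os
from google.colab import drive

# Mount Drive once, at the very top of the notebook
drive.mount('/content/drive')

# Keep every path in one place so nothing is hard-coded further down
BASE_PATH = '/content/drive/MyDrive/ColabProjects/My_Project'
DATA_PATH = os.path.join(BASE_PATH, 'data')
MODEL_PATH = os.path.join(BASE_PATH, 'models')
OUTPUT_PATH = os.path.join(BASE_PATH, 'outputs')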
// When Not To Use Google Drive
Avoid using Google Drive when:
- Training on extremely large datasets
- High-speed I/O is critical for performance
- You require distributed storage
In these cases, consider faster options such as copying the data onto the Colab VM's local disk or reading directly from a dedicated cloud storage bucket.
# Final Thoughts
Once you understand how Colab file management works, your workflow becomes much more efficient. There is no need to panic over lost files or rewrite code. With these tools, you can keep experiments clean and move data around smoothly.
Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.

