5 Useful DIY Python Functions for JSON Parsing and Processing


# Introduction

Working with JSON in Python is often challenging. The basic json.loads() only gets you so far.

API responses, configuration files, and data exports often contain JSON that is messy or poorly structured. You need to flatten nested objects, safely extract values without KeyError exceptions, merge multiple JSON files, or convert between JSON and other formats. These tasks come up constantly in web scraping, API integration, and data processing. This article walks you through five practical functions for handling common JSON parsing and processing tasks.

You can find the code for these functions on GitHub.

# 1. Safely Extracting Nested Values

JSON objects often nest several levels deep. Chained bracket access such as data["user"]["profile"]["email"] gets verbose fast, and if any key along the way is missing, you get a KeyError.

Here is a function that lets you access nested values using dot notation, with a fallback for missing keys:

def get_nested_value(data, path, default=None):
    """
    Safely extract nested values from JSON using dot notation.

    Args:
        data: Dictionary or JSON object
        path: Dot-separated string like "user.profile.email"
        default: Value to return if path doesn't exist

    Returns:
        The value at the path, or default if not found
    """
    keys = path.split('.')
    current = data

    for key in keys:
        if isinstance(current, dict):
            current = current.get(key)
            if current is None:
                return default
        elif isinstance(current, list):
            try:
                index = int(key)
                current = current[index]
            except (ValueError, IndexError):
                return default
        else:
            return default

    return current

Let’s test it with a complex nested structure:

# Sample JSON data
user_data = {
    "user": {
        "id": 123,
        "profile": {
            "name": "Allie",
            "email": "allie@example.com",
            "settings": {
                "theme": "dark",
                "notifications": True
            }
        },
        "posts": [
            {"id": 1, "title": "First Post"},
            {"id": 2, "title": "Second Post"}
        ]
    }
}

# Extract values
email = get_nested_value(user_data, "user.profile.email")
theme = get_nested_value(user_data, "user.profile.settings.theme")
first_post = get_nested_value(user_data, "user.posts.0.title")
missing = get_nested_value(user_data, "user.profile.age", default=25)

print(f"Email: {email}")
print(f"Theme: {theme}")
print(f"First post: {first_post}")
print(f"Age (default): {missing}")

Output:

Email: allie@example.com
Theme: dark
First post: First Post
Age (default): 25

The function splits the path string on dots and walks through the data structure one key at a time. At each level, it checks if the current value is a dictionary or a list. For dictionaries, it uses .get(key), which returns None for missing keys instead of raising an error. For lists, it tries to convert the key to an integer index.

The default parameter provides a fallback when any part of the path does not exist. This prevents your code from crashing when dealing with incomplete or inconsistent JSON data from APIs.

This pattern is especially useful when processing API responses where some fields are optional or only present under certain conditions.
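One limitation is worth knowing. Because the function checks `current is None`, a key that exists with an explicit null value is treated the same as a missing key. The following is a minimal sketch of a variant that tells the two apart. The name `get_nested_strict` and the `_MISSING` sentinel are assumptions for illustration, not part of the original code:

```python
_MISSING = object()  # hypothetical sentinel meaning "key not present"


def get_nested_strict(data, path, default=None):
    """Like get_nested_value, but an explicit null is returned as None."""
    current = data
    for key in path.split("."):
        if isinstance(current, dict):
            current = current.get(key, _MISSING)
        elif isinstance(current, list):
            try:
                current = current[int(key)]
            except (ValueError, IndexError):
                return default
        else:
            return default
        if current is _MISSING:
            return default
    return current


# Hypothetical record with an explicit null
record = {"user": {"nickname": None, "tags": ["a", "b"]}}

print(get_nested_strict(record, "user.nickname", default="n/a"))  # None
print(get_nested_strict(record, "user.age", default="n/a"))       # n/a
print(get_nested_strict(record, "user.tags.1", default="n/a"))    # b
```

Whether a null should fall back to the default depends on the API you are consuming, so treat this as one option rather than a replacement.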

# 2. Flattening Nested JSON into Single-Level Dictionaries

Machine learning models, CSV exports, and database inserts often need flat data structures. But API responses and configuration files use nested JSON. Converting nested objects to flat key-value pairs is a common task.

Here is a function that flattens nested JSON with customizable separators:

def flatten_json(data, parent_key='', separator="_"):
    """
    Flatten nested JSON into a single-level dictionary.

    Args:
        data: Nested dictionary or JSON object
        parent_key: Prefix for keys (used in recursion)
        separator: String to join nested keys

    Returns:
        Flattened dictionary with concatenated keys
    """
    items = []

    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}{separator}{key}" if parent_key else key

            if isinstance(value, dict):
                # Recursively flatten nested dicts
                items.extend(flatten_json(value, new_key, separator).items())
            elif isinstance(value, list):
                # Flatten lists with indexed keys
                for i, item in enumerate(value):
                    list_key = f"{new_key}{separator}{i}"
                    if isinstance(item, (dict, list)):
                        items.extend(flatten_json(item, list_key, separator).items())
                    else:
                        items.append((list_key, item))
            else:
                items.append((new_key, value))
    else:
        items.append((parent_key, data))

    return dict(items)

Now let’s flatten a complex nested structure:

# Complex nested JSON
product_data = {
    "product": {
        "id": 456,
        "name": "Laptop",
        "specs": {
            "cpu": "Intel i7",
            "ram": "16GB",
            "storage": {
                "type": "SSD",
                "capacity": "512GB"
            }
        },
        "reviews": [
            {"rating": 5, "comment": "Excellent"},
            {"rating": 4, "comment": "Good value"}
        ]
    }
}

flattened = flatten_json(product_data)

for key, value in flattened.items():
    print(f"{key}: {value}")

Output:

product_id: 456
product_name: Laptop
product_specs_cpu: Intel i7
product_specs_ram: 16GB
product_specs_storage_type: SSD
product_specs_storage_capacity: 512GB
product_reviews_0_rating: 5
product_reviews_0_comment: Excellent
product_reviews_1_rating: 4
product_reviews_1_comment: Good value

The function uses recursion to handle arbitrary nesting depth. When it encounters a dictionary, it processes each key-value pair, building up the flattened key by concatenating parent keys with the separator.

For lists, it uses the index as part of the key. This preserves the order and structure of array elements in the flattened output. The key product_reviews_0_rating, for example, tells you this is the rating from the first review.

The separator parameter lets you customize the output format. Use dots for dot notation, underscores for snake_case, or slashes for path-like keys depending on your needs.

This function is particularly useful when you need to convert JSON API responses into dataframes or CSV rows where each column needs a unique name.
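To illustrate the CSV use case, here is a minimal sketch that flattens a few records and writes them with the standard library's `csv.DictWriter`. The `flatten` helper is a condensed stand-in for `flatten_json`, and the sample records are made up for this example:

```python
import csv
import io


def flatten(data, parent_key="", separator="_"):
    """Condensed stand-in for flatten_json: dicts and lists become flat keys."""
    if isinstance(data, dict):
        pairs = data.items()
    elif isinstance(data, list):
        pairs = enumerate(data)
    else:
        return {parent_key: data}
    flat = {}
    for key, value in pairs:
        new_key = f"{parent_key}{separator}{key}" if parent_key else str(key)
        flat.update(flatten(value, new_key, separator))
    return flat


# Hypothetical API records with slightly different shapes
records = [
    {"id": 1, "specs": {"cpu": "i7", "ram": "16GB"}, "tags": ["work"]},
    {"id": 2, "specs": {"cpu": "i5"}, "tags": ["home", "budget"]},
]

rows = [flatten(record) for record in records]

# Union of all keys, in first-seen order, so every row fits the header
fieldnames = list(dict.fromkeys(key for row in rows for key in row))

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)

print(buffer.getvalue())
```

Records rarely share exactly the same keys after flattening, which is why the header is built from the union of all rows and missing cells are left empty.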

# 3. Deep Merging Multiple JSON Objects

Configuration management often requires merging multiple JSON files containing default settings, environment-specific configs, user preferences, and more. A simple dict.update() only handles the top level. You need deep merging that recursively combines nested structures.
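A short illustration of the problem, using made-up values: with `dict.update()`, the override's nested dictionary replaces the base one wholesale, so sibling keys are lost.

```python
defaults = {"database": {"host": "localhost", "port": 5432}}
overrides = {"database": {"host": "prod-db.example.com"}}

# Shallow merge: the whole "database" dict is replaced, not combined
shallow = dict(defaults)
shallow.update(overrides)

print(shallow)
# {'database': {'host': 'prod-db.example.com'}}
print("port" in shallow["database"])
# False
```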

Here is a function that deep merges JSON objects:

def deep_merge_json(base, override):
    """
    Deep merge two JSON objects, with override taking precedence.

    Args:
        base: Base dictionary
        override: Dictionary with values to override/add

    Returns:
        New dictionary with merged values
    """
    result = base.copy()

    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            # Recursively merge nested dictionaries
            result[key] = deep_merge_json(result[key], value)
        else:
            # Override or add the value
            result[key] = value

    return result

Let’s try merging sample configuration info:

import json

# Default configuration
default_config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "timeout": 30,
        "pool": {
            "min": 2,
            "max": 10
        }
    },
    "cache": {
        "enabled": True,
        "ttl": 300
    },
    "logging": {
        "level": "INFO"
    }
}

# Production overrides
prod_config = {
    "database": {
        "host": "prod-db.example.com",
        "pool": {
            "min": 5,
            "max": 50
        }
    },
    "cache": {
        "ttl": 600
    },
    "monitoring": {
        "enabled": True
    }
}

merged = deep_merge_json(default_config, prod_config)

print(json.dumps(merged, indent=2))

Output:

{
  "database": {
    "host": "prod-db.example.com",
    "port": 5432,
    "timeout": 30,
    "pool": {
      "min": 5,
      "max": 50
    }
  },
  "cache": {
    "enabled": true,
    "ttl": 600
  },
  "logging": {
    "level": "INFO"
  },
  "monitoring": {
    "enabled": true
  }
}

The function recursively merges nested dictionaries. When both the base and override contain dictionaries at the same key, it merges those dictionaries instead of replacing them entirely. This preserves values that are not explicitly overridden.

Notice how database.port and database.timeout remain from the default configuration, while database.host gets overridden. The pool settings merge at the nested level, so min and max both get updated.

The function also adds new keys that do not exist in the base config, like the monitoring section in the production override.
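One caveat: the function starts from `base.copy()`, which is a shallow copy. Nested dictionaries that the override does not touch are still the same objects as in the base, so mutating the merged result can change your defaults. If that matters, a variant built on `copy.deepcopy` avoids the sharing. This is a sketch under that assumption, and the name `deep_merge_copy` is hypothetical:

```python
import copy


def deep_merge_copy(base, override):
    """Deep merge that shares no nested objects with its inputs."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge_copy(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical sample data
defaults = {"logging": {"level": "INFO"}, "cache": {"ttl": 300}}
overrides = {"cache": {"ttl": 600}}

merged = deep_merge_copy(defaults, overrides)
merged["logging"]["level"] = "DEBUG"  # mutate the merged result only

print(defaults["logging"]["level"])  # INFO
print(merged["cache"]["ttl"])        # 600
```

The trade-off is extra copying on every merge, which is usually negligible for configuration-sized data.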

You can chain multiple merges to layer configurations:

final_config = deep_merge_json(
    deep_merge_json(default_config, prod_config),
    user_preferences
)

This pattern is common in application configuration where you have defaults, environment-specific settings, and runtime overrides.
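Nested calls get awkward beyond two or three layers. One possible generalization, sketched here with `functools.reduce`, folds any number of layers from left to right. The helper names `deep_merge` and `merge_layers` and the sample layers are assumptions for illustration:

```python
from functools import reduce


def deep_merge(base, override):
    """Condensed stand-in for deep_merge_json."""
    result = base.copy()
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


def merge_layers(*layers):
    """Merge any number of dicts; later layers take precedence."""
    return reduce(deep_merge, layers, {})


# Hypothetical layers: defaults, environment, then user preferences
default_config = {"cache": {"enabled": True, "ttl": 300}, "theme": "light"}
prod_config = {"cache": {"ttl": 600}}
user_preferences = {"theme": "dark"}

final_config = merge_layers(default_config, prod_config, user_preferences)
print(final_config)
# {'cache': {'enabled': True, 'ttl': 600}, 'theme': 'dark'}
```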

# 4. Filtering JSON by Schema or Whitelist

APIs often return more data than you need. Large JSON responses make your code harder to read. Sometimes you only want specific fields, or you need to remove sensitive data before logging.

Here is a function that filters JSON to keep only specified fields:

def filter_json(data, schema):
    """
    Filter JSON to keep only fields specified in schema.

    Args:
        data: Dictionary or JSON object to filter
        schema: Dictionary defining which fields to keep
                Use True to keep a field, nested dict for nested filtering

    Returns:
        Filtered dictionary containing only specified fields
    """
    if not isinstance(data, dict) or not isinstance(schema, dict):
        return data

    result = {}

    for key, value in schema.items():
        if key not in data:
            continue

        if value is True:
            # Keep this field as-is
            result[key] = data[key]
        elif isinstance(value, dict):
            # Recursively filter nested object
            if isinstance(data[key], dict):
                filtered_nested = filter_json(data[key], value)
                if filtered_nested:
                    result[key] = filtered_nested
            elif isinstance(data[key], list):
                # Filter each item in the list
                filtered_list = []
                for item in data[key]:
                    if isinstance(item, dict):
                        filtered_item = filter_json(item, value)
                        if filtered_item:
                            filtered_list.append(filtered_item)
                    else:
                        filtered_list.append(item)
                if filtered_list:
                    result[key] = filtered_list

    return result

Let’s filter a sample API response:

import json

# Sample API response
api_response = {
    "user": {
        "id": 789,
        "username": "Cayla",
        "email": "cayla@example.com",
        "password_hash": "secret123",
        "profile": {
            "name": "Cayla Smith",
            "bio": "Software developer",
            "avatar_url": "https://example.com/avatar.jpg",
            "private_notes": "Internal notes"
        },
        "posts": [
            {
                "id": 1,
                "title": "Hello World",
                "content": "My first post",
                "views": 100,
                "internal_score": 0.85
            },
            {
                "id": 2,
                "title": "Python Tips",
                "content": "Some tips",
                "views": 250,
                "internal_score": 0.92
            }
        ]
    },
    "metadata": {
        "request_id": "abc123",
        "server": "web-01"
    }
}

# Schema defining what to keep
public_schema = {
    "user": {
        "id": True,
        "username": True,
        "profile": {
            "name": True,
            "avatar_url": True
        },
        "posts": {
            "id": True,
            "title": True,
            "views": True
        }
    }
}

filtered = filter_json(api_response, public_schema)

print(json.dumps(filtered, indent=2))

Output:

{
  "user": {
    "id": 789,
    "username": "Cayla",
    "profile": {
      "name": "Cayla Smith",
      "avatar_url": "https://example.com/avatar.jpg"
    },
    "posts": [
      {
        "id": 1,
        "title": "Hello World",
        "views": 100
      },
      {
        "id": 2,
        "title": "Python Tips",
        "views": 250
      }
    ]
  }
}

The schema acts as a whitelist. Setting a field to True includes it in the output. Using a nested dictionary lets you filter nested objects. The function recursively applies the schema to nested structures.

For arrays, the schema applies to each item. In the example, the posts array gets filtered so each post only includes id, title, and views, while content and internal_score are excluded.

Notice how sensitive fields like password_hash and private_notes do not appear in the output. This makes the function useful for sanitizing data before logging or sending to frontend applications.

You can create different schemas for different use cases, such as a minimal schema for list views, a detailed schema for single-item views, and an admin schema that includes everything.
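As a rough sketch of that idea, the schemas can live in a dictionary keyed by view name. Here `filter_fields` is a condensed stand-in for `filter_json`, and the `SCHEMAS` mapping and sample post are invented for the example:

```python
def filter_fields(data, schema):
    """Condensed stand-in for filter_json (same whitelist behavior)."""
    if not isinstance(data, dict) or not isinstance(schema, dict):
        return data
    result = {}
    for key, rule in schema.items():
        if key not in data:
            continue
        value = data[key]
        if rule is True:
            result[key] = value
        elif isinstance(rule, dict) and isinstance(value, dict):
            nested = filter_fields(value, rule)
            if nested:
                result[key] = nested
        elif isinstance(rule, dict) and isinstance(value, list):
            # Filter each item, then drop dicts that ended up empty
            items = [filter_fields(item, rule) for item in value]
            items = [item for item in items if item != {}]
            if items:
                result[key] = items
    return result


# Hypothetical schemas for two views of the same record
SCHEMAS = {
    "list": {"id": True, "title": True},
    "detail": {"id": True, "title": True, "author": {"name": True}, "tags": True},
}

post = {
    "id": 7,
    "title": "Python Tips",
    "author": {"name": "Cayla", "email": "cayla@example.com"},
    "tags": ["python", "json"],
    "internal_score": 0.92,
}

for view, schema in SCHEMAS.items():
    print(view, filter_fields(post, schema))
```

Keeping the schemas as plain data means they can be stored alongside other configuration and reviewed without reading any filtering logic.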

# 5. Converting JSON to and from Dot Notation

Some systems use flat key-value stores, but you want to work with nested JSON in your code. Converting between flat dot-notation keys and nested structures helps achieve this.

Here is a pair of functions for bidirectional conversion.

// Converting JSON to Dot Notation

def json_to_dot_notation(data, parent_key=''):
    """
    Convert nested JSON to flat dot-notation dictionary.

    Args:
        data: Nested dictionary
        parent_key: Prefix for keys (used in recursion)

    Returns:
        Flat dictionary with dot-notation keys
    """
    items = {}

    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}.{key}" if parent_key else key

            if isinstance(value, dict):
                items.update(json_to_dot_notation(value, new_key))
            else:
                items[new_key] = value
    else:
        items[parent_key] = data

    return items

// Converting Dot Notation to JSON

def dot_notation_to_json(flat_data):
    """
    Convert flat dot-notation dictionary to nested JSON.

    Args:
        flat_data: Dictionary with dot-notation keys

    Returns:
        Nested dictionary
    """
    result = {}

    for key, value in flat_data.items():
        parts = key.split('.')
        current = result

        for part in parts[:-1]:
            if part not in current:
                current[part] = {}
            current = current[part]

        current[parts[-1]] = value

    return result

Let’s test the round-trip conversion:

import json

# Original nested JSON
config = {
    "app": {
        "name": "MyApp",
        "version": "1.0.0"
    },
    "database": {
        "host": "localhost",
        "credentials": {
            "username": "admin",
            "password": "secret"
        }
    },
    "features": {
        "analytics": True,
        "notifications": False
    }
}

# Convert to dot notation (for environment variables)
flat = json_to_dot_notation(config)
print("Flat format:")
for key, value in flat.items():
    print(f" {key} = {value}")

print("\n" + "="*50 + "\n")

# Convert back to nested JSON
nested = dot_notation_to_json(flat)

print("Nested format:")
print(json.dumps(nested, indent=2))

Output:

Flat format:
 app.name = MyApp
 app.version = 1.0.0
 database.host = localhost
 database.credentials.username = admin
 database.credentials.password = secret
 features.analytics = True
 features.notifications = False

==================================================

Nested format:
{
  "app": {
    "name": "MyApp",
    "version": "1.0.0"
  },
  "database": {
    "host": "localhost",
    "credentials": {
      "username": "admin",
      "password": "secret"
    }
  },
  "features": {
    "analytics": true,
    "notifications": false
  }
}

The json_to_dot_notation function flattens the structure by recursively walking through nested dictionaries and joining keys with dots. Unlike the earlier flatten function, this one does not expand arrays: a list is stored as a single value under its dotted key. It is meant for configuration data that is purely key-value.

The dot_notation_to_json function reverses the process. It splits each key on dots and builds up the nested structure by creating intermediate dictionaries as needed. The loop handles all parts except the last one, creating nesting levels. Then it assigns the value to the final key.
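The round trip is lossless for typical configuration data, but not for every input. The sketch below uses condensed stand-ins for the two functions (`to_dot` and `from_dot`) and a hypothetical `round_trips` helper to show two cases that do not survive: empty dictionaries and keys that themselves contain a dot.

```python
def to_dot(data, parent_key=""):
    """Condensed stand-in for json_to_dot_notation."""
    if not isinstance(data, dict):
        return {parent_key: data}
    flat = {}
    for key, value in data.items():
        new_key = f"{parent_key}.{key}" if parent_key else key
        flat.update(to_dot(value, new_key))
    return flat


def from_dot(flat_data):
    """Condensed stand-in for dot_notation_to_json."""
    result = {}
    for key, value in flat_data.items():
        *parents, last = key.split(".")
        current = result
        for part in parents:
            current = current.setdefault(part, {})
        current[last] = value
    return result


def round_trips(data):
    """True when flattening and rebuilding reproduces the input."""
    return from_dot(to_dot(data)) == data


print(round_trips({"db": {"host": "localhost", "ports": [5432, 5433]}}))  # True
print(round_trips({"db": {"options": {}}}))       # False: empty dict is dropped
print(round_trips({"example.com": {"ttl": 60}}))  # False: the key contains a dot
```

If your keys can contain dots, choosing a separator that never appears in them is one way around the second case.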

This approach keeps your configuration readable and maintainable while working within the constraints of flat key-value systems.

# Wrapping Up

JSON processing goes beyond basic json.loads(). In most projects, you will need tools to navigate nested structures, transform shapes, merge configurations, filter fields, and convert between formats.

The techniques in this article transfer to other data processing tasks as well. You can modify these patterns for XML, YAML, or custom data formats.

Start with the safe access function to prevent KeyError exceptions in your code. Add the others as you run into specific needs. Happy coding!

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.
