Stream-parse JSON lines into records

Parse newline-delimited JSON (JSON Lines) into Python records, skipping bad lines and optionally extracting keys.

Python practice · 22 min · Modules & Standard Library · Advanced · Last updated March 23, 2026

Problem statement

You are given a string containing multiple JSON objects separated by newlines (JSON Lines). Implement parse_json_lines(text, keys=None) that reads the input text and returns a list of parsed records.

Behavior:

- Empty lines should be ignored.
- Malformed JSON lines should be skipped (do not raise).
- If keys is None, return a list of dictionaries (the parsed JSON objects).
- If keys is a list of strings, for each valid JSON object return a tuple of values corresponding to those keys (use None for missing keys).
- Preserve nested JSON objects as Python dicts in the output.

This function is useful for streaming logs or data pipelines where some lines may be corrupted and you still want to salvage valid records.

Task

Implement a robust parser for JSON Lines that skips blank or malformed lines and can extract specified keys, returning consistent Python types.
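One possible approach, offered as a minimal sketch rather than a reference solution: split the text on newlines, try json.loads on each non-blank line, and skip any line that fails to decode. The statement does not say what to do with lines that are valid JSON but not objects (for example 42 or [1, 2]); this sketch assumes they should be skipped so the result always contains dicts or tuples.

```python
import json


def parse_json_lines(text, keys=None):
    """Parse JSON Lines text into a list of dicts, or tuples if keys is given."""
    records = []
    # Split on "\n" only: str.splitlines() also breaks on characters such as
    # U+2028, which may legally appear unescaped inside a JSON string.
    for line in text.split("\n"):
        line = line.strip()  # also removes a trailing "\r" from "\r\n" input
        if not line:
            continue  # blank line
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed line: skip, do not raise
        if not isinstance(obj, dict):
            continue  # assumption: non-object JSON values are skipped
        if keys is None:
            records.append(obj)
        else:
            # dict.get returns None for keys that are absent
            records.append(tuple(obj.get(k) for k in keys))
    return records


# A blank line, a non-JSON line, and a truncated object are all dropped.
salvaged = parse_json_lines('{"a": 1}\n\nnot json\n{"b": 2')
```

Here salvaged holds only the first record, since the last line is missing its closing brace and fails to decode.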

Examples

Parse two JSON objects

Input

parse_json_lines('{"a": 1}\n{"b": 2}')

Output

[{'a': 1}, {'b': 2}]

Two JSON objects are parsed into a list of dictionaries.

Extract specific keys

Input

parse_json_lines('{"a":1,"b":2}\n{"a":3}', ['a','b'])

Output

[(1, 2), (3, None)]

When keys are provided, return tuples of values; missing keys yield None.
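One consequence of this rule is worth seeing in code: if the extraction uses dict.get, a key that is absent and a key whose JSON value is null both come out as None. The helper name extract and the _MISSING sentinel below are hypothetical, introduced only for illustration.

```python
import json

_MISSING = object()  # hypothetical sentinel, not part of the exercise


def extract(obj, keys):
    """Return the values for keys as a tuple, with None for absent keys."""
    return tuple(obj.get(k) for k in keys)


explicit_null = json.loads('{"a": null}')  # key present, value is JSON null
absent = json.loads('{}')                  # key not present at all

# Both cases produce the same tuple under the stated rule.
same = extract(explicit_null, ["a"]) == extract(absent, ["a"]) == (None,)

# If a caller ever needed to tell them apart, a sentinel default would do it.
is_absent = absent.get("a", _MISSING) is _MISSING
is_null = explicit_null.get("a", _MISSING) is None
```

For this exercise the two cases are not distinguished, so the plain obj.get(k) form is sufficient.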

Input format

A single string argument, text, containing newline-separated JSON objects. An optional second argument, keys, is either a list of strings or None.

Output format

A Python list of dicts (if keys is None) or a Python list of tuples (if keys is provided).

Constraints

- Do not raise on malformed JSON lines; skip them.
- Use only the Python standard library (json is allowed).
- Keep memory reasonable: assume input fits in memory for this exercise.
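The exercise assumes the whole input fits in memory, but the same logic can be written as a generator for inputs that do not. The following is a sketch under that assumption, not part of the required solution: iter_json_lines is a hypothetical name, and it accepts any iterable of lines (such as an open file) instead of a single string. As an additional assumption, valid JSON values that are not objects are skipped.

```python
import io
import json


def iter_json_lines(lines, keys=None):
    """Lazily yield one record per valid JSON object line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank line
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed line: skip, do not raise
        if not isinstance(obj, dict):
            continue  # assumption: non-object JSON values are skipped
        yield obj if keys is None else tuple(obj.get(k) for k in keys)


# io.StringIO stands in for a file object here; iterating it yields lines.
stream = io.StringIO('{"a": 1}\n\nnot json\n{"a": 2, "b": 3}\n')
records = list(iter_json_lines(stream, keys=["a", "b"]))
```

Because records are yielded one at a time, only the current line needs to be held in memory; wrapping the call in list(...) recovers the list-returning behavior the exercise asks for.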

Samples

Sample 1

Input

parse_json_lines('{"x": {"y": 5}, "z": 1}\n{"x": {"y": 6}}', ['x','z'])

Output

[({'y': 5}, 1), ({'y': 6}, None)]

Nested dicts are preserved and missing keys become None.