kakakakakku blog

Weekly Tech Blog: Keep on Learning!

Reading NDJSON (.jsonl) with Pandas

To read NDJSON (Newline Delimited JSON) with Pandas, just pass the lines=True parameter to the read_json() function and you're good to go!

pandas.pydata.org

Sample NDJSON: dataset.jsonl

{ "id": 1, "name": "Alice" }
{ "id": 2, "name": "Bob" }
{ "id": 3, "name": "Kakakakakku", "blog":  "https://kakakakakku.hatenablog.com/" }

Sample code: ndjson.py

import pandas as pd

# lines=True parses each line of the file as a separate JSON object
df = pd.read_json('./dataset.jsonl', lines=True)
print(df)

Running it displays the DataFrame as expected!

$ python ndjson.py
   id         name                                 blog
0   1        Alice                                  NaN
1   2          Bob                                  NaN
2   3  Kakakakakku  https://kakakakakku.hatenablog.com/
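
As a supplementary sketch (not part of the original sample), the same records can be read from an in-memory string, which makes it easy to check how missing keys are handled. The names NDJSON_TEXT, missing_blog, ndjson_out, and chunk_sizes are made up for illustration, and the use of io.StringIO, to_json(), and chunksize is an assumption about what might come in handy next, not something the sample above relies on.

```python
import io

import pandas as pd

# Same three records as dataset.jsonl, held in memory so the sketch is self-contained.
# NDJSON_TEXT is a hypothetical name used only for this illustration.
NDJSON_TEXT = """\
{ "id": 1, "name": "Alice" }
{ "id": 2, "name": "Bob" }
{ "id": 3, "name": "Kakakakakku", "blog": "https://kakakakakku.hatenablog.com/" }
"""

# lines=True parses each line as a separate JSON object.
df = pd.read_json(io.StringIO(NDJSON_TEXT), lines=True)

# A key that is missing from a record shows up as NaN in that row.
missing_blog = df["blog"].isna().tolist()

# Writing back out as NDJSON: lines=True is only valid with orient="records".
ndjson_out = df.to_json(orient="records", lines=True)

# With lines=True, chunksize returns an iterator of DataFrames instead of one
# DataFrame, which may help when a file is too large to load at once.
reader = pd.read_json(io.StringIO(NDJSON_TEXT), lines=True, chunksize=2)
chunk_sizes = [len(chunk) for chunk in reader]
```

Here missing_blog marks the rows for Alice and Bob, matching the NaN values in the output above, and ndjson_out holds one JSON object per line again.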