kakakakakku blog

Weekly Tech Blog: Keep on Learning!

SQL で JSON ファイルをお手軽に検索できる Columnq

SQL で JSON ファイルを検索したいなー!でも大袈裟な分析環境は構築したくないなー!って思うときってあるよね❓きっとあるはず〜あるある!今回はお手軽に JSON ファイルを SQL で検索できる「Columnq」を紹介する❗️Columnq は JSON 以外に CSV や Parquet もサポートしている.

github.com

インストール

pip で簡単にインストールできる.今回は Columnq 0.3.0 を使う.

$ pip install columnq-cli

$ columnq --version
Columnq 0.3.0

データセット

今回は JSON ファイルのサンプルデータセットとして,GitHub のエクスポートデータを使う.詳しくは以下の記事を参照してもらえればと!

kakakakakku.hatenablog.com

Columnq : 基本操作

🔎 SELECT ① : WHERE

columnq コマンドでは --table オプションに JSON ファイルを指定すると,SQL でテーブル名として使えるようになる.以下の SQL は GitHub のリポジトリ情報を含んだ repositories_000004.json に対して SELECT 文を実行している.

$ columnq sql --table repositories_000004.json \
'SELECT name, owner, default_branch, has_issues, general_settings.projects AS projects, created_at
FROM repositories_000004
WHERE private = false AND is_archived = false
LIMIT 5'
+--------------------------------------------+--------------------------------+----------------+------------+----------+----------------------+
| name                                       | owner                          | default_branch | has_issues | projects | created_at           |
+--------------------------------------------+--------------------------------+----------------+------------+----------+----------------------+
| Kubernetes-and-Cloud-Native-Associate-KCNA | https://github.com/kakakakakku | main           | false      | true     | 2022-05-31T05:04:38Z |
| pluto                                      | https://github.com/kakakakakku | master         | false      | true     | 2022-07-18T14:26:25Z |
| killercoda-kakakakakku                     | https://github.com/kakakakakku | master         | true       | false    | 2022-07-27T08:31:50Z |
| sandbox-playwright-pytest-tracing          | https://github.com/kakakakakku | master         | false      | true     | 2022-11-30T04:09:18Z |
+--------------------------------------------+--------------------------------+----------------+------------+----------+----------------------+

🔎 SELECT ② : GROUP BY

以下の SQL は GitHub のリポジトリ情報を含んだ repositories_000003.json に対して SELECT 文を実行している.GROUP BY もちゃんと使える.

$ columnq sql --table repositories_000003.json \
'SELECT private, COUNT(*) AS count
FROM repositories_000003
GROUP BY private'
+---------+-------+
| private | count |
+---------+-------+
| true    | 69    |
| false   | 31    |
+---------+-------+

🔎 SELECT ③ : JOIN

以下の SQL は GitHub のリポジトリ情報を含んだ repositories_000003.json と issue 情報を含んだ issues_000001.json に対して SELECT 文を実行している.複数の JSON ファイルに JOIN を使えるのは本当に便利❗️

$ columnq sql --table repositories_000003.json --table issues_000001.json \
'SELECT name, issues_000001.url, private
FROM repositories_000003
JOIN issues_000001
ON repositories_000003.url = issues_000001.repository'
+----------------------+--------------------------------------------------------------+---------+
| name                 | url                                                          | private |
+----------------------+--------------------------------------------------------------+---------+
| tmp                  | https://github.com/kakakakakku/tmp/issues/1                  | true    |
| TIL                  | https://github.com/kakakakakku/TIL/issues/1                  | true    |
| TIL                  | https://github.com/kakakakakku/TIL/issues/2                  | true    |
| TIL                  | https://github.com/kakakakakku/TIL/issues/3                  | true    |
| TIL                  | https://github.com/kakakakakku/TIL/issues/4                  | true    |
| TIL                  | https://github.com/kakakakakku/TIL/issues/5                  | true    |
| TIL                  | https://github.com/kakakakakku/TIL/issues/6                  | true    |
| TIL                  | https://github.com/kakakakakku/TIL/issues/7                  | true    |
| TIL                  | https://github.com/kakakakakku/TIL/issues/8                  | true    |
| hello-github-actions | https://github.com/kakakakakku/hello-github-actions/issues/1 | true    |
| hello-github-actions | https://github.com/kakakakakku/hello-github-actions/issues/3 | true    |
+----------------------+--------------------------------------------------------------+---------+

Columnq : 複数ファイルを横断して検索する

columnq コマンドでは --table オプションに JSON ファイルではなく「標準入力 (stdin)」を指定することもできる.例えば GitHub のリポジトリ情報を含んだ JSON ファイルが repositories_000001.json から repositories_000004.json に分割されている場合は cat コマンドや jq コマンドを使ってファイル結合をして,標準入力 (stdin) に対して SELECT 文を実行できる.

$ jq -s add repositories_*.json | \
columnq sql --table 't=stdin,format=json' \
'SELECT private, COUNT(*) AS count
FROM t
GROUP BY private'
+---------+-------+
| private | count |
+---------+-------+
| true    | 114   |
| false   | 199   |
+---------+-------+

Columnq : インタラクティブモード

columnq console コマンドを使うとインタラクティブに SQL を実行できる.

$ columnq console --table repositories_000004.json

columnq(sql)> SHOW TABLES;
+---------------+--------------------+---------------------+------------+
| table_catalog | table_schema       | table_name          | table_type |
+---------------+--------------------+---------------------+------------+
| datafusion    | public             | repositories_000004 | BASE TABLE |
| datafusion    | information_schema | tables              | VIEW       |
| datafusion    | information_schema | views               | VIEW       |
| datafusion    | information_schema | columns             | VIEW       |
+---------------+--------------------+---------------------+------------+

columnq(sql)> SHOW COLUMNS FROM repositories_000004;
+---------------+--------------+---------------------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| table_catalog | table_schema | table_name          | column_name              | data_type                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | is_nullable |
+---------------+--------------+---------------------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| datafusion    | public       | repositories_000004 | actions_general_settings | Struct([Field { name: "actions_disabled", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "allows_all_actions", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "allows_github_owned_actions", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "allows_local_actions_only", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "allows_specific_actions_patterns", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "allows_verified_actions", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "patterns", data_type: List(Field { name: "item", data_type: Null, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }])                                                                                                                                                                                                                                                                                                                                                                                                                                                   | YES         |
| datafusion    | public       | repositories_000004 | autolinks                | List(Field { name: "item", data_type: Null, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None })                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | YES         |
| datafusion    | public       | repositories_000004 | collaborators            | List(Field { name: "item", data_type: Struct([Field { name: "permission", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "user", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None })                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | YES         |
| datafusion    | public       | repositories_000004 | created_at               | Utf8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | YES         |
| datafusion    | public       | repositories_000004 | default_branch           | Utf8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | YES         |
| datafusion    | public       | repositories_000004 | description              | Utf8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | YES         |
| datafusion    | public       | repositories_000004 | general_settings         | Struct([Field { name: "allow_forking", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "auto_merge", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "delete_branch_heads", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "discussions", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "git_lfs_in_archives", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "merge_commit", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "projects", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "rebase_merge", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "sponsorships", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "squash_merge", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "template", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "update_branch", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }]) | YES         |
| datafusion    | public       | repositories_000004 | git_url                  | Utf8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | YES         |
| datafusion    | public       | repositories_000004 | has_downloads            | Boolean                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | YES         |
| datafusion    | public       | repositories_000004 | has_issues               | Boolean                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | YES         |
| datafusion    | public       | repositories_000004 | has_wiki                 | Boolean                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | YES         |
| datafusion    | public       | repositories_000004 | is_archived              | Boolean                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | YES         |
| datafusion    | public       | repositories_000004 | labels                   | List(Field { name: "item", data_type: Struct([Field { name: "color", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "created_at", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "description", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "name", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "url", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None })                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | YES         |
| datafusion    | public       | repositories_000004 | name                     | Utf8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | YES         |
| datafusion    | public       | repositories_000004 | owner                    | Utf8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | YES         |
| datafusion    | public       | repositories_000004 | private                  | Boolean                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | YES         |
| datafusion    | public       | repositories_000004 | public_keys              | List(Field { name: "item", data_type: Struct([Field { name: "created_at", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "fingerprint", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "key", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "read_only", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "title", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None })                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | YES         |
| datafusion    | public       | repositories_000004 | repository_topics        | List(Field { name: "item", data_type: Null, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None })                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | YES         |
| datafusion    | public       | repositories_000004 | security_and_analysis    | Struct([Field { name: "advanced_security", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "dependency_graph", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "token_scanning", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "token_scanning_push_protection", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "vulnerability_alerts", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "vulnerability_updates", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }])                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | YES         |
| datafusion    | public       | repositories_000004 | type                     | Utf8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | YES         |
| datafusion    | public       | repositories_000004 | url                      | Utf8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | YES         |
| datafusion    | public       | repositories_000004 | webhooks                 | List(Field { name: "item", data_type: Struct([Field { name: "active", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "content_type", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "enable_ssl_verification", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "event_types", data_type: List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "payload_url", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None })                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | YES         |
| datafusion    | public       | repositories_000004 | website                  | Utf8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | YES         |
+---------------+--------------+---------------------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+

columnq(sql)> SELECT name, owner, default_branch, has_issues, general_settings.projects AS projects, created_at
FROM repositories_000004
WHERE private = false AND is_archived = false
LIMIT 5;
+--------------------------------------------+--------------------------------+----------------+------------+----------+----------------------+
| name                                       | owner                          | default_branch | has_issues | projects | created_at           |
+--------------------------------------------+--------------------------------+----------------+------------+----------+----------------------+
| Kubernetes-and-Cloud-Native-Associate-KCNA | https://github.com/kakakakakku | main           | false      | true     | 2022-05-31T05:04:38Z |
| pluto                                      | https://github.com/kakakakakku | master         | false      | true     | 2022-07-18T14:26:25Z |
| killercoda-kakakakakku                     | https://github.com/kakakakakku | master         | true       | false    | 2022-07-27T08:31:50Z |
| sandbox-playwright-pytest-tracing          | https://github.com/kakakakakku | master         | false      | true     | 2022-11-30T04:09:18Z |
+--------------------------------------------+--------------------------------+----------------+------------+----------+----------------------+

まとめ

SQL で JSON ファイルをお手軽に検索するなら Columnq が便利❗️