最近 MySQL で SQL の相談を受けたときに「もしかしたら GROUP_CONCAT
が使えるかも?」というアドバイスをした.GROUP BY
だけではなく GROUP_CONCAT
も知っておくと便利なので,今回は GROUP_CONCAT
の紹介記事を書く.
GROUP_CONCAT
とは?
GROUP_CONCAT
を使うと GROUP BY
で集約をしたレコードのデータを「連結した文字列として」返すことができる.
検証用データセット
今回は MySQL 5.7 で,MySQL から公式に公開されているデータセット「world database」を使って検証をする.
既に「world database」を含めた Docker イメージを公開しているので,今回は kakakakakku/mysql-world-database:5.7
を使う.
$ docker pull kakakakakku/mysql-world-database:5.7
$ docker run -e MYSQL_ALLOW_EMPTY_PASSWORD=yes -d kakakakakku/mysql-world-database:5.7
$ docker exec -it $(docker container ls | grep 'kakakakakku/mysql-world-database' | awk '{print $1}') /bin/sh
詳しくは以下の記事にまとめてある.
kakakakakku.hatenablog.com
GROUP_CONCAT
を試す
まず「日本の都市名」を都道府県ごとに集計し,連結した文字列として返す SQL を書く.
SELECT District, COUNT(Name), GROUP_CONCAT(Name)
FROM city
WHERE CountryCode = 'JPN'
GROUP BY District
ORDER BY COUNT(Name) DESC;
実行すると,以下のように「日本の都市名」をカンマ区切りで取得できる.データセットの問題で,予想以上に「日本の都市名」が少ないこともわかる.
mysql> SELECT District, COUNT(Name), GROUP_CONCAT(Name)
-> FROM city
-> WHERE CountryCode = 'JPN'
-> GROUP BY District
-> ORDER BY COUNT(Name) DESC;
+
| District | COUNT(Name) | GROUP_CONCAT(Name) |
+
| Osaka | 22 | Osaka,Sakai,Higashiosaka,Hirakata,Toyonaka,Takatsuki,Suita,Yao,Ibaraki,Neyagawa,Kishiwada,Izumi,Moriguchi,Kadoma,Matsubara,Daito,Minoo,Tondabayashi,Kawachinagano,Habikino,Ikeda,Izumisano |
| Saitama | 21 | Urawa,Kawaguchi,Omiya,Kawagoe,Tokorozawa,Koshigaya,Soka,Ageo,Kasukabe,Sayama,Kumagaya,Niiza,Iruma,Misato,Asaka,Iwatsuki,Toda,Fukaya,Sakado,Fujimi,Higashimatsuyama |
| Chiba | 19 | Chiba,Funabashi,Matsudo,Ichikawa,Kashiwa,Ichihara,Sakura,Yachiyo,Narashino,Nagareyama,Urayasu,Abiko,Kisarazu,Noda,Kamagaya,Nishio,Kimitsu,Mobara,Narita |
| Tokyo-to | 18 | Tokyo,Hachioji,Machida,Fuchu,Chofu,Kodaira,Mitaka,Hino,Tachikawa,Hitachinaka,Ome,Higashimurayama,Musashino,Higashikurume,Koganei,Kokubunji,Akishima,Hoya |
| Aichi | 15 | Nagoya,Toyohashi,Toyota,Okazaki,Kasugai,Ichinomiya,Anjo,Komaki,Seto,Kariya,Toyokawa,Handa,Tokai,Inazawa,Konan |
| Kanagawa | 15 | Jokohama [Yokohama],Kawasaki,Sagamihara,Yokosuka,Fujisawa,Hiratsuka,Chigasaki,Atsugi,Yamato,Odawara,Kamakura,Hadano,Zama,Ebina,Isehara |
| Hyogo | 11 | Kobe,Amagasaki,Himeji,Nishinomiya,Akashi,Kakogawa,Takarazuka,Itami,Kawanishi,Sanda,Takasago |
| Hokkaido | 10 | Sapporo,Asahikawa,Hakodate,Kushiro,Obihiro,Tomakomai,Otaru,Ebetsu,Kitami,Muroran |
| Shizuoka | 9 | Hamamatsu,Shizuoka,Shimizu,Fuji,Numazu,Fujieda,Fujinomiya,Yaizu,Mishima |
| Mie | 6 | Yokkaichi,Suzuka,Tsu,Matsusaka,Kuwana,Ise |
| Yamaguchi | 6 | Shimonoseki,Ube,Yamaguchi,Hofu,Tokuyama,Iwakuni |
| Fukuoka | 5 | Fukuoka,Kitakyushu,Kurume,Omuta,Kasuga |
| Gumma | 5 | Maebashi,Takasaki,Ota,Isesaki,Kiryu |
| Hiroshima | 5 | Hiroshima,Fukuyama,Kure,Higashihiroshima,Onomichi |
| Ibaragi | 5 | Mito,Hitachi,Tsukuba,Tama,Tsuchiura |
| Fukushima | 4 | Iwaki,Koriyama,Fukushima,Aizuwakamatsu |
| Gifu | 4 | Gifu,Ogaki,Kakamigahara,Tajimi |
| Kyoto | 4 | Kioto,Uji,Maizuru,Kameoka |
| Nagano | 4 | Nagano,Matsumoto,Ueda,Iida |
| Nara | 4 | Nara,Kashihara,Ikoma,Yamatokoriyama |
| Niigata | 4 | Niigata,Nagaoka,Joetsu,Kashiwazaki |
| Tochigi | 4 | Utsunomiya,Ashikaga,Oyama,Kanuma |
| Yamagata | 4 | Yamagata,Sakata,Tsuruoka,Yonezawa |
| Aomori | 3 | Aomori,Hachinohe,Hirosaki |
| Ehime | 3 | Matsuyama,Niihama,Imabari |
| Miyazaki | 3 | Miyazaki,Miyakonojo,Nobeoka |
| Nagasaki | 3 | Nagasaki,Sasebo,Isahaya |
| Okayama | 3 | Okayama,Kurashiki,Tsuyama |
| Okinawa | 3 | Naha,Okinawa,Urasoe |
| Shiga | 3 | Otsu,Kusatsu,Hikone |
| Ishikawa | 2 | Kanazawa,Komatsu |
| Kumamoto | 2 | Kumamoto,Yatsushiro |
| Miyagi | 2 | Sendai,Ishinomaki |
| Oita | 2 | Oita,Beppu |
| Tottori | 2 | Tottori,Yonago |
| Toyama | 2 | Toyama,Takaoka |
| Akita | 1 | Akita |
| Fukui | 1 | Fukui |
| Iwate | 1 | Morioka |
| Kagawa | 1 | Takamatsu |
| Kagoshima | 1 | Kagoshima |
| Kochi | 1 | Kochi |
| Saga | 1 | Saga |
| Shimane | 1 | Matsue |
| Tokushima | 1 | Tokushima |
| Wakayama | 1 | Wakayama |
| Yamanashi | 1 | Kofu |
+
47 rows in set (0.00 sec)
GROUP_CONCAT
+ ORDER BY
を試す
GROUP_CONCAT
は関数内部での ORDER BY
をサポートしているため,以下のように SQL を書くと連結した文字列を並び替えることができる.
SELECT District, COUNT(Name), GROUP_CONCAT(Name ORDER BY Name)
FROM city
WHERE CountryCode = 'JPN'
GROUP BY District
ORDER BY COUNT(Name) DESC;
今回は「日本の都市名」を昇順に並び替えている.
mysql> SELECT District, COUNT(Name), GROUP_CONCAT(Name ORDER BY Name)
-> FROM city
-> WHERE CountryCode = 'JPN'
-> GROUP BY District
-> ORDER BY COUNT(Name) DESC;
+
| District | COUNT(Name) | GROUP_CONCAT(Name ORDER BY Name) |
+
| Osaka | 22 | Daito,Habikino,Higashiosaka,Hirakata,Ibaraki,Ikeda,Izumi,Izumisano,Kadoma,Kawachinagano,Kishiwada,Matsubara,Minoo,Moriguchi,Neyagawa,Osaka,Sakai,Suita,Takatsuki,Tondabayashi,Toyonaka,Yao |
| Saitama | 21 | Ageo,Asaka,Fujimi,Fukaya,Higashimatsuyama,Iruma,Iwatsuki,Kasukabe,Kawagoe,Kawaguchi,Koshigaya,Kumagaya,Misato,Niiza,Omiya,Sakado,Sayama,Soka,Toda,Tokorozawa,Urawa |
| Chiba | 19 | Abiko,Chiba,Funabashi,Ichihara,Ichikawa,Kamagaya,Kashiwa,Kimitsu,Kisarazu,Matsudo,Mobara,Nagareyama,Narashino,Narita,Nishio,Noda,Sakura,Urayasu,Yachiyo |
| Tokyo-to | 18 | Akishima,Chofu,Fuchu,Hachioji,Higashikurume,Higashimurayama,Hino,Hitachinaka,Hoya,Kodaira,Koganei,Kokubunji,Machida,Mitaka,Musashino,Ome,Tachikawa,Tokyo |
| Aichi | 15 | Anjo,Handa,Ichinomiya,Inazawa,Kariya,Kasugai,Komaki,Konan,Nagoya,Okazaki,Seto,Tokai,Toyohashi,Toyokawa,Toyota |
| Kanagawa | 15 | Atsugi,Chigasaki,Ebina,Fujisawa,Hadano,Hiratsuka,Isehara,Jokohama [Yokohama],Kamakura,Kawasaki,Odawara,Sagamihara,Yamato,Yokosuka,Zama |
| Hyogo | 11 | Akashi,Amagasaki,Himeji,Itami,Kakogawa,Kawanishi,Kobe,Nishinomiya,Sanda,Takarazuka,Takasago |
| Hokkaido | 10 | Asahikawa,Ebetsu,Hakodate,Kitami,Kushiro,Muroran,Obihiro,Otaru,Sapporo,Tomakomai |
| Shizuoka | 9 | Fuji,Fujieda,Fujinomiya,Hamamatsu,Mishima,Numazu,Shimizu,Shizuoka,Yaizu |
| Mie | 6 | Ise,Kuwana,Matsusaka,Suzuka,Tsu,Yokkaichi |
| Yamaguchi | 6 | Hofu,Iwakuni,Shimonoseki,Tokuyama,Ube,Yamaguchi |
| Fukuoka | 5 | Fukuoka,Kasuga,Kitakyushu,Kurume,Omuta |
| Gumma | 5 | Isesaki,Kiryu,Maebashi,Ota,Takasaki |
| Hiroshima | 5 | Fukuyama,Higashihiroshima,Hiroshima,Kure,Onomichi |
| Ibaragi | 5 | Hitachi,Mito,Tama,Tsuchiura,Tsukuba |
| Fukushima | 4 | Aizuwakamatsu,Fukushima,Iwaki,Koriyama |
| Gifu | 4 | Gifu,Kakamigahara,Ogaki,Tajimi |
| Kyoto | 4 | Kameoka,Kioto,Maizuru,Uji |
| Nagano | 4 | Iida,Matsumoto,Nagano,Ueda |
| Nara | 4 | Ikoma,Kashihara,Nara,Yamatokoriyama |
| Niigata | 4 | Joetsu,Kashiwazaki,Nagaoka,Niigata |
| Tochigi | 4 | Ashikaga,Kanuma,Oyama,Utsunomiya |
| Yamagata | 4 | Sakata,Tsuruoka,Yamagata,Yonezawa |
| Aomori | 3 | Aomori,Hachinohe,Hirosaki |
| Ehime | 3 | Imabari,Matsuyama,Niihama |
| Miyazaki | 3 | Miyakonojo,Miyazaki,Nobeoka |
| Nagasaki | 3 | Isahaya,Nagasaki,Sasebo |
| Okayama | 3 | Kurashiki,Okayama,Tsuyama |
| Okinawa | 3 | Naha,Okinawa,Urasoe |
| Shiga | 3 | Hikone,Kusatsu,Otsu |
| Ishikawa | 2 | Kanazawa,Komatsu |
| Kumamoto | 2 | Kumamoto,Yatsushiro |
| Miyagi | 2 | Ishinomaki,Sendai |
| Oita | 2 | Beppu,Oita |
| Tottori | 2 | Tottori,Yonago |
| Toyama | 2 | Takaoka,Toyama |
| Akita | 1 | Akita |
| Fukui | 1 | Fukui |
| Iwate | 1 | Morioka |
| Kagawa | 1 | Takamatsu |
| Kagoshima | 1 | Kagoshima |
| Kochi | 1 | Kochi |
| Saga | 1 | Saga |
| Shimane | 1 | Matsue |
| Tokushima | 1 | Tokushima |
| Wakayama | 1 | Wakayama |
| Yamanashi | 1 | Kofu |
+
47 rows in set (0.00 sec)
GROUP_CONCAT
+ DISTINCT
を試す
次にデータの重複を考える.以下の SQL だと「地域名」が,例えば Caribbean,Caribbean,Caribbean
のように重複してしまう.
SELECT Continent, GROUP_CONCAT(Region)
FROM country
GROUP BY Continent;
GROUP_CONCAT
は関数内部での DISTINCT
をサポートしているため,以下のように SQL を書くと,データの重複を除外することができる.
SELECT Continent, GROUP_CONCAT(DISTINCT Region)
FROM country
GROUP BY Continent;
実際に「地域名」の重複を除外できている.
mysql> SELECT Continent, GROUP_CONCAT(DISTINCT Region)
-> FROM country
-> GROUP BY Continent;
+
| Continent | GROUP_CONCAT(DISTINCT Region) |
+
| Asia | Eastern Asia,Middle East,Southeast Asia,Southern and Central Asia |
| Europe | Baltic Countries,British Islands,Eastern Europe,Nordic Countries,Southern Europe,Western Europe |
| North America | Caribbean,Central America,North America |
| Africa | Central Africa,Eastern Africa,Northern Africa,Southern Africa,Western Africa |
| Oceania | Australia and New Zealand,Melanesia,Micronesia,Micronesia/Caribbean,Polynesia |
| Antarctica | Antarctica |
| South America | South America |
+
7 rows in set (0.00 sec)
GROUP_CONCAT
+ DISTINCT
+ SEPARATOR
を試す
GROUP_CONCAT
のデフォルト区切り文字は ,
になっている.データに ,
が入っている可能性がある場合など,意図的に区切り文字を変更する場合は SEPARATOR
を設定する.
SELECT Continent, GROUP_CONCAT(DISTINCT Region SEPARATOR ' ? ')
FROM country
GROUP BY Continent;
以下は区切り文字を ?
にして SQL を実行している.
mysql> SELECT Continent, GROUP_CONCAT(DISTINCT Region SEPARATOR ' ? ')
-> FROM country
-> GROUP BY Continent;
+
| Continent | GROUP_CONCAT(DISTINCT Region SEPARATOR ' ? ') |
+
| Asia | Eastern Asia ? Middle East ? Southeast Asia ? Southern and Central Asia |
| Europe | Baltic Countries ? British Islands ? Eastern Europe ? Nordic Countries ? Southern Europe ? Western Europe |
| North America | Caribbean ? Central America ? North America |
| Africa | Central Africa ? Eastern Africa ? Northern Africa ? Southern Africa ? Western Africa |
| Oceania | Australia and New Zealand ? Melanesia ? Micronesia ? Micronesia/Caribbean ? Polynesia |
| Antarctica | Antarctica |
| South America | South America |
+
7 rows in set (0.00 sec)
group_concat_max_len
システム変数
MySQL の group_concat_max_len
システム変数は GROUP_CONCAT
が返す連結した文字列の最大バイト数を意味している.デフォルトは「1024」で,最小値は「4」を設定することができる.
実際に SHOW VARIABLES
で確認すると「1024」になっていることを確認できる.
mysql> SHOW VARIABLES LIKE 'group_concat_max_len';
+
| Variable_name | Value |
+
| group_concat_max_len | 1024 |
+
1 row in set (0.01 sec)
今回は意図的に最小値「4」に設定すると,GROUP_CONCAT
の連結した文字列を「4」バイトに制限できる.
mysql> SET SESSION group_concat_max_len = 4;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT Continent, GROUP_CONCAT(DISTINCT Region)
-> FROM country
-> GROUP BY Continent;
+
| Continent | GROUP_CONCAT(DISTINCT Region) |
+
| Asia | East |
| Europe | Balt |
| North America | Cari |
| Africa | Cent |
| Oceania | Aust |
| Antarctica | Anta |
| South America | Sout |
+
7 rows in set, 7 warnings (0.00 sec)
まとめ
- MySQL を使う場合
GROUP BY
だけではなく GROUP_CONCAT
も知っておくと便利
GROUP_CONCAT
は関数内部で ORDER BY
や DISTINCT
や SEPARATOR
をサポートしている
GROUP_CONCAT
の結果列のバイト数を調整する場合には,MySQL のシステム変数 group_concat_max_len
を設定する