Benchmarking SQLite3 with Python

I wanted to know how well SQLite performs, so I ran a rough investigation. This post benchmarks SQLite3 using Python. I hope it is useful as a reference.

Working with SQLite3 from Python
1. Creating a table
2. Inserting data
3. Selecting data
Benchmarking SQLite3 with Python
1. Measuring program execution time
2. Benchmark program
3. Results
The SQLite3 binary


Working with SQLite3 from Python

First, here is a quick overview of how to work with SQLite3 from Python.

Creating a table

column	type
id	integer
name	text

For example, to create the table above under the name users, you can write the following.

import sqlite3

with sqlite3.connect("test.sqlite") as conn:
    c = conn.cursor()
    c.execute('create table users(id integer, name text)')

Inserting data

with sqlite3.connect("test.sqlite") as conn:
    c = conn.cursor()
    c.execute(
        "insert into users values (1, 'hogehoge')")

Selecting data

with sqlite3.connect("test.sqlite") as conn:
    c = conn.cursor()
    c.execute(
        "select * from users where id=1")

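The select above executes the query but never reads the result. Below is a minimal sketch of the full round trip. It assumes an in-memory database (":memory:") instead of test.sqlite so that no file is left behind, binds values with ? placeholders, and wraps the connection in contextlib.closing, because the with block on a connection only commits or rolls back the transaction and does not close the connection.

```python
import sqlite3
from contextlib import closing

# ":memory:" is an assumption for illustration; the article uses test.sqlite.
with closing(sqlite3.connect(":memory:")) as conn:
    with conn:  # commits on success, rolls back on an exception
        conn.execute("create table users(id integer, name text)")
        conn.execute("insert into users values (?, ?)", (1, "hogehoge"))
    # fetchone() actually reads the first matching row
    row = conn.execute("select * from users where id = ?", (1,)).fetchone()

print(row)  # (1, 'hogehoge')
```
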
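Benchmarking SQLite3 with Python

To compare record counts from 1,000 to 10 million in powers of 10, I ran the benchmark by measuring the following three things.

❶ Time taken to insert the given number of records
❷ Time taken to insert one more row after that
❸ Time taken to select a single row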

Measuring program execution time

To measure how long each part of the benchmark takes, I defined the following class.

import time

class Timer(object):

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        self.end = time.time()
        self.secs = self.end - self.start
        print('  => elapsed time: %f s' % self.secs)
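
time.time() follows the wall clock, which can be adjusted while a program runs. For short intervals, time.perf_counter() is the clock the standard library provides for this purpose. The following is a sketch of a variant, not the class used for the results in this post; the name PerfTimer is hypothetical.

```python
import time

class PerfTimer:
    """Variant of Timer that uses a monotonic, high-resolution clock."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *args):
        self.secs = time.perf_counter() - self.start
        print('  => elapsed time: %f s' % self.secs)

# Usage: time an arbitrary block of work
with PerfTimer() as t:
    total = sum(range(1000000))
```

After the block, t.secs holds the elapsed time in seconds, so it can be collected into a list instead of only being printed.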

Benchmark program

Here is the benchmark program I actually used.

# -*- coding: utf-8 -*-

import sqlite3
from timer import Timer

def dbname(n):
    return 'test_{n}.sqlite'.format(n=n)

def create_db(db):
    with sqlite3.connect(db) as conn:
        c = conn.cursor()
        c.execute('create table users(id integer, name text)')

def test_insert_many_datas(db, p):
    with sqlite3.connect(db) as conn:  # the with block commits on exit; it does not close the connection
        c = conn.cursor()
        for i in range(pow(10, p)):
            c.execute(
                "insert into users values ({}, 'hoge_{}')".format(i, i))

def test_insert_a_data(db):
    with sqlite3.connect(db) as conn:
        c = conn.cursor()
        c.execute(
            "insert into users values ({}, 'hoge_{}')".format(-1, -1))

def test_select_a_data(db):
    with sqlite3.connect(db) as conn:
        c = conn.cursor()
        c.execute(
            "select * from users where id={}".format(50))

if __name__ == "__main__":

    r = range(3, 8) # [3, 4, 5, 6, 7]

    for n in r:
        db = dbname(n)
        create_db(db)
        print("Doing insert 10^{} datas to {} ...".format(n, db))
        with Timer() as t:
            test_insert_many_datas(db, n)

    for n in r:
        db = dbname(n)
        print("Doing insert a data to {} ...".format(db))
        with Timer() as t:
            test_insert_a_data(db)

    for n in r:
        db = dbname(n)
        print("Doing select a data from {} ...".format(db))
        with Timer() as t:
            test_select_a_data(db)
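
The bulk insert above builds each SQL statement with str.format. An alternative worth trying is executemany with bound parameters, sketched below. This is not what produced the results in this post, and it was not measured here; the function name insert_many and the in-memory database are assumptions for illustration.

```python
import sqlite3
from contextlib import closing

def insert_many(conn, p):
    """Insert 10**p rows into users using bound parameters."""
    rows = ((i, "hoge_{}".format(i)) for i in range(10 ** p))
    with conn:  # one transaction for the whole batch
        conn.executemany("insert into users values (?, ?)", rows)

with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("create table users(id integer, name text)")
    insert_many(conn, 3)
    count = conn.execute("select count(*) from users").fetchone()[0]

print(count)  # 1000
```

Binding values with ? also avoids quoting problems if a name ever contains a quote character.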

Results

The results are as follows.

Records	File size	❶	❷	❸
10^3	28K	0.012690 s	0.002479 s	0.000331 s
10^4	196K	0.085659 s	0.001758 s	0.000503 s
10^5	2.1M	0.906912 s	0.001722 s	0.000590 s
10^6	22M	8.376609 s	0.001798 s	0.000532 s
10^7	239M	86.897375 s	0.001175 s	0.000299 s

❶ Time taken to insert the given number of records
❷ Time taken to insert one more row after that
❸ Time taken to select a single row

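Inserting 1 million records took about 8.4 seconds, and 10 million took about 87 seconds. The time seems to grow simply in proportion to the number of records, so 100 million records would probably take somewhere around 870 seconds.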
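
The proportionality can be checked by turning the bulk-insert column of the results table into rows per second. The sketch below only does arithmetic on the measured values; the extrapolation to 10^8 rows is a linear guess, not a measurement.

```python
# (exponent, seconds for the bulk insert), copied from the results table
bulk_insert = [(3, 0.012690), (4, 0.085659), (5, 0.906912),
               (6, 8.376609), (7, 86.897375)]

rates = {}
for p, secs in bulk_insert:
    rates[p] = 10 ** p / secs  # rows per second
    print("10^{}: {:,.0f} rows/s".format(p, rates[p]))

# Linear extrapolation to 10^8 rows from the 10^7 measurement
estimate = bulk_insert[-1][1] * 10
print("estimated 10^8: {:.0f} s".format(estimate))
```

The rate stays in the range of roughly 79,000 to 119,000 rows per second across all five sizes, which is what a linear relationship looks like.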

The file size also appears to grow simply in proportion to the number of records.

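Selecting or inserting a single row took roughly the same time regardless of the number of records; neither grew with the size of the table. As you would expect from a database, it is well built. Note that SQLite stores tables and indexes as B-trees rather than hash tables, so appending one row stays cheap at any size. The users table has no index on id, so the select is in principle a table scan; it probably looks constant here because the row with id=50 sits near the start of the table and the program never fetches any further rows.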
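
How SQLite resolves where id=50 can be checked with EXPLAIN QUERY PLAN. The sketch below compares the plan before and after adding an index. The index name idx_users_id and the helper plan are hypothetical and not part of the benchmark; the exact wording of the plan text differs between SQLite versions, so no expected output is shown.

```python
import sqlite3
from contextlib import closing

def plan(conn, sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("explain query plan " + sql).fetchall()
    return "; ".join(r[-1] for r in rows)  # detail is the last column

with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("create table users(id integer, name text)")
    query = "select * from users where id = 50"
    before = plan(conn, query)
    # idx_users_id is a hypothetical index name, not used in the benchmark
    conn.execute("create index idx_users_id on users(id)")
    after = plan(conn, query)

print(before)  # a SCAN of users (full table scan)
print(after)   # a SEARCH of users using idx_users_id
```
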

The SQLite3 binary

Peeking at the binary of a database file with od -c test_2.sqlite is quite interesting. It looks like this.

0000000    S   Q   L   i   t   e       f   o   r   m   a   t       3  \0
0000020  020  \0 001 001  \0   @          \0  \0  \0 002  \0  \0  \0 002
0000040   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 001  \0  \0  \0 004
0000060   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 001  \0  \0  \0  \0
0000100   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000120   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 002
0000140   \0   .   0   :  \r  \0  \0  \0 001 017 277  \0 017 277  \0  \0
0000160   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0007660   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0   ?
0007700  001 006 027 027 027 001   _   t   a   b   l   e   u   s   e   r
0007720    s   u   s   e   r   s 002   C   R   E   A   T   E       T   A
0007740    B   L   E       u   s   e   r   s   (   i   d       i   n   t
0007760    e   g   e   r   ,       n   a   m   e       t   e   x   t   )
0010000   \r  \0  \0  \0   d  \n 370  \0 017 365 017 352 017 336 017 017
0010020  017 306 017 272 017 256 017 242 017 226 017 212 017   } 017   p
0010040  017   c 017   V 017   I 017   < 017   / 017   " 017 025 017  \b
0010060  016 373 016 356 016 341 016 324 016 307 016 272 016 255 016 240
0010100  016 223 016 206 016   y 016   l 016   _ 016   R 016   E 016   8
0010120  016   + 016 036 016 021 016 004  \r 367  \r 352  \r 335  \r 015
0010140   \r 303  \r 266  \r 251  \r 234  \r 217  \r 202  \r   u  \r   h
0010160   \r   [  \r   N  \r   A  \r   4  \r   '  \r 032  \r  \r  \r  \0
0010200   \f 363  \f 346  \f 331  \f 314  \f 277  \f 262  \f 245  \f 230
0010220   \f 213  \f   ~  \f   q  \f   d  \f   W  \f   J  \f   =  \f   0
0010240   \f   #  \f 026  \f  \t  \v 374  \v 357  \v 342  \v 325  \v 013
0010260   \v 273  \v 256  \v 241  \v 224  \v 207  \v   z  \v   m  \v   `
0010300   \v   S  \v   F  \v   9  \v   ,  \v 037  \v 022  \v 005  \n 370
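
The first bytes of the dump are the database header: the 16-byte string "SQLite format 3\0", followed at offset 020 (octal, 16 in decimal) by the page size as a big-endian 16-bit integer. Here the bytes 020 \0 are 0x1000, or 4096, which matches the second page starting at octal offset 0010000. The same fields can be read from Python, as in the sketch below; the function name read_header and the file name header_demo.sqlite are assumptions for illustration.

```python
import os
import sqlite3
import struct
import tempfile
from contextlib import closing

def read_header(path):
    """Return (magic string, page size in bytes) from a SQLite database file."""
    with open(path, "rb") as f:
        header = f.read(100)  # the database header is 100 bytes long
    magic = header[:16]
    (page_size,) = struct.unpack(">H", header[16:18])  # big-endian 16-bit
    if page_size == 1:  # the value 1 encodes a 65536-byte page
        page_size = 65536
    return magic, page_size

# Build a throwaway database in a temporary directory, then inspect it.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "header_demo.sqlite")
    with closing(sqlite3.connect(path)) as conn:
        with conn:
            conn.execute("create table users(id integer, name text)")
    magic, page_size = read_header(path)

print(magic, page_size)
```
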
