I just wanted to populate a sqlite database with 100 million rows of data from a 1.8 gigabyte tab delimited file (tsv).
PRAGMA cache_size = 400000 lets sqlite allocate about 460MB of RAM to work with. You can find more detailed explaination of these PRAGMA options on the Sqlite site.
Using this approach, the inserting of data took only about 20 mins.
However, adding a composite primary key (i.e. unique index) and a non-unique composite index ran for 70+ hours before I just canceled it.
If you can pre-sort your data in the order it is indexed, the import speed is practically the same whether you create the index before or after the import. However, if you create the index before the import, the database data and indexes will be intermingled with each other, thus fragmented (but this won’t affect the file size). The fragmentation can be corrected by running the VACUUM command.
There is a powerful ExternalSort implementation as part of the Java Small Text library. Look under Documentation→Small Text Utils tutorial for info about it.
0 comments:
Post a Comment