How to store 60 Booleans in a MySQL Database?

Posted on

How to store 60 Booleans in a MySQL Database? – Here in this article, we will share some of the most common and frequently asked about PHP problem in programming with detailed answers and code samples. There’s nothing quite so frustrating as being faced with PHP errors and being unable to figure out what is preventing your website from functioning as it should like php and mysql . If you have an existing PHP-based website or application that is experiencing performance issues, let’s get thinking about How to store 60 Booleans in a MySQL Database?.

I’m building a mobile App I use PHP & MySQL to write a backend – REST API.

If I have to store around 50-60 Boolean values in a table called “Reports”(users have to check things in a form) in my mobile app I store the values (0/1) in a simple array. In my MySql Table should I create a different column for each Boolean value or is it enough if I simply use a string or an Int to store it as a “number” like “110101110110111…”?

I get and put the data with JSON.

UPDATE 1: All I have to do is check if everything is 1, if one of them is 0 then that’s a “problem”. In 2 years this table will have around 15.000-20.000 rows, it has to be very fast and as space-saving as possible.

UPDATE 2: In terms of speed which solution is faster? Making separate columns vs store it in a string/binary type. What if I have to check which ones are the 0s? Is it a great solution if I store it as a “number” in one column and if it’s not “111..111” then send it to the mobile app as JSON where I parse the value and analyse it on the user’s device? Let’s say I have to deal with 50K rows.

Thanks in advance.

Solution :

A separate column per value is more flexible when it comes to searching.

A separate key/value table is more flexible if different rows have different collections of Boolean values.

And, if

  1. your list of Boolean values is more-or-less static
  2. all your rows have all those Boolean values
  3. your performance-critical search is to find rows in which any of the values are false

then using text strings like ‘1001010010’ etc is a good way to store them. You can search like this

 WHERE flags <> '11111111'

to find the rows you need.

You could use a BINARY column with one bit per flag. But your table will be easier to use for casual queries and eyeball inspection if you use text. The space savings from using BINARY instead of CHAR won’t be significant until you start storing many millions of rows.

edit It has to be said: every time I’ve built something like this with arrays of Boolean attributes, I’ve later been disappointed at how inflexible it turned out to be. For example, suppose it was a catalog of light bulbs. At the turn of the millennium, the Boolean flags might have been stuff like

screw base
halogen
mercury vapor
low voltage

Then, things change and I find myself needing more Boolean flags, like,

LED
CFL 
dimmable
Energy Star

etc. All of a sudden my data types aren’t big enough to hold what I need them to hold. When I wrote “your list of Boolean values is more-or-less static” I meant that you don’t reasonably expect to have something like the light-bulb characteristics change during the lifetime of your application.

So, a separate table of attributes might be a better solution. It would have these columns:

   item_id           fk to item table         -- pk
   attribute_id      attribute identifier     -- pk
   attribute_value   

This is ultimately flexible. You can just add new flags. You can add them to existing items, or to new items, at any time in the lifetime of your application. And, every item doesn’t need the same collection of flags. You can write the “what items have any false attributes?” query like this:

 SELECT DISTINCT item_id FROM attribute_table WHERE attribute_value = 0

But, you have to be careful because the query “what items have missing attributes” is a lot harder to write.

For your specific purpose, when any zero-flag is a problen (an exception) and most of entries (like 99%) will be “1111…1111”, i dont see any reason to store them all. I would rather create a separate table that only stores unchecked flags. The table could look like: uncheked_flags (user_id, flag_id). In an other table you store your flag definitions: flags (flag_id, flag_name, flag_description).

Then your report is as simple as SELECT * FROM unchecked_flags.

Update – possible table definitions:

CREATE TABLE `flags` (
    `flag_id` TINYINT(3) UNSIGNED NOT NULL AUTO_INCREMENT,
    `flag_name` VARCHAR(63) NOT NULL,
    `flag_description` TEXT NOT NULL,
    PRIMARY KEY (`flag_id`),
    UNIQUE INDEX `flag_name` (`flag_name`)
) ENGINE=InnoDB;

CREATE TABLE `uncheked_flags` (
    `user_id` MEDIUMINT(8) UNSIGNED NOT NULL,
    `flag_id` TINYINT(3) UNSIGNED NOT NULL,
    PRIMARY KEY (`user_id`, `flag_id`),
    INDEX `flag_id` (`flag_id`),
    CONSTRAINT `FK_uncheked_flags_flags` FOREIGN KEY (`flag_id`) REFERENCES `flags` (`flag_id`),
    CONSTRAINT `FK_uncheked_flags_users` FOREIGN KEY (`user_id`) REFERENCES `users` (`user_id`)
) ENGINE=InnoDB;

You may get a better search out of using dedicated columns, for each boolean, but the cardinality is poor and even if you index each column it will involve a fair bit of traversal or scanning.

If you are just looking for HIGH-VALUES 0xFFF…. then definitely bitmap, this solves your cardinality problem (per OP update). It’s not like you are checking parity… The tree will however be heavily skewed to HIGH-VALUES if this is normal and can create a hot spot prone to node splitting upon inserts.

Bit mapping and using bitwise operator masks will save space but will need to be aligned to a byte so there may be an unused “tip” (provisioning for future fields perhaps), so the mask must be of a maintained length or the field padded with 1s.

It will also add complexity to your architecture, that may require bespoke coding, bespoke standards.

You need to perform an analysis on the importance of any searching (you may not ordinarily expect to be searching all. or even any of the discrete fields).

This is a very common strategy for denormalising data and also for tuning service request for specific clients. (Where some reponses are fatter than others for the same transaction).

Case 1: If “problems” are rare.

Have a table Problems with ids, and a TINYINT with the value (50-60) of the problem. With suitable indexes on that table you can lookup whatever you need.

Case 2: Lots of items.

Use a BIGINT UNSIGNED to hold up to 64 0/1 value. Use an expression like 1 << n to build a mask for the nth (counting from 0) bit. If you know, for example, that there exactly 55 bits, then the value of all 1s is (1<<55)-1. Then you can find the items with “problems” via WHERE bits = (1<<55)-1.

Bit Operators and functions

Case 3: You have names for the problems.

SET ('broken', 'stolen', 'out of gas', 'wrong color', ...)

That will build a DATATYPE with (logically) a bit for each problem. See also the function FIND_IN_SET() as a way to check for one problem.

Cases 2 and 3 will take about 8 bytes for the full set of problems — very compact. Most SELECT that you might perform would scan the entire table, but 20K rows won’t take terribly long and will be a lot faster than having 60 columns or a row per problem.

Leave a Reply

Your email address will not be published. Required fields are marked *