MySQL field types and their charsets

When you administer a big database whose tables contain millions of rows, a big issue emerges: optimizing the tables and the data stored in them.

The short version: don’t use the utf8 encoding for the whole table. Use it only for the damn fields where you really need it.

The longer version requires some calculations, which lead to a better understanding of what happens inside a MySQL server.

When varchar is used, the storage engine allocates only the bytes required to store the value, plus a length prefix, no matter the charset. However, when MySQL creates temporary tables and internal buffers (during joins, ordering, basic reading from the storage and other operations), it allocates the maximum number of bytes for the column.

This means that if you have a column that is varchar(255) utf8 and you write a single letter in it, the disk needs just 2 bytes to store it: 1 byte for its length and 1 byte for the value. Things get trickier when you need to fetch that value: MySQL, not knowing what is stored in that field, will allocate 768 bytes (yes, you read that right, 768: 256 bytes for the length of the field * 3 bytes per utf8 character).

Assume you have a table that contains 2 columns of type varchar(255) utf8 and you run a query like this:

SELECT column1, column2 FROM table1 ORDER BY column1 DESC;

If you don’t have an index on column1 and the table has 1 million rows, MySQL has to create a temporary table to do the sorting, and that temporary table would be something like 1.43 GB in size. The calculation is simple:

768 (bytes) * 2 (columns) * 1,000,000 (rows) = 1,536,000,000 bytes, which gives roughly 1.43 gigabytes of temporary table on disk.

If you have values that are actually that long, it makes sense. But if the maximum length is around 30 to 50 characters (think of a username, a password or even an email field) and you declare the column as varchar(50), each column needs at most 50 * 3 = 150 bytes, and the temporary table would be roughly 290 megabytes:

150 (bytes) * 2 (columns) * 1,000,000 (rows) = 300,000,000 bytes, which gives roughly 290 megabytes.

Obviously, 290 megabytes are faster to read than 1.43 gigabytes.

Of course this could be pushed a little further: if you don’t need the utf8 encoding, you could switch to latin1 (1 byte per character), which would mean around 95 megabytes.
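The arithmetic above can be reproduced with a few lines of Ruby. This is only a sketch: the helper names (`temp_table_bytes`, `gigabytes`, `megabytes`) are made up for illustration, and it assumes, as this post does, that every column in the temporary table is padded to its maximum width (3 bytes per utf8 character, 1 byte per latin1 character).

```ruby
# Rough estimate of the temporary table size for the query above.
# Assumption: every varchar column is padded to its maximum width,
# as described in this post. All helper names are hypothetical.
BYTES_PER_CHAR = { :utf8 => 3, :latin1 => 1 }

def temp_table_bytes(chars, charset, columns, rows)
  chars * BYTES_PER_CHAR.fetch(charset) * columns * rows
end

def gigabytes(bytes)
  (bytes / 1024.0**3).round(2)
end

def megabytes(bytes)
  (bytes / 1024.0**2).round(1)
end

rows = 1_000_000

wide   = temp_table_bytes(256, :utf8,   2, rows)  # the varchar(255) utf8 case
narrow = temp_table_bytes(50,  :utf8,   2, rows)  # 50 characters, utf8
latin  = temp_table_bytes(50,  :latin1, 2, rows)  # 50 characters, latin1

puts "wide utf8:   #{wide} bytes (#{gigabytes(wide)} GB)"
puts "narrow utf8: #{narrow} bytes (#{megabytes(narrow)} MB)"
puts "latin1:      #{latin} bytes (#{megabytes(latin)} MB)"
```

The three results line up with the 1.43 gigabytes, roughly 290 megabytes and roughly 95 megabytes quoted above.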

All of the above holds even if your table is only 20 megabytes in size on disk.

Lessons that I am trying to express:

a) use the charset you need!
b) choose the length of varchar in a responsible manner, don’t make it just “as much as possible”
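One way to act on both lessons is to look at real sample data before picking a column definition. The snippet below is a minimal sketch, not a tool from this post: `fits_latin1?` and `column_profile` are hypothetical helpers, and the sample values are invented. It relies on MySQL's latin1 corresponding to Windows-1252, so Ruby's `Windows-1252` encoding is used as the check.

```ruby
# Hypothetical helpers: profile sample values before choosing a varchar
# length and charset. Nothing here talks to MySQL.

# True when the value can be represented in Windows-1252 (MySQL's latin1).
def fits_latin1?(value)
  value.encode("Windows-1252")
  true
rescue Encoding::UndefinedConversionError
  false
end

def column_profile(values)
  {
    :max_chars   => values.map(&:length).max,    # characters needed
    :max_bytes   => values.map(&:bytesize).max,  # bytes when stored as UTF-8
    :latin1_safe => values.all? { |v| fits_latin1?(v) }
  }
end

# Invented sample data; "Jos\u00e9" is "José" written with an escape.
usernames = ["alice", "bob_the_builder", "Jos\u00e9"]
profile   = column_profile(usernames)

puts profile.inspect
```

If `:latin1_safe` comes back true and `:max_chars` is small, a short latin1 column is probably enough; leave yourself some headroom rather than using the exact maximum.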

Read more about: MySQL Storage Requirements

This post is written with help from my friend: Rene Cannao

How to implement Enum Fields in your Ruby on Rails models

If you reached this page, it might be because you searched for how the hell to create an enum field in Ruby on Rails.

Short answer: Ruby on Rails prior to version 4 does not know how to deal with enum fields, so you cannot declare them. However, you can hack your model and implement your own enum 🙂

Code your own Enum Field

Long answer: Many developers need, for one reason or another, to have enum fields in their database. Maybe they need to save the status of an object, or they need something more complex. By default Rails lets you do this with string fields, which you can later query by using scopes, custom ActiveRecord queries or your own methods.

Personally, I needed enum fields in my Ruby on Rails applications, and I noticed that I usually repeated the same stuff all over again. That led me to write the snippet at the end of this post, which lets me use plain vanilla enum fields in my database.

First, I create my migration, where I add something like this:

class CreateProducts < ActiveRecord::Migration
  def up
    create_table :products do |t|
      t.string :name
      t.integer :my_status_field, :limit => 1  # stored as a TINYINT
    end
    add_index :products, :my_status_field
  end

  def down
    drop_table :products
  end
end

After creating the migration, we are going to implement our status column inside the model.

class Product < ActiveRecord::Base
  # Maps each symbolic status to the integer stored in the database
  STATUS_ARRAY = {
    :pending             => 1,
    :open                => 2,
    :closed              => 3,
    :rejected            => 4,
    :waiting_for_payment => 5
  }

  has_enum_field :my_status_field, STATUS_ARRAY
end

With a snippet like the one above, you can use this kind of syntax:

# Active Record Scopes:
pending_products = Product.pending
open_products    =
# etc 

# inside an object, the following syntax

object = 
object.my_status_field = :open 

object.is_pending?  # => false 

object.my_status_field = Product::STATUS_ARRAY[:closed]  # the integer works too
object.pending?    # => false 
object.closed?     # => true
object.is_closed?  # => true

But wait, there is more:
1. you get inclusion validators for the field
2. you can disable those validators
3. you can disable the boolean methods (pending?, is_pending? and so on)
4. you can disable the scopes

We are going to take the above Product class and add another hash to customize our enum snippet, disabling the validators and scopes.

class Product < ActiveRecord::Base
  STATUS_ARRAY = {
    :pending             => 1,
    :open                => 2,
    :closed              => 3,
    :rejected            => 4,
    :waiting_for_payment => 5
  }

  # Switch off validators and scopes, keep the boolean methods
  ENUM_SETTINGS = {
    :validate => false,
    :scopes   => false,
    :booleans => true
  }

  has_enum_field :my_status_field, STATUS_ARRAY, ENUM_SETTINGS
end

Before using the module below, you might want to add a require statement to your “environment.rb” or “application.rb” to include the module globally in your project.

Finally the module:

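module EnumField
  class << self
    def included(klass)
      klass.class_eval do
        extend ClassMethods
        include InstanceMethods
      end
    end
  end

  module InstanceMethods
  end

  module ClassMethods
    def has_enum_field(column_name, data_set, options = {:validate => true, :scopes => true, :booleans => true})
      # Validate once per column, against the symbolic names
      class_eval %{
        validates_inclusion_of :#{column_name}, :in => #{data_set}.keys
      } if options[:validate]

      data_set.keys.each do |ds|
        dat = data_set[ds.to_sym]

        # One scope per value, e.g. Product.pending
        class_eval %{
          scope :#{ds}, where('#{column_name} = ?', dat)
        } if options[:scopes]

        # One predicate per value, e.g. pending? and is_pending?
        class_eval %{
          def #{ds}?
            self[:#{column_name}] == #{data_set}[:#{ds}]
          end

          alias_method :is_#{ds}?, :#{ds}?
        } if options[:booleans]
      end

      # The setter accepts either the integer or the symbolic name.
      # Assumption: the original getter body was lost; returning the
      # symbolic name is a guess that matches the validator above.
      class_eval %{
        def #{column_name}=(value)
          self[:#{column_name}] = value.is_a?(Integer) ? value : #{data_set}[value.to_sym]
        end

        def #{column_name}
          #{data_set}.key(self[:#{column_name}])
        end
      }
    end
  end
end

ActiveRecord::Base.send(:include, EnumField)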
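To see the core trick without a database, here is a plain-Ruby sketch of the same idea. It is not the module from this post: `SimpleEnum`, `enum_field` and `Ticket` are hypothetical names, values live in instance variables instead of ActiveRecord attributes, and it uses `define_method` rather than string `class_eval`. Scopes and validators are left out because they only make sense inside Rails.

```ruby
# Plain-Ruby sketch of an enum field, no ActiveRecord required.
# Assumption: SimpleEnum, enum_field and Ticket are illustrative names.
module SimpleEnum
  def enum_field(name, mapping)
    ivar = "@#{name}"

    # Setter: accept either the stored integer or the symbolic name.
    define_method("#{name}=") do |value|
      raw = value.is_a?(Integer) ? value : mapping.fetch(value.to_sym)
      instance_variable_set(ivar, raw)
    end

    # Getter: translate the stored integer back into its symbol.
    define_method(name) do
      mapping.key(instance_variable_get(ivar))
    end

    # Predicates: open? plus the is_open? alias, one pair per value.
    mapping.each do |key, raw|
      define_method("#{key}?") { instance_variable_get(ivar) == raw }
      alias_method "is_#{key}?", "#{key}?"
    end
  end
end

class Ticket
  extend SimpleEnum

  STATUSES = { :pending => 1, :open => 2, :closed => 3 }
  enum_field :status, STATUSES
end

ticket = Ticket.new
ticket.status = :open                      # assign by name
was_open = ticket.open?
ticket.status = Ticket::STATUSES[:closed]  # assign by integer

puts ticket.status.inspect
puts ticket.is_closed?
```

Using `define_method` keeps the mapping in a closure, so nothing has to be serialized into a string of Ruby code the way the `class_eval %{...}` version does.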

This module might have a problem though, because it is not 100% bulletproof. To achieve that, you might need to override the write_attribute method from Rails.


With the release of Rails 4, this post can be considered deprecated, as enum support has been introduced in Rails by default. Please read more on the official wiki: