This talk will present architectural and optimization techniques used in the development of an H.264 software decoder (https://github.com/tvlabs/edge264) to drastically reduce code size and improve speed. The techniques are applicable to other block-based video codecs and will be presented as HOWTOs to help participants use them in their own projects. They cover code and memory layout in the C language and ways to maximize opportunities for vectorization.