Introduction
You've probably thought a lot about ways you can optimize your Web site. Since the release of this new site, I've been thinking about it a lot. In this post I would like to introduce an advanced version of the CSS minify algorithm. There are several existing implementations available, but I took the one integrated into BlogEngine.NET, as a reference. Compared to it, my algorithm offers several benefits:
Concept
As opposed to the built-in CSS minify algorithm which uses regular expressions, the current algorithm is more like a state machine. This is why you might think it's harder to read and debug. This is why I am writing this post.
First, here are valid CSS states that the algorithm uses:
enum CssState
{ Punctuation, Token, StringD, StringS }
The CSS code is always considered Punctuation
if a different state does not apply. Curly braces, square brackets, parentheses – this is all Punctuation
. The Token
state includes all tag names, element IDs and classes, properties and values – anything that consists of alpha-numeric characters plus a limited number of auxiliary characters – .#_-@*%()
. And finally, there are two string states. StringD
represents a double-quoted string, StringS
– single-quoted. I'll cover more about the reason why the latter two are required, in the Handling Strings section.
The algorithm reads the input character by character and determines if a sequence of whitespace characters can be removed from it. It also handles comments pretty much the same way. Here is the code illustrating the idea:
public static string Minify(string theCss)
{
// Assume that the length of the output string
// will be at most 75% the length of the incoming one to avoid
// additional StringBuilder reallocations.
StringBuilder aRet = new StringBuilder(theCss.Length * 3 / 4);
int aNumChars = theCss.Length;
CssState aPrevState = CssState.Punctuation;
int aPrevPos = 0;
int i = 0;
while(i < aNumChars)
{
CssState aCurState = GetState(theCss, ref i, aPrevState);
if(i > aPrevPos + 1)
{
// If whitespace is found between two tokens, keep it compact
if (aPrevState != CssState.Punctuation && aCurState != CssState.Punctuation)
aRet.Append(' ');
// Otherwise, no whitespace is needed, skip everything between aPrevPos and i
}
aPrevPos = i;
aPrevState = aCurState;
aRet.Append(theCss[i++]);
}
return aRet.ToString();
}
As you can see, whitespace must not be removed is when it delimits token or string characters. Examples of this case are:
border-bottom: solid 2px #9f4c1f;
background-position: center top;
quotes: '« ' ' »';
quotes: "»" "«" "\2039" "\203A";
In all other cases, the whitespace can be trimmed. Again, there is a special case when whitespace is found inside a string, but we'll cover it in the next section.
The key of the algorithm is the GetState
method. It skips any comments and whitespace characters found in the incoming string, updates the position variable i
to a value represented by a meaningful state:
static CssState GetState(string theCss, ref int thePos, CssState theCurState)
{
int aLen = theCss.Length;
int i = thePos;
if (theCurState == CssState.StringD)
{
//REMOVED FOR COMPACTNESS
}
else if (theCurState == CssState.StringS)
{
//REMOVED FOR COMPACTNESS
}
bool aSkip = true;
while(aSkip)
{
/////////////////////////////////////////
// Skip whitespace
while(aSkip = (i < aLen - 1 && IsWhitespaceChar(theCss[i]) ) )
i++;
/////////////////////////////////////////
// Skip comments
if(i < aLen - 1)
if (theCss[i] == '/' && theCss[i+1] == '*') // comment opening
{
aSkip = true;
while(i < aLen - 1)
{
i++;
if(theCss[i-1] == '*' && theCss[i] == '/') // comment closing
{
i++;
break;
}
}
}
}
thePos = i;
if ( IsTokenChar( theCss[i] ) )
return CssState.Token;
else if ( theCss[i] == '\"' )
return CssState.StringD;
else if(theCss[i] == '\'')
return CssState.StringS;
else
return CssState.Punctuation;
}
For for the sake of saving space I have removed several lines, so let's move on to see what they are meant for.
Handling Strings
According to the CSS specification, two types of strings are supported: single-quoted and double-quoted. My algorithm leaves the string intact, even if whitespace characters are found inside it. Of course, for the HTML language it may not matter at all since all whitespace characters will be merged by the browser. But I just think that keeping the string unmodified is more intuitive and professional. In order achieve that, support needs to be added for escaped quote characters. An escaped single quote character inside a single-quoted string does not close the string. Similarly, an escaped double quote character does not close a double quoted string:
if (theCurState == CssState.StringD)
{
if(theCss[i] == '\"')
{
// Make sure the double quote character is not escaped
if(thePos > 0)
if(theCss[i-1] == '\\')
return CssState.StringD;
// Enforce a whitespace afterwards
return CssState.Token;
}
else
return CssState.StringD;
}
else if (theCurState == CssState.StringS)
{
if(theCss[i] == '\'')
{
// Make sure the single quote character is not escaped
if(thePos > 0)
if(theCss[i-1] == '\\')
return CssState.StringS;
// Enforce a whitespace afterwards
return CssState.Token;
}
else
return CssState.StringS;
}
There is a special example attribute in the reference CSS file that illustrates the differences in the way the two algorithms work:
angledouble: "Angle= 00deg00'00\" ";
anglesingle: 'Angle= 00deg00\'00" ';
Although these attributes make no sense to a browser, they represent valid CSS syntax. My version produces the same string while the BlogEngine.NET version trims several spaces from it. I consider this a flaw even though string length gets reduced.
Performance Testing
The suggested algorithm provides both speed and size wins. The code below lists two methods I used to measure output length and processing time of the two algorithms:
[TestMethod]
public void CssMinifyTest_Length()
{
string aCss = Properties.Resources.style;
string aMin1 = CssMinifier.Minify(aCss);
string aMin2 = CssHandler.StripWhitespace(aCss);
Console.WriteLine(
string.Format("KO Software output ({0} bytes):", aMin1.Length) );
Console.WriteLine(aMin1);
Console.WriteLine("---------------------------------");
Console.WriteLine(
string.Format("BlogEngine.NET output ({0} bytes):", aMin2.Length) );
Console.WriteLine(aMin2);
Assert.IsTrue(aMin1.Length <= aMin2.Length);
}
[TestMethod]
public void CssMinifyTest_Speed()
{
int aNumCycles = 50000;
string aCss = Properties.Resources.style;
DateTime aStart = DateTime.Now;
for(int i = 0; i < aNumCycles; i++)
{
string aMin = CssMinifier.Minify(aCss);
}
DateTime aEnd = DateTime.Now;
TimeSpan aTime_AspxGear = aEnd - aStart;
Console.WriteLine(aTime_AspxGear.ToString());
aStart = DateTime.Now;
for(int i = 0; i < aNumCycles; i++)
{
string aMin = CssHandler.StripWhitespace(aCss);
}
aEnd = DateTime.Now;
TimeSpan aTime_BE = aEnd - aStart;
Console.WriteLine(aTime_BE.ToString());
Assert.IsTrue(aTime_AspxGear <= aTime_BE);
}
The results are illustrated in the table below:
|
KO Software CSS Minifier |
BlogEngine.NET CSS Minifier |
Output size, bytes |
6,888 |
7,274 |
50,000 iteration processing time (Debug build) |
33 seconds |
1 minute and 25 seconds |
50,000 iteration processing time (Release build) |
15 seconds |
1 minute and 20 seconds |
Conclusion
The suggested algorithm produces output which is smaller by almost 400 bytes. And it runs more than 5 times faster! Interestingly, the Release build is more than two times faster than the Debug one while the BlogEngine.NET version yields only subtle performance boost. I suspect, this is because the latter uses regular expressions which are already built with most possible optimizations, regardless of our build type.
I am going to contact the BlogEngine.NET team and e-mail my code as a patch to their source code repository. I think, if this code is properly tested in various environments, it can surely improve web site stability and reduce traffic. As an old school programmer, I am still interested in ways of improving the code. So please, download the source code and leave comments with bugs and improvements.