Woshiadai Dev Notebook

March 31, 2008

Java Garbage Collection

Filed under: Uncategorized


A short article about different types of garbage collectors in Java

I never knew that Java could have memory leak problems, since I believed the GC would always reclaim unreachable objects. But it actually can, due to "unintentional object retention": objects that are still reachable but no longer needed. This is a good article explaining the problem and suggesting ways to deal with it, using things like WeakHashMap (finally I understand what this class is useful for).

Firefox 3 Beta

Filed under: Uncategorized


I run three browsers on my computer, with Opera the most frequently used one. Firefox sometimes consumes too much memory, either because of memory leaks or because of its cache implementation.

The Firefox 3 betas have been out for quite some time, and I tried the latest 3.0pre nightly build from trunk. It is much faster and more polished, and I really love it. The only problem is that most plugins are not officially compatible with the Firefox 3 betas right now.

There are several FF plugins that I cannot live without:

Greasemonkey:

this animal allows you to write custom scripts that control the actual rendering of any page. One of the most useful scripts is Linkify ting, which automatically turns text URLs into clickable links by wrapping them in <a> tags. But the original script has a bug: it cannot handle many types of URLs. So what you can do is open the script and replace "var regex = blah blah" with the following:

"var regex = /\b([\d\w\.\/\+\-\?\:]*)((ht|f)tp(s|)\:\/\/|[\d\d\d|\d\d]\.[\d\d\d|\d\d]\.|www\.|\.tv|\.ac|\.com|\.edu|\.gov|\.int|\.mil|\.net|\.org|\.biz|\.info|\.name|\.pro|\.museum|\.co)([\d\w\.\/\%\+\-\=\&\?\:\\\"\'\,\|\~\;]*)\b/gi;".

There are also many scripts on UserScripts.org including many hacks on iGoogle and Gmail. There are also tutorials like Dive into Greasemonkey and books like Greasemonkey Hacks if you want to write your own scripts. It is really a lot of fun.

 

Firebug:

this is the ultimate toolbox for Web developers. It includes many powerful features for detecting JavaScript errors and for designing your XHTML and CSS files. There is also an addon to Firebug called YSlow, from the Yahoo! Exceptional Performance group, that evaluates a Web site against several performance-improving guidelines.

 

Del.icio.us:

this is the Web-based bookmarking solution. Although there is a new version that integrates with the Firefox bookmark manager, I still prefer the classic version, which adds two buttons to the browser toolbar.

 

So, how do you make these plugins work with the FF 3.0 betas? It is quite straightforward, actually.

1. Manually download the XPI file: instead of clicking the install button on the plugin's website (actually, those buttons are greyed out and you cannot click them at all), go to the bottom of the plugin page where there is a link for "advanced details", expand that section and click on "complete version history", then download the XPI file manually using "save as…".

2. Open the XPI file with WinRAR and edit the install.rdf file: in the target application element there is an element called <em:maxVersion>; just change its value to 3.0pre. Save install.rdf and put it back into the XPI file.

3. In FF3, use "open file…" to open the modified XPI file. Bingo! You've got those old buddies back ;-)
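
If you would rather script the install.rdf edit from step 2 than do it by hand, here is a rough sketch. The helper name bumpMaxVersion and the sample fragment are made up for illustration, and it assumes install.rdf has already been extracted from the XPI as plain text. It rewrites every <em:maxVersion> it finds, which may be more than you want if the plugin targets several applications.

```javascript
// Hypothetical helper: rewrite every <em:maxVersion> value in an install.rdf string.
function bumpMaxVersion(installRdf, newMaxVersion) {
  return installRdf.replace(
    /(<em:maxVersion>)[^<]*(<\/em:maxVersion>)/g,
    function (whole, open, close) {
      // Keep the tags, swap only the text between them.
      return open + newMaxVersion + close;
    }
  );
}

// Made-up fragment, only to show the shape of the target application element.
var sampleRdf = [
  "<em:targetApplication>",
  "  <Description>",
  "    <em:minVersion>1.5</em:minVersion>",
  "    <em:maxVersion>2.0.0.*</em:maxVersion>",
  "  </Description>",
  "</em:targetApplication>"
].join("\n");

console.log(bumpMaxVersion(sampleRdf, "3.0pre"));
```

The patched text still has to go back into the XPI archive, as in step 2.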

For the del.icio.us plugin there is a bit of extra work due to the signature files. FF3 will report a "signing could not be verified" error. Simply open the XPI file with WinRAR, delete the META-INF folder, and save the XPI file. Here we go, the del.icio.us buttons are back too!

March 29, 2008

Regex for javascript to detect a URL

Filed under: Uncategorized

var regex = /\b([\d\w\.\/\+\-\?\:]*)((ht|f)tp(s|)\:\/\/|[\d\d\d|\d\d]\.[\d\d\d|\d\d]\.|www\.|\.tv|\.ac|\.com|\.edu|\.gov|\.int|\.mil|\.net|\.org|\.biz|\.info|\.name|\.pro|\.museum|\.co)([\d\w\.\/\%\+\-\=\&\?\:\\\"\'\,\|\~\;]*)\b/gi;
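
To try the pattern outside Greasemonkey, here is a small sketch that applies it to a plain string. The linkify function and the http:// prefixing rule are my own assumptions for illustration. A real user script would walk the page's text nodes instead of working on a string.

```javascript
// The URL pattern from this post, unchanged.
var regex = /\b([\d\w\.\/\+\-\?\:]*)((ht|f)tp(s|)\:\/\/|[\d\d\d|\d\d]\.[\d\d\d|\d\d]\.|www\.|\.tv|\.ac|\.com|\.edu|\.gov|\.int|\.mil|\.net|\.org|\.biz|\.info|\.name|\.pro|\.museum|\.co)([\d\w\.\/\%\+\-\=\&\?\:\\\"\'\,\|\~\;]*)\b/gi;

// Hypothetical helper: wrap every match in an <a> tag.
function linkify(text) {
  return text.replace(regex, function (url) {
    // Assumption: matches without a scheme (e.g. "www.example.com") get "http://" in the href.
    var href = /^(ht|f)tps?:\/\//i.test(url) ? url : "http://" + url;
    return '<a href="' + href + '">' + url + "</a>";
  });
}

console.log(linkify("visit http://example.com/page now"));
// prints: visit <a href="http://example.com/page">http://example.com/page</a> now
```

Using replace() rather than test() in a loop avoids surprises from the g flag, which makes the regex remember its last match position between calls.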

 

Online tester for javascript regex:

http://www.regular-expressions.info/javascriptexample.html

Short tutorial for regex in javascript:

http://www.regular-expressions.info/javascript.html

March 25, 2008

Updated High Performance Website Tips

Filed under: Uncategorized

Latest presentation from Yahoo! Exceptional Performance Group

March 24, 2008

Javascript Notes

Filed under: Uncategorized

Found several nice short tutorials for JS:

by Sergio Pereira

  1. Quick guide to somewhat advanced Javascript
  2. Using prototype.js v1.5.0

by fallenlord blog

  1. Class definitions in JS
  2. Inheritance in JS
  3. Event model in JS

March 21, 2008

Notes related to HTTP

Filed under: Uncategorized

From Best Practices for Speeding Up Your Web Site

1. Redirection

HTTP/1.1 301 Moved Permanently
Location: http://example.com/newuri
Content-Type: text/html

2. About ETag

Entity tags (ETags) are a mechanism that web servers and browsers use to determine whether the component in the browser’s cache matches the one on the origin server. (An "entity" is another word for what I’ve been calling a "component": images, scripts, stylesheets, etc.) ETags were added to provide a mechanism for validating entities that is more flexible than the last-modified date. An ETag is a string that uniquely identifies a specific version of a component. The only format constraints are that the string be quoted. The origin server specifies the component’s ETag using the ETag response header.

      HTTP/1.1 200 OK
      Last-Modified: Tue, 12 Dec 2006 03:03:59 GMT
      ETag: "10c24bc-4ab-457e1c1f"
      Content-Length: 12195

Later, if the browser has to validate a component, it uses the If-None-Match header to pass the ETag back to the origin server. If the ETags match, a 304 status code is returned reducing the response by 12195 bytes for this example.

      GET /i/yahoo.gif HTTP/1.1
      Host: us.yimg.com
      If-Modified-Since: Tue, 12 Dec 2006 03:03:59 GMT
      If-None-Match: "10c24bc-4ab-457e1c1f"

      HTTP/1.1 304 Not Modified
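
To make the validation step concrete, here is a minimal sketch of what the server side might do with If-None-Match. The respond function, the component object, and the lower-case header keys are assumptions for illustration. Real servers also handle lists of tags, "*", and weak validators, which this ignores.

```javascript
// Hypothetical handler: choose between a full 200 and an empty 304.
function respond(requestHeaders, component) {
  var clientTag = requestHeaders["if-none-match"]; // assumed lower-case header keys
  if (clientTag !== undefined && clientTag === component.etag) {
    // Same version already in the browser cache: send headers only.
    return { status: 304, headers: { ETag: component.etag }, body: "" };
  }
  // No tag or a different tag: send the whole component again.
  return {
    status: 200,
    headers: { ETag: component.etag, "Content-Length": String(component.body.length) },
    body: component.body
  };
}

var gif = { etag: '"10c24bc-4ab-457e1c1f"', body: "...image bytes..." };
console.log(respond({ "if-none-match": '"10c24bc-4ab-457e1c1f"' }, gif).status);
console.log(respond({}, gif).status);
```

The first call is the revalidation case from the headers above, the second is a first visit with an empty cache.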

The problem with ETags is that they typically are constructed using attributes that make them unique to a specific server hosting a site. ETags won’t match when a browser gets the original component from one server and later tries to validate that component on a different server, a situation that is all too common on Web sites that use a cluster of servers to handle requests. By default, both Apache and IIS embed data in the ETag that dramatically reduces the odds of the validity test succeeding on web sites with multiple servers.

The ETag format for Apache 1.3 and 2.x is inode-size-timestamp. Although a given file may reside in the same directory across multiple servers, and have the same file size, permissions, timestamp, etc., its inode is different from one server to the next.
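
Here is a small sketch of why the inode breaks things. The hex encoding and the timestamp in seconds are my assumptions, inferred from the sample header "10c24bc-4ab-457e1c1f"; real Apache versions may encode the fields differently. The function name and the second server's inode are made up.

```javascript
// Hypothetical reconstruction of the inode-size-timestamp format.
function apacheStyleETag(file) {
  var parts = [file.inode, file.size, file.mtimeSeconds].map(function (n) {
    return n.toString(16); // assumption: each field is written in hex
  });
  return '"' + parts.join("-") + '"';
}

// The same file on two servers: equal size and timestamp, different inode.
var onServerA = { inode: 0x10c24bc, size: 0x4ab, mtimeSeconds: 0x457e1c1f };
var onServerB = { inode: 0x2f00a11, size: 0x4ab, mtimeSeconds: 0x457e1c1f }; // made-up inode

console.log(apacheStyleETag(onServerA));
console.log(apacheStyleETag(onServerB));
```

The two tags differ only in the first field, which is enough to make a browser that cached the file from server A fail validation on server B.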

IIS 5.0 and 6.0 have a similar issue with ETags. The format for ETags on IIS is Filetimestamp:ChangeNumber. A ChangeNumber is a counter used to track configuration changes to IIS. It’s unlikely that the ChangeNumber is the same across all IIS servers behind a web site.

The end result is ETags generated by Apache and IIS for the exact same component won’t match from one server to another. If the ETags don’t match, the user doesn’t receive the small, fast 304 response that ETags were designed for; instead, they’ll get a normal 200 response along with all the data for the component. If you host your web site on just one server, this isn’t a problem. But if you have multiple servers hosting your web site, and you’re using Apache or IIS with the default ETag configuration, your users are getting slower pages, your servers have a higher load, you’re consuming greater bandwidth, and proxies aren’t caching your content efficiently. Even if your components have a far future Expires header, a conditional GET request is still made whenever the user hits Reload or Refresh.

If you’re not taking advantage of the flexible validation model that ETags provide, it’s better to just remove the ETag altogether. The Last-Modified header validates based on the component’s timestamp. And removing the ETag reduces the size of the HTTP headers in both the response and subsequent requests. This Microsoft Support article describes how to remove ETags. In Apache, this is done by simply adding the following line to your Apache configuration file:
      FileETag none

3. HTTP Status Code

Those codes can be found here

What happens when you open a URL in your Web browser

Filed under: Uncategorized

Here is a nice article touching this topic.

Since it is not that long, I simply copy and paste it here:

Testing Page Load Speed
Posted at 2:27 PM

One of the most problematic tasks when working on a Web browser is getting an accurate measurement of how long you’re taking to load Web pages. In order to understand why this is tricky, we’ll need to understand what exactly browsers do when you ask them to load a URL.

So what happens when you go to a URL like cnn.com? Well, the first step is to start fetching the data from the network. This is typically done on a thread other than the main UI thread.

As the data for the page comes in, it is fed to an HTML tokenizer. It’s the tokenizer’s job to take the data stream and figure out what the individual tokens are, e.g., a start tag, an attribute name, an attribute value, an end tag, etc. The tokenizer then feeds the individual tokens to an HTML parser.

The parser’s job is to build up the DOM tree for a document. Some DOM elements also represent subresources like stylesheets, scripts, and images, and those loads need to be kicked off when those DOM nodes are encountered.

In addition to building up a DOM tree, modern CSS2-compliant browsers also build up separate rendering trees that represent what is actually shown on your screen when painting. It’s important to note two things about the rendering tree vs. the DOM tree.

(1) If stylesheets are still loading, it is wasteful to construct the rendering tree, since you don’t want to paint anything at all until all stylesheets have been loaded and parsed. Otherwise you’ll run into a problem called FOUC (the flash of unstyled content problem), where you show content before it’s ready.

(2) Image loads should be kicked off as soon as possible, and that means they need to happen from the DOM tree rather than the rendering tree. You don’t want to have to wait for a CSS file to load just to kick off the loads of images.

There are two options for how to deal with delayed construction of the render tree because of stylesheet loads. You can either block the parser until the stylesheets have loaded, which has the disadvantage of keeping you from parallelizing resource loads, or you can allow parsing to continue but simply prevent the construction of the render tree. Safari does the latter.

External scripts must block the parser by default (because they can document.write). An exception is when defer is specified for scripts, in which case the browser knows it can delay the execution of the script and keep parsing.

What are some of the relevant milestones in the life of a loading page as far as figuring out when you can actually reliably display content?

(1) All stylesheets have loaded.
(2) All data for the HTML page has been received.
(3) All data for the HTML page has been parsed.
(4) All subresources have loaded (the onload handler time).

Benchmarks of page load speed tend to have one critical flaw, which is that all they typically test is (4). Take, for example, the aforementioned cnn.com. Frequently cnn.com is capable of displaying virtually all of its content at about the 350ms mark, but because it can’t finish parsing until an external script that wants to load an advertisement has completed, the onload handler typically doesn’t fire until the 2-3 second mark!

A browser could clearly optimize for only overall page load speed and show nothing until 2-3 seconds have gone by, thus enabling a single layout and paint. That browser will likely load the overall page faster, but feel literally 10 times slower than the browser that showed most of the page at the 300 ms mark, but then did a little more work as the remaining content came in.

Furthermore benchmarks have to be very careful if they measure only for onload, because there’s no rule that browsers have to have done any layout or painting by the time onload fires. Sure, they have to have parsed the whole page in order to find all the subresources, and they have to have loaded all of those subresources, but they may have yet to lay out the objects in the rendering tree.

It’s also wise to wait for the onload handler to execute before laying out anyway, because the onload handler could redirect you to another page, in which case you don’t really need to lay out or paint the original page at all, or it could alter the DOM of the page (and if you’d done a layout before the onload, you’d then see the changes that the onload handler made happen in the page, such as flashy DHTML menu initialization).

Benchmarks that test only for onload are thus fundamentally flawed in two ways, since they don’t measure how quickly a page is initially displayed and they rely on an event (onload) that can fire before layout and painting have occurred, thus causing those operations to be omitted from the benchmark.

i-bench 4 suffers from this problem. i-bench 5 actually corrected the problem by setting minimal timeouts to scroll the page to the offsetTop of a counter element on the page. In order to compute offsetTop browsers must necessarily do a layout, and by setting minimal timers, all browsers paint as well. This means i-bench 5 is doing an excellent job of providing an accurate assessment of overall page load time.

Because tests like i-bench only measure overall page load time, there is a tension between performing well on these sorts of tests and real-world perception, which typically involves showing a page as soon as possible.

A naive approach might be to simply remove all delays and show the page as soon as you get the first chunk of data. However, there are drawbacks to showing a page immediately. Sure, you could try to switch to a new page immediately, but if you don’t have anything meaningful to show, you’ll end up with a "flashy" feeling, as the old page disappears and is replaced by a blank white canvas, and only later does the real page content come in. Ideally transitions between pages should be smooth, with one page not being replaced by another until you can know reliably that the new page will be reasonably far along in its life cycle.

In Safari 1.2 and in Mozilla-based browsers, the heuristic for this is quite simple. Both browsers use a time delay, and are unwilling to switch to the new page until that time threshold has been exceeded. This setting is configurable in both browsers (in the former using WebKit preferences and in the latter using about:config).

When I implemented this algorithm (called "paint suppression" in Mozilla parlance) in Mozilla I originally used a delay of 1 second, but this led to the perception that Mozilla was slow, since you frequently didn’t see a page until it was completely finished. Imagine for example that a page is completely done except for images at the 50ms mark, but that because you’re a modem user or DSL user, the images aren’t finished until the 1 second mark. Despite the fact that all the readable content could have been shown at the 50ms mark, this delay of 1 second in Mozilla caused you to wait 950 more ms before showing anything at all.

One of the first things I did when working on Chimera (now Camino) was lower this delay in Gecko to 250ms. When I worked on Firefox I made the same change. Although this negatively impacts page load time, it makes the browser feel substantially faster, since the user clicks a link and sees the browser react within 250ms (which to most users is within a threshold of immediacy, i.e., it makes them feel like the browser reacted more or less instantly to their command).

Firefox and Camino still use this heuristic in their latest releases. Safari actually uses a delay of one second like older Mozilla builds used to, and so although it is typically faster than Mozilla-based browsers on overall page load, it will typically feel much slower than Firefox or Camino on network connections like cable modem/modem/DSL.

However, there is also a problem with the straight-up time heuristic. Suppose that you hit the 250ms mark but all the stylesheets haven’t loaded or you haven’t even received all the data for a page. Right now Firefox and Camino don’t care and will happily show you what they have so far anyway. This leads to the "white flash" problem, where the browser gets flashy as it shows you a blank white canvas (because it doesn’t yet know what the real background color for the page is going to be, it just fills in with white).

So what I wanted to achieve in Safari was to replicate the rapid response feel of Firefox/Camino, but to temper that rapid response when it would lead to gratuitous flashing. Here’s what I did.

(1) Create two constants, cMinimumLayoutThreshold and cTimedLayoutDelay. At the moment the settings for these constants are 250ms and 1000ms respectively.

(2) Don’t allow layouts/paints at all if the stylesheets haven’t loaded and if you’re not over the minimum layout threshold (250ms).

(3) When all data is received for the main document, immediately try to parse as much as possible. When you have consumed all the data, you will either have finished parsing or you’ll be stuck in a blocked mode waiting on an external script.

If you’ve finished parsing or if you at least have the body element ready and if all the stylesheets have loaded, immediately lay out and schedule a paint for as soon as possible, but only if you’re over the minimum threshold (250ms).

(4) If stylesheets load after all data has been received, then they should schedule a layout for as soon as possible (if you’re below the minimum layout threshold, then schedule the timer to fire at the threshold).

(5) If you haven’t received all the data for the document, then whenever a layout is scheduled, you set it to the nearest multiple of the timed layout delay time (so 1000ms, 2000ms, etc.).

(6) When the onload fires, perform a layout immediately after the onload executes.
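
My own rough sketch of the timing part of these rules (not the actual Safari code): it only covers when a requested layout would run under rules (3) to (5) and does not model the stylesheet gating or the onload case. The function name is made up, and reading "nearest multiple" as "next multiple" is my assumption.

```javascript
// Constants named after the ones in the article (values in ms).
var cMinimumLayoutThreshold = 250;
var cTimedLayoutDelay = 1000;

// Hypothetical helper: given the time since the load started, when would a
// layout requested now actually run?
function scheduledLayoutTime(elapsedMs, allDataReceived) {
  if (allDataReceived) {
    // Rules (3) and (4): as soon as possible, but never before the minimum threshold.
    return Math.max(elapsedMs, cMinimumLayoutThreshold);
  }
  // Rule (5): data still arriving, so snap to a multiple of the timed delay.
  // Assumption: "nearest multiple" means the next one (1000, 2000, ...).
  return (Math.floor(elapsedMs / cTimedLayoutDelay) + 1) * cTimedLayoutDelay;
}

console.log(scheduledLayoutTime(100, true));   // 250
console.log(scheduledLayoutTime(400, true));   // 400
console.log(scheduledLayoutTime(300, false));  // 1000
```

This matches the behaviour described in the article: a page that is fully received early shows up at the 250ms mark, and a page that is still trickling in waits for the 1 second mark.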

This algorithm completely transforms the feel of Safari over DSL and modem connections. Page content usually comes screaming in at the 250ms mark, and if the page isn’t quite ready at the 250ms, it’s usually ready shortly after (at the 300-500ms mark). In the rare cases where you have nothing to display, you wait until the 1 second mark still. This algorithm makes "white flashing" quite rare (you’ll typically only see it on a very slow site that is taking a long time to give you data), and it makes Safari feel orders of magnitude faster on slower network connections.

Because Safari waits for a minimum threshold (and waits to schedule until the threshold is exceeded), benchmarks won’t be adversely affected as long as you typically beat the minimum threshold. Otherwise the overall page load speed will degrade slightly in real-world usage, but I believe that to be well-worth the decrease in the time required to show displayable content.

Website Performance

Filed under: Uncategorized

This is a very nice book by the creator of YSlow, an addon to Firebug. It lists 14 rules for high performance websites. Very useful.

DavidHerron.com also has many resources for high performance websites.

March 20, 2008

XHTML and CSS notes

Filed under: Uncategorized

Stylin' with CSS

 

XHTML Structure

 

DOCTYPE declarations:

Strict:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Transitional:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Frameset:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

 

XML namespace declaration:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

 

Content type declaration:

<meta http-equiv="Content-type" content="text/html; charset=iso-8859-1" />

 

Symbols such as <, &:

http://htmlhelp.com/reference/html40/entities/

 

A simple template:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
    <meta http-equiv="Content-type" content="text/html; charset=iso-8859-1" />
    <title>Title for your Web page</title>
</head>
<body>
</body>
</html>

 

Four ways to declare CSS:

  1. embed a <style></style> section in the head element
  2. link to an external CSS file: <link href="my_style_sheet_print.css" media="screen" rel="stylesheet" type="text/css" />; for print, just change to media="print"
  3. in-line style, specified on each tag, e.g. <p style="font-size: 25pt; color: red;">
  4. use @import in a <style></style> section in the head element. The only downside is that IE6 might show the so-called FOUC (Flash of Unstyled Content) problem, meaning the content is momentarily displayed without CSS formatting. See more at http://bluerobot.com/web/css/fouc.asp

 

CSS syntax-related stuff:

 

Contextual selector:

this limits a style to elements nested inside a given ancestor, e.g. p em{color:green;} makes only those <em> elements within <p> elements green.

Class selector:

e.g. p.green{color:green;} <p class="green">green text</p>. Note that if an element has multiple classes that set the same property, the rule declared last in the CSS file wins.

Id selector:

p#green{color:green;} <p id="green">green text</p>

The difference between id and class selectors is that an id value is unique to one element in the page, and ids are usually used from JavaScript as well. So use an id for styles unique to one element, and a class selector for styles that can be shared among different elements.

Attribute selector:

selects elements based on attribute existence or values. Use [href^="foo"] to match link targets that start with "foo" ([href|="foo"] only matches "foo" itself or values beginning with "foo-"). This example adds a PDF icon after links to PDF files:

a[href$=".pdf"] {
  background: transparent url(images/icon_pdf.gif) no-repeat scroll right center;
  padding-right: 18px;
}
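
The attribute selector operators are easy to mix up, so here is a tiny sketch of their matching rules written as a plain function. The name attrMatches is made up, and this only mimics how I understand the selector semantics. It is not how a browser implements them.

```javascript
// Hypothetical helper: does an attribute value match [attr<op>"target"]?
function attrMatches(value, op, target) {
  switch (op) {
    case "=":  return value === target;                                   // exact value
    case "^=": return target !== "" && value.indexOf(target) === 0;        // starts with
    case "$=": return target !== "" && value.slice(-target.length) === target; // ends with
    case "*=": return target !== "" && value.indexOf(target) !== -1;       // contains
    case "|=": return value === target || value.indexOf(target + "-") === 0; // "foo" or "foo-..."
    default: throw new Error("unsupported operator: " + op);
  }
}

console.log(attrMatches("docs/report.pdf", "$=", ".pdf")); // true
console.log(attrMatches("foobar", "^=", "foo"));           // true
console.log(attrMatches("foobar", "|=", "foo"));           // false
console.log(attrMatches("en-US", "|=", "en"));             // true
```

The last two lines show why |= is mostly useful for things like language codes and not for general prefix matching.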

Unit measurement:

try to use relative units: em (the font size of the element), ex (the height of the lowercase letter x), or percentage.

Colors:

there are many ways to specify a color: #RRGGBB, rgb(R%, G%, B%), or a color name. There are only 16 color names in the spec: aqua, black, blue, fuchsia, gray, green, lime, maroon, navy, olive, purple, red, silver, teal, white, yellow.

Fonts:

serif, sans-serif, monospace, fantasy, cursive, e.g. p{font-family:sans-serif;}

Round corners:

March 17, 2008

Windows 2003 Standard Server

Filed under: Windows

Microsoft started offering a lot of software to students for FREE at DreamSpark. I got a copy of Windows 2003 Standard Server and installed it over the weekend. It is much faster and more memory efficient than my old Windows 2000 server, but it is quite annoying to configure as a daily workstation. Here are some tips that I want to note down.

1) Add a new user: Start -> Run…, type lusrmgr.msc to open the user and group management console. Select Users, right click and select New User, then create the new user.

If you want to give the new user admin rights, right click on the newly created user, select Properties, open the Member Of tab, click Add…, type Administrators in the "Enter the object names to select" textbox and hit the Check Names button (it will fill in the full name of the admin group), then click OK.

2) IE security tuning: IE’s default security profile is set to High, which makes it nearly unusable because it keeps bugging you to add virtually every website to the trusted zone. Here is how to lower the security level: open IE, go to Tools, then Internet Options…, select the Security tab, change “Security Level for this zone” from High to Medium, and confirm the change.

3) Install Kaspersky 6 desktop version: this version of Kaspersky is for desktop computers only; a server version exists, but it is quite expensive. I have successfully installed KAV 6 on Windows 2000 server before, but installing it on Windows 2003 server SP2 is much trickier.

  • Modify the KAV msi file using the ORCA MSI editor: open the KAV msi file in ORCA, do a ctrl-F search for “MsiNTProductType=1”, replace each occurrence with “MsiNTProductType>=1”, and save the msi file. Then you can install it without any problem.
  • After installing and adding the license file, Windows 2003 server will show a blue screen when you restart. Press the famous F8 and select “Safe Mode with Command Prompt”. I tried the GUI safe mode, but it just didn’t respond to the keyboard.
  • At the command line, type the “edit” command, which gives you a basic text editor, and create a file called kav.reg with the following content:

    Windows Registry Editor Version 5.00
    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kl1]
    "Start"=dword:00000001
  • Save it and exit the editor. Then type “regedit /s kav.reg” to import this change into the registry. Reboot the machine using the “shutdown /r” command.
  • Finally, no more blue screen!

4) Hardware drivers: driver support is not so good on 2003 server. I could not find a driver for my RAID IDE-SATA converter card. I should have used software to back up the drivers before upgrading. Here is a good one, called Driver Genius.

5) Adding a “show desktop” icon to the task bar: for the newly created user, there is no show desktop icon. You can either copy the file Show Desktop.scf from C:\Documents and Settings\Administrator\Application Data\Microsoft\Internet Explorer\Quick Launch\ to the new user's Quick Launch folder, C:\Documents and Settings\\Application Data\Microsoft\Internet Explorer\Quick Launch\.

Or you can directly create a new Show Desktop.scf in C:\Documents and Settings\\Application Data\Microsoft\Internet Explorer\Quick Launch\ with the following content:


[Shell]
Command=2
IconFile=explorer.exe,3
[Taskbar]
Command=ToggleDesktop





















