UPDATE 12/21/2011: LoadStorm sends the user agent string “LoadStorm” so that testers can identify traffic from our load testing tool. Because several customers have asked to control this setting, LoadStorm 2.0 will provide a way to set the user agent to whatever you wish.
Typically, “nice” robots that crawl the web obey the robots.txt file at the root of a server’s domain. This file usually prohibits robots from accessing files that would create unnecessary load on the server, or that the site operator does not want indexed by search engines such as Google or Yahoo. Furthermore, web content providers may serve or withhold specific content based on the User-Agent string, if one is given at all. It is the specific intention of LoadStorm to circumvent the robots safeguard as well as any other restriction or discrimination based on the user agent.
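To make the robots.txt safeguard concrete, here is a minimal sketch of the check a “nice” robot performs before fetching a URL, using Python’s standard `urllib.robotparser`. The robots.txt content, the `NiceBot` name, and the example.com URLs are made up for illustration. A load testing tool that intentionally bypasses the safeguard simply skips this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /reports/
Disallow: /search
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/reports/daily.html"):
        url = "http://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "NiceBot", url))
```

A well-behaved crawler would call something like `is_allowed` before every request and drop any URL for which it returns False.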
User Agent Spoofing is the practice of sending false information in the user agent string. Some web servers will refuse to serve any content to a user agent that does not identify itself. Additionally, content may be served in a variety of flavors and types depending on the user agent requesting it. Currently, LoadStorm identifies itself as Microsoft Internet Explorer, the most widely known and used web browser. This setting is currently neither visible to nor modifiable by the user, although it may become user-customizable in the future. That would allow not only the browser type to be specified, but also the browser version, for testing against older standards. However, current web 2.0 standards encourage browser-independent design, and most “smart” web users regularly update their web browsers.
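The following sketch illustrates both sides of user agent spoofing: a client attaching a chosen User-Agent header to a request, and a server varying or refusing content based on it. This is not LoadStorm’s implementation. The `IE_USER_AGENT` value, the function names, and the server policy in `choose_response` are all assumptions made up for illustration.

```python
from urllib.request import Request

# Illustrative Internet Explorer style string (an assumption); the exact
# value LoadStorm sends is not stated in the text.
IE_USER_AGENT = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)"


def build_request(url, user_agent=IE_USER_AGENT):
    """Client side: attach a chosen User-Agent header to an outgoing request."""
    return Request(url, headers={"User-Agent": user_agent})


def choose_response(user_agent):
    """Server side (hypothetical policy): vary or refuse content by User-Agent."""
    if not user_agent:
        return "403 refused"  # some servers refuse unidentified agents
    if "MSIE" in user_agent:
        return "200 ie-specific page"
    return "200 default page"


if __name__ == "__main__":
    request = build_request("http://example.com/")
    # urllib stores header names capitalized as "User-agent".
    print(choose_response(request.get_header("User-agent")))
```

Because the server sees only the header value, it cannot tell a spoofed string from a real browser on that basis alone.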
A typical advanced user agent provides some amount of caching. LoadStorm does not literally cache the content of prior requests, but it does remember them. Therefore, the request signatures of a web browser and of LoadStorm differ. A typical browser keeps time stamps for cached content and, while that content is still fresh, typically does not send the server a conditional request (one that could be answered with a 304, content unchanged). LoadStorm never intentionally makes such a request. A typical browser will not request an HTML page again if its time stamp has not expired. LoadStorm, in contrast, will download anything more than once if its test sequence requests it directly, including the same HTML page, but it will not intentionally download content referenced by a requested page (i.e. images, css, or js files) more than once.
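The download rule described above can be sketched as a small planner that remembers which referenced resources it has already fetched. This is a sketch under assumptions, not LoadStorm’s actual code: the `RequestPlanner` class, its `plan` method, and the sample paths are hypothetical, and it assumes that only referenced resources (not directly requested pages) are tracked.

```python
from typing import Iterable, List, Set


class RequestPlanner:
    """Decides which URLs to download for each step of a test sequence."""

    def __init__(self) -> None:
        # Referenced resources (images, css, js) already downloaded.
        self._seen_resources: Set[str] = set()

    def plan(self, page_url: str, referenced: Iterable[str]) -> List[str]:
        """Return the URLs to fetch for one directly requested page."""
        # A directly requested URL is always fetched, even if repeated.
        urls = [page_url]
        for resource in referenced:
            # Referenced content is fetched only the first time it is seen.
            if resource not in self._seen_resources:
                self._seen_resources.add(resource)
                urls.append(resource)
        return urls


if __name__ == "__main__":
    planner = RequestPlanner()
    print(planner.plan("/a.html", ["/site.css", "/logo.png"]))
    print(planner.plan("/a.html", ["/site.css", "/logo.png"]))
    print(planner.plan("/b.html", ["/site.css", "/app.js"]))
```

Note that the planner holds only a set of URLs, never response bodies, which matches the idea of remembering prior requests without literally caching them.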