Feed aggregator
Regular Expressions and Pattern Matching with BrowserMob and Selenium
Hello readers, welcome back!
This article explains a few ways of using regular expressions with BrowserMob and Selenium. As we all know, regular expressions are extremely helpful when scripting dynamic web sites, especially in situations such as picking the first product from a dynamic list of products or clicking the last link of a dynamic drop-down.
To begin with, let's look at a couple of examples of using regular expressions with BrowserMob's VU (Virtual User) scripts. Not sure what VU is? Check it out here or contact BrowserMob Support at any time.
Example 1:
Single-line Regex – In this example we are trying to find a match against a piece of content contained within a single line, which is made easy by the BrowserMob method findRegexMatches.
In the snippet below, we are trying to parse the first item under the 'News' section on yahoo.com using a single-line regex.
browserMob.beginStep('Yahoo Home');
var response = c.get('http://www.yahoo.com/', 200);

// single-line regex
// getBody() returns the body of the HTTP response
var matches = browserMob.findRegexMatches(response.getBody(), "a class=\"small\" href=\"(.*?)\"");

// logging for troubleshooting purposes
browserMob.log(matches[0]);
var item = matches[0];
browserMob.endStep();

browserMob.beginStep('Follow the First News Item');
// go to the first news item parsed in the previous step
var response = c.get(item, 200);
browserMob.endStep();

Example 2:
Multi-line Regex – In this example we are trying to find a match against a piece of content spanning multiple lines. The JavaScript 'RegExp' object comes in handy in this instance.
In the snippet below, we are trying to parse the first hyperlink from HTML that spans multiple lines, as shown below. As you can see, there are a few newlines separating the 'ul' and the 'li'.
<ul class="menu"> <li><a href="/website-load-testing"> browserMob.beginStep('BM Home'); var response = c.get('http://browsermob.com/performance-testing',200); // multi-line regexp // The regular expression uses '\s' which is any whitespace, including newline, OR \S which is anything NOT a white space var re = new RegExp(/<ul class="menu">[\s|\S]*?<li><a href=\"(.*?)\"/i); var content = response.getBody(); myArray = re.exec(content); var item = myArray[1]; browserMob.endStep(); browserMob.beginStep('Load Testing Home'); // 'item' is the url parsed from the previous step var response = c.get('http://browsermob.com/'+item,200); browserMob.endStep();That was easy !! Now let’s look at a few ways of pattern matching with Selenium for BrowserMob’s RBU scripts.
Selenium supports a few methods that help match text patterns. Note, however, that Selenium locators themselves don't accept regular expressions; only text patterns and values do.
Globbing:
selenium.click("link=glob:*Gifts"); // Clicks on any link with text suffixed with 'Gifts' selenium.verifyTextPresent("glob:*Gifts*");Regular Expressions:[regexp, regexpi]
selenium.click("link=regexpi:^Over \\$[0-9]+$"); //matches links such as 'Over $75', 'Over $85' etcContains:
selenium.highlight("//div[contains(@class,'cnn_sectbin')]"); //highlights the first div with class attribute that contains 'cnn_sectbin' selenium.highlight("css=div#cat_description:contains(\"to last\")"); //locating a div containing the text 'to last' using css selectorStarts-with:
selenium.click("//img[starts-with(@id,'cat_prod_image')]"); //clicks on the first image that has an id attribute that starts with 'cat_prod_image' selenium.click("//div[starts-with(@id,'tab_dropdown')]/a[last()]"); //clicks on the last link within the div that has a class attribute starting with 'tab_dropdown' selenium.click("//div[starts-with(@id,'tab_dropdown')]/a[position()=2]"); //clicks on the second link within the div that has a class attribute starting with 'tab_dropdown' selenium.highlight("css=div[class^='samples']"); //highlights div with class that starts with 'samples'Ends-with:
selenium.highlight("css=div[class$='fabrics']"); //highlights div with class that ends with 'fabrics' selenium.click("//img[ends-with(@id,'cat_prod_image')]"); //clicks on the first image that has an id attribute that ends with 'cat_prod_image'[Note: ends-with is supported only by Xpath 2.0. FF 3 might throw an error for this.]
Happy Testing!
52 weeks of Application Performance – The dynaTrace Almanac
dynaTrace Firefox Closed Beta Program started
How to explain growing Worker Threads in JBoss
Is Your Business Ready to Scale? How to Combine the Power of Load Testing & Performance Optimization
How Database Queries Slow Down Confluence User Search
Making your Web Sites faster – Tutorial at QCon London
5 Steps to setup ShowSlow as Web Performance Repository for dynaTrace Data
Understanding Twitter’s Javascript in Multiple Browsers: How to Profile, Debug and Trace across Firefox and IE 6,7,8
Sneak Peek on Firefox support with dynaTrace
Proactively Avoid Site Abandonment by Identifying Thread Contention Issues
Taking on the Fail Whale and Tumblbeast/Tumbeast
Animal tussle going on, please hold
As most of you know, Twitter displays its "Fail Whale" when a 503 page is served, which happens when the server is overloaded or down for maintenance. A few companies have decided to modify their 503 pages to make them more interesting.
Matthew Inman (creator of http://theoatmeal.com/) decided to create a similar page for Tumblr. It's called Tumblbeast, and it looks like this:
Update: Tumblr has renamed it Tumbeast and officially adopted it as a 503 page.
We thought we should make one for our customers...
Just for fun. =)
You can link to this image at:
<img src="http://s3.amazonaws.com/loadimpact_us/images/animaltussle2.jpg" border="0"
alt="Animal Tussle" />
By the way, for those who are interested in making a 503 Service Unavailable page, here's a decent guide: http://webhostinghelpguy.inmotionhosting.com/web-hosting/how-to-make-a-503-service-unavailable-page/
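For a concrete starting point, here is a minimal sketch of one way to serve such a page, written as a tiny Node.js server. The markup and image URL are just the ones from this post, used for illustration; this is not the approach from the guide above.

var http = require('http');

http.createServer(function (req, res) {
  // return the 503 status so clients and crawlers know the outage is temporary
  res.writeHead(503, {
    'Content-Type': 'text/html',
    'Retry-After': '300' // a hint, in seconds, for when to try again
  });
  res.end(
    '<html><body>' +
    '<h1>Animal tussle going on, please hold</h1>' +
    '<img src="http://s3.amazonaws.com/loadimpact_us/images/animaltussle2.jpg" alt="Animal Tussle" />' +
    '</body></html>'
  );
}).listen(8080);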
Update: Oatmeal created this image:
Like us, Matthew is taking a jab at the Fail Whale. You can read more about it here:
http://theoatmeal.com/blog/fail_whale
New Hands On Demo Video of dynaTrace AJAX Edition 2.0 available
A lesson in validation
Those who have worked with me know how much I stress the importance of validation: validate your tools, workloads and measurements. Recently, an incident brought home yet again the importance of this tenet and I hope my narrating this story will prove useful to you as well.
For the purposes of keeping things confidential, I will use fictional names. We were tasked with running some performance measurements on a new version of a product called X. The product team had made considerable investment into speeding things up and now wanted to see what the fruits of their labor had produced. The initial measurements seemed good but since performance is always relative, they wanted to see comparisons against another version Y. So similar tests were spun up on Y and sure enough X was faster than Y. The matter would have rested there, if it weren’t for the fact that the news quickly spread and we were soon asked for more details.
Firebug to the rescue

At this point, we took a pause and wondered: are we absolutely sure X is faster than Y? I decided to do some manual validation. That night, connecting via my local ISP from home, I used Firefox to perform the same operations that were being run in the automated performance tests. I launched Firebug and started looking at waterfalls in the Net panel.
As you can probably guess, what I saw was surprising. The first request returned a page that caused a ton of object retrievals. The onload time reported by Firebug was only a few seconds, yet there was no page complete time!

The page seemed complete and I could interact with it, but the fact that Firebug could not determine Page Complete was a little disconcerting. I repeated the exercise using HttpWatch just to be certain, and it reported exactly the same thing.

Time to dig deeper. Looking at the individual object requests, one in particular was using the Comet model and never completed. On waiting a little longer, I saw other requests being sent by the browser periodically. Neither of these request types, however, had any visual impact on the page. Since requests continued to be made, Firebug naturally concluded that the page was not complete.
Page Complete or Not?

This begged the question: how did the automated tests run, and how did they determine when the page was done? There was a timeout set for each request, but if requests were terminating because of the timeout, we surely would have noticed, since the response times reported would have equaled the timeout value. In fact, the response time being reported was less than half the timeout value.
So we started digging into the waterfalls of some of the automated test results. Lo and behold – a significant component of the response time came from the HTTP Push (also known as HTTP Streaming) request. Several of the sporadic requests were also being made well after the page was complete. This resulted in arbitrary response times being reported for Y.
It turned out that the automated tool was actually quite sophisticated. It doesn't just use a standard timeout for the entire request; instead, it monitors the network, and if no activity is detected for 3 seconds, it considers the request complete. So it captured some of the streaming and other post-Page-Complete requests and returned when the pause between them exceeded 3 seconds. That is why we saw "valid" response times that looked reasonable and had us fooled!
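The tool's internals aren't public, but the idle-detection heuristic itself is easy to sketch. A minimal illustration in JavaScript, where recordActivity is a hypothetical hook invoked on every network event (request start, data received, request end), not any real tool's API:

var IDLE_MS = 3000; // the 3-second quiet period described above
var idleTimer = null;

// call this on every network event after navigation starts
function recordActivity(onPageComplete) {
  if (idleTimer) clearTimeout(idleTimer);
  // if no further activity arrives within 3 seconds, declare the page complete --
  // note that an open-but-silent Comet connection will not stop the timer from firing
  idleTimer = setTimeout(onPageComplete, IDLE_MS);
}

// usage:
recordActivity(function () { console.log('page considered complete'); });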
Of course, this leads to the bigger discussion of when exactly we should consider an HTTP request complete. I don't want to get into that now, as my primary purpose in this article is to point out the importance of validation in performance testing. If we had taken the time to validate the results of the initial test runs, this problem would have been detected much earlier and many cycles could have been saved (not to mention the embarrassment of admitting to others that the initial results reported were wrong!).
Finding out common user behaviour
Learn how to use Google Analytics to derive common user behaviour
Introduction
In addition to finding out how many simultaneous users you need for a load test (for more information, see http://loadimpact.com/blog/search?criteria=analytics), you might also find it challenging to choose visitor behaviors that are representative of all your site visitors. There are many possible permutations of behaviors, and it is not practical to simulate every one of them. We can, however, use analytics software to find common user behavior. This guide explains how Google Analytics helps you do the job.
Extracting the data
On the right menu of Google Analytics, select Content -> Top Landing Pages.
Choose an appropriate date range. In this example, we have chosen two months' worth of data. Wider date ranges produce more accurate user behaviors, but will not reflect the latest common behavior. To avoid ending up with irrelevant user behaviors, be careful not to choose a start date that precedes any major revision of your website.
You should now be able to see this on the right hand side of the page. Click the second option to have the data displayed in pie chart form.
Let's focus on the two most popular landing pages, "/" and "/index.php". Visitor paths will be represented by nodes. We will dive into the concept in more detail in a later analysis, but let's complete the whole diagram first.
Next, under "Overview", click "Entrance Paths".
font-family:"Calibri","sans-serif";mso-ascii-theme-font:minor-latin;mso-fareast-font-family:
Calibri;mso-fareast-theme-font:minor-latin;mso-hansi-theme-font:minor-latin;
mso-bidi-font-family:"Times New Roman";mso-bidi-theme-font:minor-bidi;
mso-ansi-language:EN-US;mso-fareast-language:EN-US;mso-bidi-language:AR-SA">
Here you will see the various paths of different content. "Then viewed these pages" shows what was viewed and the corresponding percentages after your visitors went to the selected content. "And ended up here" shows what content was viewed after that.
Let's take a look at the paths taken by visitors after arriving at the http://loadimpact.com/ page. We will ignore the first two results, as there is an auto-loading script on the index page (this can be accounted for by increasing sleep time later on). The other probable content our visitors go to is "/products.php" and "/products.php?basic pages". We will add this information to the diagram:
We click "/products.php?light=" and look under "And ended up here:". Again, we find the two most visited pages and plot them in the diagram together with corresponding precentages of users that came from the previous page.
For the branch starting with "/index.php", we get the following:
The above diagram shows the paths commonly taken by visitors starting from loadimpact.com/index.php.
We could go on for a few more iterations, but for this example we will stop here. Next, go to "Top Exit Pages" and verify that the top two exit contents appeared in your last iteration.
In this case they did appear and we can safely say that three iterations are sufficient to give an accurate picture of visitor behaviors.
Analysis
In every branch, we multiply the percentages together to obtain a "comparison number". For example, starting from /index.php to /pageanalyzer.php to / yields 0.064*0.0403*0.0221 = 0.000057.
We do that for every branch and then rank the sequence of content from highest to lowest "comparison number".
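Since this is just per-branch multiplication followed by sorting, it is easy to script. A minimal sketch in JavaScript – only the first branch uses the numbers from the example above; the second is a placeholder, not real data:

// each branch is a page sequence plus the fraction of visitors taking each step
var branches = [
  { path: ['/index.php', '/pageanalyzer.php', '/'], fractions: [0.064, 0.0403, 0.0221] },
  { path: ['/', '/products.php'], fractions: [0.5, 0.1] } // hypothetical placeholder
];

branches.forEach(function (branch) {
  // the "comparison number" is the product of the step fractions
  branch.score = branch.fractions.reduce(function (a, b) { return a * b; }, 1);
});

// rank branches from most to least common
branches.sort(function (a, b) { return b.score - a.score; });

branches.forEach(function (branch) {
  console.log(branch.path.join(' -> ') + ' : ' + branch.score.toFixed(6));
});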
Hence, the most common user behavior would be to go to our index page, and from there click "Sign up now" under "Load Test Light", and then return to the index page before leaving the website. This is illustrated in the top diagram below.
The other common behavior would be to go to our index page (note that http://loadimpact.com/ and http://loadimpact.com/index.php point to the same page with the same code), and then click "Sign up now" followed by "Proceed to registration" on the products page.
Once common user behaviors are determined, we can transform them into a load script via the session recorder (available with any Load Impact Premium account).
It should be noted that this method is only valid for a general load test on your website. You might also want to test specific functionality, in which case this method is not very useful.
Web Performance Optimization Use Cases: Part 4 Load Time Optimization
Antivirus Add-On for IE to cause 5 times slower page load times
Going to PyCon 2011!
This year I am attending my first PyCon (the annual Python community conference).
I will be in Atlanta: March 10-13.
If anyone is interested in meeting up or collaborating while I'm there, get in touch:
- Twitter: @cgoldberg
- Homepage/Info: goldb.org
Tools for Web Performance Analysis
At Yahoo!, I’m currently focused on analysis of end user performance. This is a little different than what I’ve worked on in the past, which was mainly server-side performance and scalability. The new focus requires a new list of tools so I thought I’d use this post to share the tools I’ve been learning and using in the past couple of months.
HttpWatch

This made the top of my list and I use it almost every day. Although its features are very similar to Firebug's, it has two I find especially useful: the ability to save waterfall data directly to a CSV file, and a stand-alone HttpWatch Studio tool that easily loads previously saved data and reconstructs the waterfall (I know you can export Net data from Firebug, but only in HAR format). Best of all, HttpWatch works with both IE and Firefox. The downside is that it works only on Windows and is not free.

Firebug

This is everyone's favorite tool and I love it too. Great for debugging as well as performance analysis. It is a little buggy, though – I get frustrated when, after starting Firebug, I go to a URL expecting it to capture my requests, only to find that it has disappeared on me. I end up keeping Firebug on all the time, and this can get annoying.

HttpAnalyzer

This is also a Windows-only commercial tool, similar to Wireshark. Its primary focus is HTTP, however, so it is easier to use than Wireshark. Since it sits at the OS level, it captures all traffic, irrespective of which browser or application is making the HTTP request. As such, it's a great tool for analyzing non-browser-based HTTP client applications.

Gomez

Yet another commercial tool, but considering our global presence and the dozens of websites that need to be tested from different locations, we need a robust commercial tool that can meet our synthetic testing and monitoring requirements. Gomez has pretty good coverage across the world.

I have a love-hate relationship with Gomez. I love that I can test at both backbone and last mile, but I hate its user interface and limited data visualization. We have to resort to extracting the data via web services and doing the analysis and visualization ourselves. Still, I really can't complain too much – I didn't have to build those tools myself!

Command-line tools

Last but not least, I rely heavily on standard Unix command-line tools like nslookup, dig, curl, ifconfig, netstat, etc. And my favorite text-processing tools remain sed and awk. Every time I say that, people shake their heads or roll their eyes, but I think we can agree to disagree without getting into language wars.