Category Archives: Technology

UPDATE: After doing extensive checking with extended family, this has proven to be legitimate (though very unexpected). Please ignore the post and move along; Gmail is still secure for now! My sincere apologies for raising an alarm.

This morning I opened my laptop and went to gmail.com to check my email, but was a little confused at first. The first email was from Amazon Local Deals, which I was pretty sure I had unsubscribed from a while ago, and furthermore it was from an area I used to live in, but have since moved from. Then I saw that two people that I did not know had circled me on Google+, not completely unusual but still unexpected. Then the kicker… my name was gone from the top right, and instead I was inside of Sarah Jenkins’ account (name changed).

At that point I shot back to the inbox, and sure enough, I was in a completely different person’s account. All the emails were completely foreign, the chat list was full of people I did not know, and the +You name in the top right was definitely +Sarah, not +Joshua. I quickly checked Chrome’s Web Inspector and looked at the cookies. Indeed, everything appeared as if I were her, almost as if it was a Firesheep session, but it most certainly was not.

I got out of her account as quickly as I could, but I did take a quick screenshot and save the network data (and corresponding cookie information), strictly as evidence in case the Gmail team needs it for debugging. I would never want to violate this other person’s privacy, just as I would not want mine violated.

And that is what scared me: I ended up in someone else’s account, but what if someone else has been in mine in the meantime? Email is the gateway to everything online, and I would never want anyone in my account who shouldn’t be there. It is an incredibly bizarre and potentially dangerous situation.

Facts:

 

  • I had not logged in; I had just opened Gmail in a new tab.
  • I use multi-login and am normally signed into two other accounts at the same time. Neither of those accounts was available, only the other individual’s.
  • I also use 2-step authentication (thankfully!), though I am not sure whether it had any bearing on what happened.
  • The cookies identified me entirely as her; nothing in them had anything to do with my account.

One of the oddest things is that the stranger was, as far as I could tell, essentially random, yet based on her apparent location and a few email subjects, I may at one point have lived near her.

 

My best hypothesis right now is that a network-level error occurred somewhere along the way, and traffic destined for her was routed to me (or vice versa). Otherwise the culprit would have to be in Gmail’s systems. Either scenario is scary; this should never happen. Being in someone else’s account is like having the key to their kingdom. I could have read all of her emails, looked at her YouTube subscriptions, posted something as her, or reset any other accounts that send password reminders or reset links by email, everything short of actually changing her Gmail password (which thankfully would have required me to actually know her existing password). In general, Google authentication is quite secure, but what happened today made me very nervous about my own account’s safety, and about the infrastructure in general.

If you are on the Gmail team and can follow up on this, please contact me. I will try and examine the .HAR files I exported when I get a chance, and if I have any updates I’ll report back here.
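For reference, a HAR export is just JSON, so pulling the relevant details back out should only take a few lines. Something like the following rough PHP sketch is what I have in mind; the filename is just a placeholder for the file I actually exported:

```php
<?php
// Rough sketch: list the cookies that were sent with each request to
// mail.google.com in an exported HAR file. "gmail-session.har" is a
// placeholder name, not the actual export.
$har = json_decode(file_get_contents('gmail-session.har'), true);

foreach ($har['log']['entries'] as $entry) {
    $url = $entry['request']['url'];
    if (strpos($url, 'mail.google.com') === false) {
        continue; // only interested in the Gmail traffic
    }
    echo $url, "\n";
    foreach ($entry['request']['cookies'] as $cookie) {
        // HAR stores request cookies as simple name/value pairs
        echo '  ', $cookie['name'], ' = ', $cookie['value'], "\n";
    }
}
```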


They’ve done it again. The team at Automattic has created a great new default theme for this year’s release of WordPress 3.4: Twenty Twelve. It’s an extremely well-designed and well-written theme that incorporates responsive design, an attractive, text-focused approach, and solid best practices. I’ve been waiting to use it on my site since it was announced, and now that it is available in the WordPress.org Theme Repository I have been playing with it.

Of course that means I need to add all of my Schema.org microdata enhancements back in; I can’t go from having all of that embedded, machine-readable data to having nothing! So I applied the same techniques I used for the previously released Twenty Eleven Schema.org child theme to this year’s theme, and it is ready for public consumption.

What is Schema.org and microdata? In short, it is invisible, enhanced markup that lets search engines and other automated agents understand your content, including authorship, dates, tags, content delineation, and more. To quote from my previous post about the T11 child theme:

Adding microdata to your site has several benefits. First and foremost, you contribute to machine-readable data everywhere. The Internet is a wonderful place for humans to browse, but we can make it more accessible and more consumable if we let computers figure out as much of it as they can. Second, search engines can use this data to get a better understanding of each page they index, and hopefully provide more relevant search results. (Notice that I am not saying you are going to get an SEO boost for doing this. You may, you may not, I have no idea. But if everyone included this data on their sites, the results would be better.) There are really no downsides to simply plugging the data in.
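To make that concrete, here is a rough illustration of what microdata looks like inside a WordPress template. This is only a sketch of the general idea, not markup copied verbatim from the child theme:

```php
<!-- Illustrative only: roughly the kind of microdata the child theme adds.
     The actual Twenty Twelve Schema.org templates differ in the details. -->
<article itemscope itemtype="http://schema.org/BlogPosting">
    <h1 itemprop="name"><?php the_title(); ?></h1>
    <time itemprop="datePublished" datetime="<?php echo get_the_date('c'); ?>">
        <?php echo get_the_date(); ?>
    </time>
    <span itemprop="author" itemscope itemtype="http://schema.org/Person">
        by <span itemprop="name"><?php the_author(); ?></span>
    </span>
    <div itemprop="articleBody">
        <?php the_content(); ?>
    </div>
</article>
```

A crawler that understands microdata can then pick out the post title, publish date, and author directly from those attributes instead of guessing at the surrounding HTML.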

So if you are using the new Twenty Twelve theme and would like to add this microdata to your site, then feel free to download and install the T12 Schema.org child theme on your blog. You can find a download, as well as a link to the source on GitHub, on the Twenty Twelve Schema.org Child Theme dedicated page.

Enjoy, and let me know how you are using it!


Looks like Google is going to be getting more into the semantic web game, according to this article from the WSJ.

“Over the next few months, Google’s search engine will begin spitting out more than a list of blue Web links. It will also present more facts and direct answers to queries at the top of the search-results page.
“Google isn’t replacing its current keyword-search system, which determines the importance of a website based on the words it contains, how often other sites link to it, and dozens of other measures. Rather, the company is aiming to provide more relevant results by incorporating technology called ‘semantic search,’ which refers to the process of understanding the actual meaning of words.”

My guess is that Google will be using Schema.org markup, since that is what drives its +1 buttons for metadata, as well as potentially doing a bit of scraping/AI on existing content. Now would be a good time for all developers and content creators to evaluate how they are using semantic markup and make sure that they are up to par.

I have implemented Schema.org markup on this site through my Schema.org Twenty Eleven child theme. If you are using Twenty Eleven as well, feel free to download it and start using it too. You can read more about it and download it here: Twenty Eleven Schema.org Child Theme. There is much more you can do as well, specifically in adding custom and appropriate markup to individual posts and pages, but it provides a good start.


I love reading, and I love keeping up on news. Because of this, I have been an avid fan of RSS technology, and specifically of Google Reader, for a long time. In fact, according to the Reader stats, I’ve gone through over 65,000 articles since I started using the product back in April of 2008. That’s a lot of great knowledge and information! Unfortunately, it has started to become a large source of information overload and a time sink. I felt this most acutely when I was working full-time at internships and keeping busy with my family in the evenings. It’s hard to scan several hundred article titles a day and read the ones that may be interesting, and it simply wasn’t working anymore.

Enter Fever°.

Fever Logo

Fever° is a web application written by the venerable Shaun Inman that acts sort of like an RSS reader, but takes it to the next level by sorting out which articles should actually be important to you and bubbling only those to the top for consumption. The basic premise is that if a piece of news is important, several different websites will all link to the same original source, and the more sites that do so, the more important the news is. By showing you only what’s hot (and, critically, by leaving off unread counts), you can get to what is truly interesting without having to wade through hundreds of articles yourself.
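The idea behind the "temperature" is simple enough that it can be sketched in a few lines of PHP. To be clear, this is my own toy simplification of the premise, not Fever°'s actual algorithm:

```php
<?php
// My own toy sketch of the premise, not Fever°'s real algorithm: the more of
// your subscribed feeds link to the same URL, the "hotter" that story is.
$feedItems = array(
    // feed => URLs its recent items link out to (made-up sample data)
    'blog-a' => array('http://example.com/big-story', 'http://example.com/minor'),
    'blog-b' => array('http://example.com/big-story'),
    'blog-c' => array('http://example.com/big-story', 'http://example.com/other'),
);

$temperature = array();
foreach ($feedItems as $feed => $links) {
    foreach (array_unique($links) as $url) {      // count each feed once per URL
        $temperature[$url] = isset($temperature[$url]) ? $temperature[$url] + 1 : 1;
    }
}

arsort($temperature);   // hottest stories first
print_r($temperature);  // big-story => 3, minor => 1, other => 1
```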

How has my experience been? In short, very good! Let me share a little anecdote that illustrates how pleasantly surprised I have been by Fever°.

After installing the PHP-based application on my server and setting it up with all my existing subscriptions via an OPML export, I categorized my “must reads” out from the “sparks,” and it came back showing me seven or eight stories that it deemed important. Some of them indeed were, and I clicked through to read the articles they referenced. Every day for a week there were perhaps five or so interesting articles that Fever° separated out, and I often read them all. However, I was only spending about 15 minutes a day reading through these articles, whereas before, cleaning out my Google Reader would occupy roughly an hour a day cumulatively. I began to feel that perhaps I was missing out on little gems that I would have otherwise caught, because maybe Fever° didn’t see a lot of other blogs linking to the same source. I got a little nervous, and decided I would revisit my week’s worth of unread Google Reader items and see what I was missing.

Unread count: 747 items. So I began trudging through. I’m pretty good at scanning headlines and skipping the fluff, but in the end I had picked out only 8 articles to skim that I hadn’t seen come through Fever°, and only 2 of them were really something I might have been interested in! Only two articles in a whole week that I might have otherwise read, versus the several a day that Fever° picks out for me. All right! It was then that I realized just how well the product was working, and how much more efficient it had made my content discovery. Awesome!

A screenshot of what my Fever window looks like right now

Quibbles

I still have a few adjustments to make to get used to Fever°, though in general it is a very well-executed and well-designed program (not that you would expect less from Shaun). A few issues that come to mind:

  • I occasionally get timeout issues when the cron script is running in the background to update feeds. I have even bumped the timeout limit up to an absurd 360 seconds, and the 500 errors still roll in sometimes. I assume this is because one or two feeds are simply not responding and causing the blocking PHP script to hang (this would be a great use of a non-blocking component with something like Node.js or similar; a rough PHP sketch of the idea follows this list). It’s a little niggle, but still kind of annoying to know that the refresh doesn’t complete from time to time. It would be great if Fever° would log slow feeds for troubleshooting.
  • The algorithm for pulling titles is not perfect and sometimes leads to poor results. For example, several blogs linked to a report about the iPhone 4S consuming lots of data because of Siri, but the links to that story in each post were one- or two-word links like “new study” or “a study.” Fever° then titled that particular story “new study.” Not exactly helpful, and it happens more often than I would like.
  • You can view posts using either an excerpt view or a full view. Excerpt view is nice because it keeps things compact, but sometimes the click areas for getting to the full view are a little confusing. I probably just need to get used to it.
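For what it’s worth, even staying within PHP there are ways to keep one dead feed from stalling an entire refresh. Here is a rough sketch using curl_multi with hard per-request timeouts; this is purely my illustration of the idea and has nothing to do with how Fever° is actually implemented:

```php
<?php
// Sketch only: fetch feeds concurrently with curl_multi and cap how long any
// single request may take, so one unresponsive feed cannot hang the refresh.
$urls = array(
    'http://example.com/feed-one.xml',   // placeholder feed URLs
    'http://example.com/feed-two.xml',
);

$multi   = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);   // give up on dead hosts quickly
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);         // hard cap per feed
    curl_multi_add_handle($multi, $ch);
    $handles[$url] = $ch;
}

// Drive all transfers in parallel until each has finished or timed out.
do {
    curl_multi_exec($multi, $running);
    curl_multi_select($multi, 1);
} while ($running > 0);

foreach ($handles as $url => $ch) {
    $body = curl_multi_getcontent($ch);
    echo $url, ': ', ($body ? strlen($body) . ' bytes' : 'failed or timed out'), "\n";
    curl_multi_remove_handle($multi, $ch);
    curl_close($ch);
}
curl_multi_close($multi);
```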

Conclusion

Overall, I am very pleased with Fever°. It was a purchase I had been considering for a while, and with the time it has saved me already, it has paid for itself a couple of times over (it costs $30). It only works as well as the feeds you supply it, but I’m happy to report that my information intake has been satisfied and my time spent significantly reduced. If you haven’t already, check out its website and watch the demo video; you’ll learn a lot from it.

I am beginning a somewhat related information overload study this semester at school, which I will begin writing about tomorrow. Using Fever° has reinforced my belief that information overload can be dealt with, and the results will make your life better!


TL;DR: While SSDs have a higher up-front cost, they are a large cost saver in the long run in high-use scenarios such as data centers.

If you were building a large new datacenter, would you rather pay $0.10 per gigabyte for your storage, or $1.10? What if I told you that you should pay $1.10, and that it would save you almost 40% over 10 years? You would probably guess that I had flunked math (which luckily I only came close to doing in AB calculus), but there is a method behind the madness, and one that deserves a closer look.

Here’s the scenario: you are part of an organization that is gathering historical scans from all around the world and will be archiving them for posterity. The data needs to be stored in a lossless format and will collectively amount to 1.4 petabytes per year. The images will also need to be made available on demand to lucky users across the Internet. How would you go about designing an infrastructure to handle these needs? This scenario is based on a real-world implementation, and was given to us in an enterprise applications class with the instructions to create a feasible proposal for the project. While examining the various aspects of the project, we found that using solid state drives would be a huge cost saver in terms of total cost of ownership over the course of 10 years.

When storing large amounts of data, more than the raw purchase price needs to be taken into account. Other important factors that come into play include:

  • failure and replacement rates
  • power costs
  • cooling costs
  • performance (throughput) requirements

Typical enterprise-grade platter-based hard drives can cost as little as $0.10/GB to purchase, versus around $1.00/GB for a solid state drive [1]. HDDs also currently have much higher capacities than SSDs, with large SSDs typically maxing out at around 480 GB versus 1 or 2 terabytes for HDDs. However, because solid state drives have no moving parts and run much cooler, they have lower failure rates. Furthermore, and more importantly when discussing costs, they draw drastically less power and require much less cooling than an array of hard drives [2]. Finally, when considering the need to serve read requests to clients over the Internet, throughput becomes important. An SSD cluster can on average deliver 20-100x more throughput than a comparable HDD cluster, even when the latter is properly RAIDed, so the need for mirroring and splitting requests across drives drops significantly.

A graph showing the comparative costs over 10 years of HDDs vs SSDs in a datacenter.

We plotted out the costs of an infrastructure using both traditional HDDs and new SSDs, considering the number of drives that would need to be purchased at different times, the power and cooling costs, replacements, etc., and discovered that over the course of ten years, running a datacenter with SSDs would save an estimated $20,376,103.50, or 38%, when compared to the HDD option (HDD TCO: $52,535,121.04; SSD TCO: $32,159,017.54). While the first few years require a greater upfront investment in the actual purchase of drives, the savings in power, cooling, and replacement costs after year 5 begin to pay off substantially by the end of the period (see chart; full calculations available as a Google Spreadsheet here [3]). This result certainly surprised us, but it makes sense when you consider that adding space with hard drives is a very linear operation: the more drives, the more heat and the more power. While solid state drives are pricier to purchase, their TCO is much lower when considered in mass quantities.
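For anyone who wants to poke at the shape of the calculation, here is a rough back-of-envelope sketch. The per-gigabyte prices and the 1.4 PB/year growth rate are from above; the wattages, electricity rate, throughput figures, and failure rates are placeholder assumptions of mine, and the sketch leaves out things our spreadsheet did model (falling drive prices, RAID overhead, and so on), so it illustrates the structure of the comparison rather than reproducing the 38% number:

```php
<?php
// Back-of-envelope sketch of the 10-year TCO comparison described above.
// The per-GB prices and 1.4 PB/year growth come from the post; every other
// number is a placeholder assumption, so don't expect the output to match
// the spreadsheet's figures.
$years        = 10;
$growthGbYear = 1.4e6;    // ~1.4 PB of new data per year, in GB
$kwhRate      = 0.10;     // assumed electricity cost, $/kWh
$readMBps     = 50000;    // assumed aggregate read throughput the archive must serve

$drives = array(
    'HDD' => array('gb' => 2000, 'priceGb' => 0.10, 'watts' => 10.0, 'mbps' => 5,   'fail' => 0.05),
    'SSD' => array('gb' => 480,  'priceGb' => 1.00, 'watts' => 1.5,  'mbps' => 250, 'fail' => 0.01),
);

foreach ($drives as $type => $d) {
    $totalGb  = $growthGbYear * $years;
    // Drive count is driven by whichever is larger: raw capacity, or the
    // mirroring needed to hit the read-throughput requirement.
    $count    = max(ceil($totalGb / $d['gb']), ceil($readMBps / $d['mbps']));
    $purchase = $count * $d['gb'] * $d['priceGb'];
    $avgCount = $count / 2;                          // fleet ramps up roughly linearly
    $power    = $avgCount * $d['watts'] * 24 * 365 * $years / 1000 * $kwhRate;
    $cooling  = $power;                              // assume cooling cost tracks power
    $replace  = $count * $d['fail'] * $years * $d['gb'] * $d['priceGb'];
    printf("%s: purchase $%.0f, power $%.0f, cooling $%.0f, replacements $%.0f, total $%.0f\n",
        $type, $purchase, $power, $cooling, $replace, $purchase + $power + $cooling + $replace);
}
```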

Many organizations are beginning to recognize this. Pure Storage is focusing on this angle, eBay recently deployed 100TB of solid state memory in their data center, and big data is really coming into its own. All of this just goes to demonstrate that we may be on the verge of a new and different data center, and that larger upfront costs may just pave the way for less expensive operational costs in the long run.

I certainly make no claims to being an infrastructure or hardware expert, nor do I have experience in data center operations, but this was at least a good learning exercise for me. Remember, don’t discount options right away just because they appear more expensive at the outset!

Sources:

  1. Denali, “SSD and HDD Economic Forecast: Analyst Jim Handy Speaks Out,” Jan 26 2010, http://www.denali.com/wordpress/index.php/dmr/2010/01/26/
  2. SSDs consume 15% the power of HDDs and have a 2 million hour MTBF lifespan. “Unified Storage for Dummies,” Oracle.
  3. Calculation assumptions are noted in the spreadsheet.
