Did you know that, by some estimates, popular search engines index only about five percent of the content on the web? DARPA (the federal agency whose research led to the Internet) has apparently developed a search engine for the other 95 percent. According to InfoWorld…
Dan Kaufman, director of the information innovation office at DARPA, says Memex is all about making the unseen seen. “The Internet is much, much bigger than people think,” DARPA program manager Chris White told “60 Minutes.” “By some estimates Google, Microsoft Bing, and Yahoo only give us access to around 5 percent of the content on the Web.”
Google and Bing produce results based on popularity and ranking, but Memex searches content typically ignored by commercial search engines, such as unstructured data, unlinked content, temporary pages that are removed before commercial search engines can crawl them, and chat forums. Regular search engines ignore this deep Web data because Web advertisers — where browser companies make their money — have no interest in it.
Memex also automates the mechanism of crawling the dark, or anonymous, Web where criminals conduct business. These hidden services pages, accessible only through the TOR anonymizing browser, typically operate under the radar of law enforcement selling illicit drugs and other contraband. Where it was once thought that dark Web activity consisted of 1,000 or so pages, White told Scientific American that there could be between 30,000 and 40,000 dark Web pages.
As a librarian, I find this fascinating. It opens up so many research and ready-reference doors. But it also means that we all really are producers as well as consumers of information, whether intentionally or not, and now the info we leave can be as valuable as the info we get…
In a demo for “60 Minutes,” White showed how Memex is able to track the movement of traffickers based on data related to online advertisements for sex. “Sometimes it’s a function of IP address, but sometimes it’s a function of a phone number or address in the ad or the geolocation of a device that posted the ad,” White said. “There are sometimes other artifacts that contribute to location.”
White emphasized that Memex does not resort to hacking in order to retrieve information. “If something is password protected, it is not public content and Memex does not search it,” he told Scientific American. “We didn’t want to cloud this work unnecessarily by dragging in the specter of snooping and surveillance” — a touchy subject after Edward Snowden’s NSA revelations.