Companies are waging an invisible data war online. And your phone could be an unwitting soldier.
Retailers from Amazon and Walmart to small startups want to know what their competitors are charging. Brick-and-mortar retailers can send people, sometimes called “mystery shoppers,” to their competitors’ stores to take notes on prices.
Online, there is no need to send people anywhere. But large retailers can sell millions of products, so it’s not feasible to have workers scour each item and adjust prices by hand. Instead, companies use software to scan rival websites and collect prices, a process called “scraping.” From there, companies can adjust their own prices.
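To make the scrape-then-adjust loop concrete, here is a minimal sketch using only Python’s standard library. It assumes, hypothetically, that a rival page marks prices with `class="price"`, and it uses a deliberately simple pricing rule (undercut the cheapest rival by a cent, never below a floor) as a stand-in; it is not how any company in this story, including Competera, actually sets prices.

```python
from html.parser import HTMLParser
from typing import List


class PriceParser(HTMLParser):
    """Collect numbers from elements whose class list contains "price".

    Hypothetical markup assumption: rival pages wrap prices in
    class="price". Real sites differ and change their markup often.
    """

    def __init__(self) -> None:
        super().__init__()
        self._in_price = False
        self.prices: List[float] = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        self._in_price = False  # simplification: any closing tag ends the price

    def handle_data(self, data):
        if self._in_price:
            text = data.strip().lstrip("$").replace(",", "")
            if text:
                self.prices.append(float(text))


def extract_prices(html: str) -> List[float]:
    """Return every price found in the page, in document order."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices


def reprice(own_price: float, rival_prices: List[float], floor: float,
            undercut: float = 0.01) -> float:
    """Hypothetical rule: undercut the cheapest rival, never going below floor."""
    if not rival_prices:
        return own_price  # nothing scraped, so leave our price alone
    return max(floor, round(min(rival_prices) - undercut, 2))
```

For example, `extract_prices('<span class="price">$19.99</span>')` returns `[19.99]`, and `reprice(21.50, [19.99, 22.00], floor=18.00)` returns `19.98`. Real repricing systems weigh far more signals than the cheapest rival.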
Companies like Amazon and Walmart have internal teams dedicated to scraping, says Alexandr Galkin, CEO of the retail price-optimization company Competera. Others turn to companies like his. Competera scrapes web pricing data for companies ranging from shoe retailer Nine West to Deelat Industrial Outfitter, and uses machine-learning algorithms to help customers decide how much to charge for different products.
Walmart has not responded to a request for comment. Amazon has not answered questions about whether it scrapes other sites. But the founders of Diapers.com, which Amazon acquired in 2010, accused Amazon of using such bots to automatically adjust its prices, according to Brad Stone’s book The Everything Store.
Scraping may sound sinister, but it’s part of how the web works. Google and Bing scrape web pages to index them for their search engines. Academics and journalists use scraping software to gather data. Some Competera customers, including Acer Europe and Panasonic, use the company’s “brand intelligence” service to see what retailers charge for their products and to make sure those retailers comply with pricing agreements.
For retailers, scraping cuts both ways, and that’s where things get interesting. Retailers want to see what their rivals are doing, but they want to keep rivals from spying on them; retailers also want to protect intellectual property, such as photos and product descriptions, which others can scrape and reuse without permission. Many companies deploy defenses against scraping, says Josh Shaul, vice president of web security at Akamai Technologies. One technique: show bots different prices than real people see. A site can show an astronomically high price, or no price at all, to throw off data bots.
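The decoy tactic Shaul describes boils down to a branch on whether a visitor has been flagged as a bot. This minimal sketch assumes the flag comes from some separate detection step; `DECOY_PRICE` and the `hide_from_bots` switch are hypothetical names chosen for illustration.

```python
from typing import Optional

# Hypothetical decoy value: the article only says a site may show an
# astronomically high price, or no price at all, to suspected bots.
DECOY_PRICE = 999_999.99


def displayed_price(real_price: float, suspected_bot: bool,
                    hide_from_bots: bool = False) -> Optional[float]:
    """Return the price to render for this visitor.

    Humans see the real price. Suspected bots get either no price (None)
    or an absurd decoy, depending on the site's chosen policy.
    """
    if not suspected_bot:
        return real_price
    return None if hide_from_bots else DECOY_PRICE
```

The hard part, of course, is computing `suspected_bot` reliably; a wrong guess shows a real shopper a nonsense price.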
Such defenses create opportunities for new offenses. A company called Luminati helps customers, including Competera, disguise their bots to avoid detection. One service makes bots appear to come from smartphones.
Luminati’s service might look like a botnet, a network of computers running malware that hackers use to launch attacks. But rather than secretly taking control of a device, Luminati encourages owners to accept its software along with another app. Users who download MP3 Cutter from Beka for Android, for example, get a choice: view ads or allow the app to use “some of your device’s resources (WiFi and very limited cellular data).” If you agree to let the app use your resources, Luminati will use your phone for a few seconds a day while it is idle to route requests from its customers’ bots, and will pay a fee to the app’s maker. … 39 applications, but Beka has not responded to a request for comment.
The ongoing game of bot and mouse raises a question: How do you detect a bot? It’s hard. Sometimes bots tell the sites they visit that they are bots. When software accesses a web server, it sends a small piece of information along with its request for the page. Conventional browsers identify themselves as Google Chrome, Microsoft Edge, or another browser. Bots can use this mechanism to tell the server that they are bots. But they can also lie. One technique for spotting bots is how often a visitor hits a site. If a visitor makes hundreds of requests per minute, there’s a good chance it’s a bot. Another common practice is to look at a visitor’s Internet Protocol address. If it comes from a cloud computing service, for example, that’s a clue the visitor might be a bot rather than a regular user.
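The three clues above (what the User-Agent header says, request rate, and where the IP address lives) can be combined into a simple checker. This is a minimal sketch under stated assumptions: the marker words, the 100-requests-per-minute threshold, and the single “cloud” range are all hypothetical (203.0.113.0/24 is a documentation-only block standing in for a provider’s published ranges). As the text notes, a bot can lie in its User-Agent, so each signal is a hint, not proof.

```python
import ipaddress
import time
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional

# Hypothetical stand-ins: real systems load cloud providers' published IP
# ranges; 203.0.113.0/24 is a reserved documentation block (TEST-NET-3).
CLOUD_RANGES = [ipaddress.ip_network("203.0.113.0/24")]
BOT_UA_MARKERS = ("bot", "crawler", "spider")
MAX_REQUESTS_PER_MINUTE = 100  # hypothetical threshold


class BotHeuristics:
    def __init__(self) -> None:
        # Per-IP timestamps of requests seen in the last 60 seconds.
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def signals(self, ip: str, user_agent: str,
                now: Optional[float] = None) -> List[str]:
        now = time.monotonic() if now is None else now
        found = []
        # 1. Honest bots say so in the User-Agent header.
        if any(m in user_agent.lower() for m in BOT_UA_MARKERS):
            found.append("self-declared bot")
        # 2. Request rate over a sliding 60-second window.
        window = self._recent[ip]
        window.append(now)
        while now - window[0] > 60:
            window.popleft()
        if len(window) > MAX_REQUESTS_PER_MINUTE:
            found.append("high request rate")
        # 3. Address inside a (hypothetical) cloud-provider range.
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in CLOUD_RANGES):
            found.append("cloud IP")
        return found
```

A site would typically weigh these signals together rather than block on any single one.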
Shaul says techniques such as disguising bot traffic have made relying on internet addresses “almost useless.” CAPTCHAs can help, but they’re a nuisance for legitimate users. So Akamai is trying something different. Instead of only looking for common bot behaviors, it looks for common human behaviors and lets those users through.
When you tap a button on your phone, you move the phone slightly. That movement can be detected by the phone’s accelerometer and gyroscope and sent to Akamai’s servers. The presence of tiny amounts of motion data is a clue that the user is human, and its absence is a clue that the user may be a bot.
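A toy version of the motion test: real taps make accelerometer readings wobble slightly, while a scripted client sends no readings or a perfectly flat signal. This sketch assumes readings arrive as acceleration magnitudes in m/s², and the 0.01 jitter threshold is a hypothetical value; it is not Akamai’s actual method, which the article does not detail.

```python
from statistics import pstdev
from typing import List


def looks_human(accel_magnitudes: List[float], min_jitter: float = 0.01) -> bool:
    """Hypothetical check: human taps cause small fluctuations in
    accelerometer readings; a flat or missing signal hints at automation."""
    if len(accel_magnitudes) < 2:
        return False  # no motion data at all counts as a bot clue
    # Population standard deviation of the readings measures the wobble.
    return pstdev(accel_magnitudes) >= min_jitter
```

A phone lying perfectly still would also fail this check, which is one reason such a signal is treated as a clue rather than a verdict.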
Luminati CEO Ofer Vilenski says the company doesn’t yet offer a way around this, because the technique is still relatively rare. But Shaul thinks it’s only a matter of time before bot makers catch on. Then it will be time for another round of innovation. So goes the internet bot arms race.
Good Bots and Bad Bots
A big challenge for Akamai and others trying to manage bot traffic is the need to let some bots, but not all, scrape a site. If websites blocked bots entirely, they would not show up in search results. Retailers also typically want their prices and products to appear on comparison-shopping sites such as Google Shopping, PriceGrabber, and Shopify.
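Letting “good” bots through means checking that a visitor claiming to be, say, Google’s crawler really is. One widely used approach, which Google documents for its own crawler, is a two-step DNS check: reverse-resolve the visiting IP, confirm the hostname ends in googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. This sketch makes the lookups injectable so it can be tested offline; it is a generic technique, not necessarily what Akamai does.

```python
import socket
from typing import Callable, List

GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def _reverse(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    return socket.gethostbyname_ex(host)[2]  # IPv4 addresses only


def is_verified_googlebot(ip: str,
                          reverse_lookup: Callable[[str], str] = _reverse,
                          forward_lookup: Callable[[str], List[str]] = _forward) -> bool:
    """Reverse DNS, domain check, then forward DNS back to the same IP."""
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False  # no PTR record: cannot be verified
    if not host.endswith(GOOGLE_CRAWLER_DOMAINS):
        return False  # hostname is not in Google's crawler domains
    try:
        # A spoofed PTR record will not resolve back to the visiting IP.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

The forward lookup matters because anyone controlling an IP’s reverse DNS can make it claim a googlebot.com name.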
“There are really a lot of different scenarios where scraping is used on the internet for good, bad, or somewhere in the middle,” says Shaul. “We have a lot of customers at Akamai who came to us for help managing the overall problem of bots, rather than humans, visiting their sites.”
Some companies scrape their own sites. Andrew Fogg is the co-founder of a company called Import.io, which offers web-based tools for scraping data. Fogg says one of Import.io’s customers is a large retailer with two inventory systems, one for its warehouse operations and one for its e-commerce site. But the two systems often fall out of sync. So the company scrapes its own website to look for discrepancies. The company could integrate its databases more closely, but scraping the data is cheaper, at least in the short term.
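Once the storefront’s stock levels have been scraped, finding the discrepancies Fogg describes is a set comparison. This minimal sketch assumes both systems can be reduced to a mapping from SKU to units in stock; the SKUs and counts are hypothetical.

```python
from typing import Dict, Optional, Tuple


def find_discrepancies(warehouse: Dict[str, int],
                       storefront: Dict[str, int]) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return SKU -> (warehouse count, storefront count) wherever they differ.

    None marks a SKU that appears in only one of the two systems.
    """
    diffs = {}
    for sku in sorted(warehouse.keys() | storefront.keys()):
        w, s = warehouse.get(sku), storefront.get(sku)
        if w != s:
            diffs[sku] = (w, s)
    return diffs
```

For example, if the warehouse reports 0 units of a SKU while the website still shows 3, that SKU is flagged so staff can correct the listing before customers order stock that doesn’t exist.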
Other scrapers live in a gray area. Shaul points to the airline industry as an example. Travel price-comparison sites can send business to airlines, and airlines want their flights to appear in those sites’ search results. But many airlines rely on outside companies like Amadeus IT and Sabre to run their reservation systems. When someone searches for flight information on an airline’s site, the airline sometimes has to pay a fee to the reservation system. Those fees can add up if large numbers of bots constantly check an airline’s seat availability and prices.
Shaul says Akamai is helping some airline customers solve this problem by serving bots cached pricing information, so that the airlines don’t have to query the outside companies every time a bot checks prices and availability. The bots won’t get the most up-to-date information, but they will get relatively fresh data without costing the airlines too much.
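The trade-off Shaul describes (slightly stale data for bots in exchange for fewer paid lookups) is a time-to-live cache. In this sketch, `fetch` stands in for a hypothetical call to the fee-charging reservation system, the 300-second TTL is an arbitrary choice, and sending humans straight to live data is an assumption about how such a setup might work; the article does not describe Akamai’s implementation.

```python
import time
from typing import Callable, Dict, Tuple


class BotFareCache:
    """Serve bots cached fare data; only query the (fee-charging)
    reservation system when the cached entry is older than ttl seconds."""

    def __init__(self, fetch: Callable[[str], dict], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # hypothetical upstream reservation-system call
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}
        self.upstream_calls = 0  # each call here may cost the airline a fee

    def get(self, route: str, is_bot: bool) -> dict:
        if not is_bot:
            self.upstream_calls += 1
            return self._fetch(route)  # humans always get live data
        now = self._clock()
        cached = self._entries.get(route)
        if cached and now - cached[0] < self._ttl:
            return cached[1]  # fresh enough: no upstream fee
        self.upstream_calls += 1
        data = self._fetch(route)
        self._entries[route] = (now, data)
        return data
```

However many bots poll a route, the reservation system sees at most one bot-driven query per route per TTL window.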
Other traffic, however, is clearly problematic, such as distributed denial-of-service, or DDoS, attacks, which aim to overwhelm a site by flooding it with traffic. Amazon, for its part, does not outright block bots, including price scrapers, a spokeswoman said. But the company “prioritizes humans over bots as needed to ensure we provide the shopping experience that our customers expect from Amazon.”
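Amazon hasn’t said how it prioritizes humans, but one generic way to express “humans first, as needed” is a priority queue: when the server is backed up, waiting human requests are served before any waiting bot requests. This is a minimal sketch using Python’s `heapq`; the two-level priority scheme is a hypothetical simplification.

```python
import heapq
import itertools
from typing import Any, List, Tuple

HUMAN, BOT = 0, 1  # lower value is served first


class RequestQueue:
    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._seq = itertools.count()  # FIFO tie-breaker within each class

    def push(self, request: Any, is_bot: bool) -> None:
        heapq.heappush(self._heap, (BOT if is_bot else HUMAN, next(self._seq), request))

    def pop(self) -> Any:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

Under light load the queue stays short and bots are served promptly anyway; only under heavy load do they wait, which matches the “as needed” qualifier. A production system would also guard against bots being starved indefinitely.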
Fogg says Import.io doesn’t get blocked much. The company tries to be a “good citizen” by keeping its software from hitting servers too often or using a lot of resources.
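A “good citizen” scraper typically spaces out its requests and honors a site’s robots.txt file. The article doesn’t say which measures Import.io uses; this sketch assumes those two common ones, using the standard library’s `urllib.robotparser`. The user-agent string and one-second default delay are hypothetical, and `fetch` is whatever download function the caller supplies.

```python
import time
import urllib.robotparser
from typing import Callable, Dict, Iterable


def polite_fetch_all(urls: Iterable[str], robots_txt: str,
                     fetch: Callable[[str], str],
                     user_agent: str = "ExampleScraper/0.1",
                     default_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    # Honor the site's Crawl-delay if it sets one, else use our own pause.
    delay = rp.crawl_delay(user_agent) or default_delay
    results: Dict[str, str] = {}
    first = True
    for url in urls:
        if not rp.can_fetch(user_agent, url):
            continue  # skip paths the site has asked crawlers to avoid
        if not first:
            sleep(delay)  # space out requests so we don't hammer the server
        first = False
        results[url] = fetch(url)
    return results
```

Passing `sleep` as a parameter keeps the function testable without actually waiting; in normal use it defaults to `time.sleep`.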
Vilenski says Luminati’s customers have good reasons to pretend they aren’t bots. Some publishers, for example, want to make sure advertisers are showing a site’s visitors the same ads they show the publisher.
Still, the company’s business model raised eyebrows in 2015, when a similar service from its sister company, Hola VPN, was used to launch a DDoS attack on the 8chan website. Earlier this month, Hola VPN’s Chrome extension was accused of being used to steal passwords from users of the cryptocurrency wallet MyEtherWallet. In a blog post, Hola VPN said its Chrome Web Store account had been compromised, allowing hackers to add malware to its extension. Vilenski says the company vets its customers, including through a video call and steps to verify a prospective customer’s identity. He declined to discuss the alleged malicious uses of Luminati’s service. Controversial or not, Vilenski says the company’s business has tripled over the past year.