The Internet was not designed with security in mind. A number of recent protocols, such as encrypted DNS and HTTPS, encrypt critical parts of the web architecture that were previously sent in the clear. IP addresses, however, remain visible to on-path observers and can be used for censorship, surveillance, and undermining users' privacy on the web. We perform a measurement study to gauge how much information IP addresses leak, using datasets representative of the state of the Internet: some fetched from HTTP Archive, others collected over extended periods by crawling Alexa's top websites under configurations such as Adblock enabled versus disabled. We build a page load fingerprint for each crawled website and separate out the websites that have uniquely identifying IP addresses mapped to them. We then train a neural network classifier to study how accurately websites can be fingerprinted from IP addresses and their respective Autonomous System Numbers (ASNs). Approximately 80% of the IP addresses have an anonymity set consisting of a single website and can therefore identify that website. On the remaining data, the classifier achieves an accuracy of about 60%. We observe that the classifier confuses websites hosted on common infrastructure. Manually clustering the data based on these trends can increase classification accuracy. We identify areas for improvement in the current measurement study and provide suggestions to Content Delivery Networks (CDNs) and other agents fundamental to the Internet infrastructure for increasing user privacy.
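The notion of an IP address's anonymity set can be illustrated with a minimal sketch. It assumes a page load fingerprint is simply the set of IP addresses contacted while loading a site; the function names, the example hostnames, and the documentation-range addresses below are hypothetical and are not taken from the study's datasets or code.

```python
from collections import defaultdict


def anonymity_sets(page_loads):
    """Map each IP address to the set of websites whose page loads contacted it.

    `page_loads` is an assumed structure: {website: set of IP addresses}.
    """
    sites_by_ip = defaultdict(set)
    for site, ips in page_loads.items():
        for ip in ips:
            sites_by_ip[ip].add(site)
    return dict(sites_by_ip)


def uniquely_identifying(sites_by_ip):
    """Keep only IPs whose anonymity set holds exactly one website."""
    return {
        ip: next(iter(sites))
        for ip, sites in sites_by_ip.items()
        if len(sites) == 1
    }


# Hypothetical toy data using reserved documentation address ranges.
page_loads = {
    "site-a.example": {"192.0.2.10", "198.51.100.7"},
    "site-b.example": {"192.0.2.20", "198.51.100.7"},
    "site-c.example": {"203.0.113.5"},
}

sets_by_ip = anonymity_sets(page_loads)
unique = uniquely_identifying(sets_by_ip)

# Share of IP addresses that point to exactly one website.
share_unique = len(unique) / len(sets_by_ip)
print(unique)
print(share_unique)
```

In this toy input, the address shared by two sites has an anonymity set of size two and is excluded, which is the kind of case the abstract leaves to the classifier.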
Privacy implications of information leakage from IP addresses - a web fingerprinting approach