Building a scalable API means having a deep understanding of your users' usage patterns. Over the past few months, we have made significant changes to our application architecture to meet our community's needs, and as the final piece of the puzzle, we are releasing a new foundation for all our future APIs. We recognize that updating code to use a new API is never fun, so we have tried to make the transition easier by building an extensive client library, improving our API documentation and adding new datasets for correlating infrastructure.
So, what's changed?
Almost everything related to the API has changed in some form. Below is a high-level list of the changes made for version two:
- API endpoint has been moved to https://api.passivetotal.org/
- API is now load-balanced to support far more traffic
- Designed with REST best practices in mind
- New documentation with data formats, example responses and code examples
- Python client module with command line tools and developer libraries
- Addition of OSINT, account, tracking codes and web components API calls
Many of the existing API calls have been adjusted to return only the data relevant to the call. Our previous implementation included a lot of extra data in return payloads, which not only slowed down responses but was also seldom used by our clients. The redesigned calls run faster and ensure we can continue scaling for our users.
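As a sketch of what a call against the new endpoint might look like, here is a minimal Python helper. The `v2` path prefix, the basic-auth scheme and the `dns/passive` path used in the test below are assumptions for illustration, not taken from the official documentation:

```python
import base64
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.passivetotal.org"  # new load-balanced endpoint

def build_url(path, params=None, version="v2"):
    """Construct a full endpoint URL from an API path and query parameters."""
    url = f"{API_ROOT}/{version}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url

def query_api(path, query, username, api_key):
    """Issue an authenticated GET against the API (basic auth is assumed here)."""
    request = urllib.request.Request(build_url(path, {"query": query}))
    token = base64.b64encode(f"{username}:{api_key}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Because the URL construction is separated from the request itself, the same helper works for any of the endpoints discussed below.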
Newly added datasets
As mentioned in the above changes, we have added a couple of new datasets in this API update that are extremely useful for making infrastructure connections.
Several months ago, we published a blog post detailing the inclusion of open source intelligence (OSINT) in the web platform, shown when querying for data or making pivots. This data is now available through the enrichment endpoint and returns content including the source, URL, tags and indicators mentioned in the OSINT reporting.
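Assuming the enrichment endpoint returns a JSON payload with a `results` list (the field names `sourceUrl`, `tags` and `inReport` below are guesses based on the fields described, not the documented schema), a small helper could flatten each OSINT report for triage:

```python
def summarize_osint(payload):
    """Reduce each OSINT report to its source, URL and tag/indicator counts.

    The payload shape here is an assumption based on the fields the
    endpoint is described as returning (source, URL, tags, indicators).
    """
    summaries = []
    for report in payload.get("results", []):
        summaries.append({
            "source": report.get("source"),
            "url": report.get("sourceUrl"),
            "tag_count": len(report.get("tags", [])),
            "indicator_count": len(report.get("inReport", [])),
        })
    return summaries
```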
In continuing to integrate RiskIQ data into the PassiveTotal platform, we worked with our data scientists and engineers to extract tracking codes from the millions of web pages we crawl daily. Users can search a growing number of tracking codes including, but not limited to, Google Analytics, New Relic, Yandex and Mixpanel. Additionally, we have exposed an endpoint that accepts domains or IP addresses and returns a history of tracking codes and how they relate to the original query. We have found this dataset incredibly valuable in surfacing phishing and exploit kit activity.
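One way to put the tracking code history to work is to group observations so that hosts sharing the same tracker ID stand out; shared analytics accounts are a classic pivot between phishing pages and the rest of an actor's infrastructure. The record fields used here (`hostname`, `attributeType`, `attributeValue`) are assumed for illustration:

```python
from collections import defaultdict

def group_by_tracker(records):
    """Map (tracker_type, tracker_value) -> set of hostnames observed with it."""
    groups = defaultdict(set)
    for record in records:
        key = (record["attributeType"], record["attributeValue"])
        groups[key].add(record["hostname"])
    return groups
```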
Our web components endpoint also draws on RiskIQ crawling. Querying either a domain or IP address reveals a history of changes made to the queried host. These could include server changes, like a new version of Apache, as well as page-level observations, like the addition of a new jQuery library or a third-party analytics tracking pixel. At the time of release, we haven't yet created a search endpoint, but one will be available soon.
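Since the value of this dataset is the change history, a natural first step is ordering the observations into a timeline. This is a sketch only; the field names (`firstSeen`, `category`, `label`) are assumptions about the payload, not the documented schema:

```python
def component_timeline(results):
    """Sort web component observations chronologically by first-seen date.

    Assumes ISO-formatted date strings, which sort correctly as text.
    """
    return sorted(
        ({"first_seen": r["firstSeen"],
          "component": f'{r["category"]}: {r["label"]}'}
         for r in results),
        key=lambda item: item["first_seen"],
    )
```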
One of the biggest additions accompanying the new API is a fully-featured Python module available through PyPI. When designing the new API, we realized that sample code was great, but our users were still largely left to their own devices to build applications of their own. With the Python module, we wanted to provide a robust command line tool for users who want to query our data services, as well as a set of abstraction libraries for developers, so they can build their own suites of tools.
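The module's actual interfaces are covered in its own documentation; purely as an illustration of the kind of abstraction layer described here (every class, method and path name below is hypothetical), a developer-facing wrapper might look like this. The injectable `fetch` function keeps the class testable without network access:

```python
import json

class PassiveTotalClient:
    """Hypothetical sketch of a developer abstraction layer; names are
    illustrative only, not the module's real API."""

    BASE = "https://api.passivetotal.org/v2"

    def __init__(self, username, api_key, fetch=None):
        self.auth = (username, api_key)
        # Injectable fetch function keeps the class testable offline.
        self.fetch = fetch or self._http_get

    def _http_get(self, url, params):
        import base64
        import urllib.parse
        import urllib.request
        full = url + "?" + urllib.parse.urlencode(params)
        request = urllib.request.Request(full)
        token = base64.b64encode(":".join(self.auth).encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)

    def passive_dns(self, query):
        return self.fetch(f"{self.BASE}/dns/passive", {"query": query})

    def enrichment(self, query):
        return self.fetch(f"{self.BASE}/enrichment", {"query": query})
```

A wrapper like this lets tool authors call `client.passive_dns("example.com")` without thinking about URLs or authentication at all.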
Over the next several weeks, we will blog about and populate the examples folder of the Python module. Our hope is that by removing the complexity around our API, users will begin to experiment and create more tools to help surface new connections. This abstraction also means future API updates won't require significant code changes; instead, a simple update to the existing library will suffice.