Is it possible to take a complex set of data points and distill them down into a set of simple color-coded tags? For several months, Steve and I pondered what made a good data tag and how tags could be used within our system. Looking at the data we have today and the platform we use for analysis, I think the answer is, without a doubt, yes: we can represent a lot of complexity in a single data tag. To understand how we got to where we are today, it's best to review some of the lessons we learned along the way.
Social analysis missteps
The very first version of PassiveTotal was built largely around a social design. Users earned queries by classifying domains or IP addresses in agreement with the majority opinion of other analysts, and tags were treated as global values. When you tagged an item, anyone could see it. For some users this was great; for others, not so much. The lesson we learned from the social model was that people wanted privacy with the option to share data, not the other way around.
In our current version of PassiveTotal, instead of making all details social, we let each user classify or tag items without exposing them to the rest of the community, while still providing community classification for the sinkhole, dynamic DNS and ever-compromised fields. These classifications are globally gated, meaning a threshold number of users must agree on a value before it's pushed live to the rest of the community.
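A minimal sketch of how this kind of gating might work, assuming one vote per user and a fixed agreement threshold. The function name, the threshold of 3 and the vote format are illustrative assumptions, not PassiveTotal's actual implementation.

```python
from collections import Counter

# Hypothetical threshold: the number of agreeing users required is an assumption.
AGREEMENT_THRESHOLD = 3


def community_value(votes, threshold=AGREEMENT_THRESHOLD):
    """Return the value to publish, or None if no value has enough agreement.

    `votes` maps a user id to the value that user chose, so each user
    counts once. Ties are not handled specially in this sketch.
    """
    if not votes:
        return None
    value, count = Counter(votes.values()).most_common(1)[0]
    return value if count >= threshold else None


# Two users agree: below the threshold, so nothing is published.
pending = community_value({"alice": True, "bob": True})

# Three users agree: the value clears the gate.
published = community_value({"alice": True, "bob": True, "carol": True, "dave": False})
```

Until the threshold is met, each user's own classification stays private and only the gated result is shared.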
Not all tags are equal
It has also become abundantly clear that one data tag to rule them all was not the best model. In our view, tags were meant to be more than a simple way to group data points; they should also drive analysis. With that in mind, we came up with several different tag categories, each with its own color and icon.
With this new data tag design, we've found it easier to interpret the results that are displayed within the web interface. At a glance, it's easy to identify the types of details associated with a query item even if you don't bother to read the actual data tag value. The notion of different data tag categories also scales very well as we bring on new data partners and additional enrichment data.
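One way to picture category-based tags is as a small data structure in which every tag carries a category, and the category determines the color and icon. The sketch below is hypothetical: apart from the red alert styling for malware described in this post, the category names, colors and icons are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagCategory:
    name: str
    color: str
    icon: str


@dataclass(frozen=True)
class Tag:
    value: str
    category: TagCategory


# Hypothetical categories. Only the red/alert pairing for malware
# comes from the post; the rest are placeholders.
CATEGORIES = {
    "malware": TagCategory("malware", "red", "alert"),
    "as_name": TagCategory("as_name", "blue", "network"),
    "user": TagCategory("user", "gray", "tag"),
}


def render(tag):
    """Render a tag so the category is visible before the value is read."""
    return f"[{tag.category.color}/{tag.category.icon}] {tag.value}"


label = render(Tag("potential malware", CATEGORIES["malware"]))
```

Adding a new data partner then means adding one entry to the category table rather than changing how individual tags are stored or displayed.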
Once we deployed the new data tag model, we started to see opportunity all over the place. One of the first changes we made was the addition of tags for items that had potential malware. Having the tab was helpful, but seeing a red data tag with an alert icon commonly associated with malicious code was invaluable.
Another place we found a use for the new data tag design was in IP addresses. Due to how we process passive DNS sources, we won't always know what's on the other end of a mouse click. It's quite possible that your next pivot to an IP address could land you on top of a hosting provider parking page. The result? A lot of waiting and little value. We decided to auto-process AS names and use them as tags, so if you are viewing a list of IP addresses in a table, you can start to prioritize your next pivot.
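As a rough illustration of the idea, the sketch below attaches each row's AS name as a tag and then pushes rows from low-value networks to the bottom of the table. The row format, the function names and the AS names are all hypothetical, and the IP addresses come from the reserved documentation ranges.

```python
# Hypothetical AS names an analyst might treat as low-value pivots.
LOW_PRIORITY_AS_NAMES = {"EXAMPLE-PARKING", "EXAMPLE-SHARED-HOSTING"}


def tag_with_as_name(rows):
    """Return copies of the rows with the AS name appended to their tags."""
    return [
        {**row, "tags": row.get("tags", []) + [row["as_name"]]}
        for row in rows
    ]


def prioritize(rows):
    """Stable sort that moves rows tagged with a low-priority AS name last."""
    return sorted(
        rows,
        key=lambda row: any(t in LOW_PRIORITY_AS_NAMES for t in row["tags"]),
    )


rows = [
    {"ip": "192.0.2.10", "as_name": "EXAMPLE-PARKING"},
    {"ip": "198.51.100.7", "as_name": "EXAMPLE-ISP"},
    {"ip": "203.0.113.5", "as_name": "EXAMPLE-SHARED-HOSTING"},
]
ordered = prioritize(tag_with_as_name(rows))
```

Because the sort is stable, rows within each group keep their original order, so the table only changes where the tag gives a reason to.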
Last week, we announced the analyst assist feature for enterprise customers, which also leverages tags. Have a list of known bad certificate hashes or registrant emails? Converting them to a data tag is as easy as deploying a signature. Using analyst assist, users can easily associate tags with data points tucked away in tabs or overlooked in the HTML table. This not only groups items of interest, it also lets a user search by data tag and provides a visual cue that something of interest is associated with the query.
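To make the signature idea concrete, here is a minimal sketch in which a signature is a list of known values for one field plus the tag to apply on a match. This is not the analyst assist API: the `Signature` class, the field names and the sample values are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    field: str         # hypothetical field name, e.g. "registrant_email"
    values: frozenset  # known values of interest for that field
    tag: str           # tag to attach when any value matches


def apply_signatures(record, signatures):
    """Return the tags whose signatures match the record, without duplicates."""
    tags = []
    for sig in signatures:
        observed = record.get(sig.field, [])
        if isinstance(observed, str):
            observed = [observed]
        if sig.values.intersection(observed) and sig.tag not in tags:
            tags.append(sig.tag)
    return tags


signatures = [
    Signature("registrant_email", frozenset({"bad.actor@example.com"}), "known-registrant"),
    Signature("certificate_sha1", frozenset({"0" * 40}), "known-certificate"),
]

record = {
    "query": "suspicious.example",
    "registrant_email": "bad.actor@example.com",
    "certificate_sha1": ["f" * 40],
}
matched = apply_signatures(record, signatures)
```

The returned tags can then be stored alongside the query item, which is what makes both searching by tag and the visual cue possible.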
What's next for tags?
Steve and I view tags as a critical component of PassiveTotal. Over the coming months, we plan to start exposing more tags through our API to match the web interface. We are also actively seeking additional data partners to fill out more tabs and add to our existing data tag categories. Have a dataset, a product or an idea for a new set of categories or values? Let us know by sending a message to firstname.lastname@example.org.