Can you still Trust your Single Source of Truth?

The torrid pace of Digital Transformation is stretching Enterprise Resource Planning (ERP) financial systems in uncharted directions. Business opportunities to address modern customer requirements also carry new Security, Governance and Compliance risks. The distribution of operational data across many SaaS, PaaS & IaaS clouds, recently merged or acquired companies, suppliers, distributors, channel partners and end-customer Systems of Record is a new existential threat. Finance teams and other business leaders running their departments on this fragmented operational data have realized that the Single Source of Truth concept, born in the Client/Server era, is now coming apart at the seams.

Data Integrity has been a business challenge since the earliest days of general ledgers. The struggle to maintain data accuracy, consistency and normalization is compounded by new global privacy and regulatory compliance requirements around custodianship and transparency. Finally, the millions of cybersecurity attack probes each Global 2000 organization faces daily require a fundamentally new level of Data Integrity urgency and focus.

(Steve Schmidt, AWS CISO re:Inforce Cloud Security Keynote, 6/25/2019)

Proactive Reconciliation

Traditional reconciliation of financial records, manufacturing inventory and supplier component status, asset utilization and related operational systems can no longer afford to be a manual, retroactive process performed on an elongated cadence. Strong real-time assertion of data integrity at the transactional level, coupled with strong real-time verification of data integrity at the reporting level, is the new reality of business.
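The assert-at-write, verify-at-read pattern described above can be sketched with plain SHA-256 digests. This is a minimal illustration, not Chainkit's actual API; the record fields and function names are invented for the example:

```python
import hashlib
import json

def digest(record: dict) -> str:
    """Canonical SHA-256 digest of a transaction record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify(record: dict, expected: str) -> bool:
    """Recompute the digest and compare against the stored value."""
    return digest(record) == expected

# Assert integrity at the transactional level: store a digest with each record.
ledger_entry = {"account": "AR-1001", "amount": 2500, "currency": "USD"}
stored_digest = digest(ledger_entry)

# Verify integrity at the reporting level: recompute before the data is used.
assert verify(ledger_entry, stored_digest)       # untouched record passes
tampered = dict(ledger_entry, amount=9999)
assert not verify(tampered, stored_digest)       # any change is detected
```

Anchoring the stored digests in an external, append-only service is what turns this per-record check into continuous, auditable reconciliation rather than a periodic manual exercise.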

[Figure: Data Sources, Sinks & Chains of Custody]

Verified Shared State

Organizational structures and budgeting standards mean data silos are here to stay. Master Data Management and Data Federation are best practices for leveraging the strategic advantage of valuable business data spread across many teams. Verified Shared State is the technical prerequisite in support of this business goal. Chains of custody between all data sources and data sinks in an organization are the mechanisms that deliver these requirements and return mathematically verifiable trust and integrity to a modern, relevant Single Source of Truth.
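One way to make a chain of custody "mathematically verifiable" is to hash each custody event together with the hash of the previous one, so altering any earlier link invalidates everything after it. A minimal sketch, with illustrative event fields and function names rather than a real Chainkit interface:

```python
import hashlib
import json

GENESIS = "0" * 64  # starting value for an empty chain

def link(prev_hash: str, event: dict) -> str:
    """Hash a custody event together with the previous link's hash."""
    payload = prev_hash + json.dumps(event, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(events):
    """Fold a list of custody events into a tamper-evident hash chain."""
    chain, prev = [], GENESIS
    for event in events:
        prev = link(prev, event)
        chain.append({"event": event, "hash": prev})
    return chain

def verify_chain(chain) -> bool:
    """Recompute every link from genesis; any mismatch fails the whole chain."""
    prev = GENESIS
    for entry in chain:
        if link(prev, entry["event"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

custody = build_chain([
    {"actor": "erp", "action": "export", "object": "invoice-42"},
    {"actor": "partner-saas", "action": "ingest", "object": "invoice-42"},
])
assert verify_chain(custody)
custody[0]["event"]["action"] = "delete"   # tamper with an earlier link
assert not verify_chain(custody)
```

Because `verify_chain` recomputes every link from the genesis value, a single tampered event anywhere in the history makes the entire chain fail verification.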

[Figure: Chainkit builds Chains of Custody]

PencilDATA’s Chainkit ‘Chain of Custody as a Service’ is the Enterprise-ready and Developer-friendly global service platform to enhance existing best-of-breed IT infrastructure and App/Data stacks. With Chainkit, new Digital Transformation projects don’t have to sacrifice Security, Risk and Governance requirements to meet ambitious business goals. Chainkit puts the trust back in your new Single Source of Truth.

Facebook Libra - opportunity you didn't read about at launch

(Prologue: I am a public Facebook skeptic due to their numerous privacy and security shortcomings; see my #DataResponsibility blog. However, I am encouraged by the steps Facebook's Calibra team outlined in the design of their first token offering.)

The highly anticipated announcement (not launch) of "Facebook's Blockchain" focused on the many sides of its payment features. Serving the under-banked, removing payment friction for those with credit, tackling global crypto governance and navigating domestic & international regulatory tangents will all dominate Libra discussion for the foreseeable future. But those topics merely hint at its potential for both good & evil. This key paper of the Libra announcement indicates where the hidden majority of opportunity & risk actually lies.


Facebook's Libra will provide the first global programmatic trust connection between 2.3 billion consumers and > 90M vendors / service providers, as well as governments. That's a potential trust network at unprecedented scale. Notably, payment is merely part of a customer-provider lifecycle. Proposals & quotes, electronic tracking & delivery, product support, professional services, artifacts / certificates / licenses and digitized assets are but a few elements of the expanding digital value chain which fill out this lifecycle. Election advertising transparency and newsfeed content authenticity (in the age of Fake News and AI Deep Fakes) are some social-good use cases also enabled by Libra. But risks abound.

TechCrunch's Josh Constine has already outlined some obvious risks here for Libra to mitigate. Trust, integrity and transparency are now widely accepted baseline requirements for every digital transaction. Chains of Custody are the Security, Legal and Regulatory mechanisms by which to mitigate those risks and deliver on Libra's potential.

Non-repudiation of goods & service delivery, tracking resource usage, attesting to the privacy & integrity of patient records, monitoring home & auto policy terms, facilitating travel logistics and interacting with government services (DMV, Taxes, ...) are basic examples of digital transactions which complement payments, yet need every bit as much of the security and privacy guarantees as the payments themselves.

I'm professionally excited by Facebook's Calibra Engineering team and their practical (i.e. evolving permissioning, centralization vs decentralization, etc...) design trade-offs published in the first Libra whitepaper. The team will no doubt iterate and improve many aspects of Libra based on community feedback, but they are already paving the way for a new global economy with boundless digital value exchange and related investment opportunity.

Data Responsibility: An Open Letter To The Tech Industry

2017 will be remembered as the End of our Data Innocence.

We saw next-level data breaches (Equifax, Yahoo!, SEC, Uber, etc), the #FakeNews epidemic, political weaponization of Social Media, and recurring warnings (both hyperbolic and very real) about the hazards of unchecked AI.

Data-related events are escalating in public visibility and impact, and pose one of the greatest threats to the advancement of the tech industry that we’ve ever seen. And while it’s easy to sit back and blame criminals, rogue nations or other bad actors, the time for being passive is behind us.

Data Responsibility must become a priority for our industry.

Every person, organization, system, sensor, and intelligent machine that interacts with data has a responsibility for that data. It's unacceptable that we've created the technology to trace a single organic banana every step of the way from farm to table, but we can't tell you who touched a specific piece of data or where it's been copied. In 1998 perhaps we could have argued this was too difficult to solve, but it's now 2018 and our industry simply cannot continue to make excuses. Just this week, Ginni Rometty, Chairman, President, and CEO of IBM, made strong statements in support of data responsibility, a move we hope to see from other CEOs and industry leaders in 2018.

Data becomes “too valuable to use”

We will not solve this problem by pointing fingers at security, or infrastructure, or data teams. Architects, DBAs, Developers, Analysts, business teams, and any person or system involved in the end-to-end data pipeline all bear responsibility for acknowledging their roles in Data Responsibility. Just like the people and companies that fertilize, pick, transport, or sell the bananas take responsibility for their harvest, so must we.

Recognizing the Data Responsibility Problem

We created PencilDATA last year to solve the problem of realizing data value. Today, the more valuable a specific dataset becomes, the more protected and less accessible it becomes to the teams that could put it to the best use. These valuable datasets get locked down by regulations, contracts and IP protections so that access becomes severely restricted. Projects like training a hungry machine learning model are delayed or derailed because the data science team just can't get access to the right training data — it's effectively been deemed "too valuable to use".

How does that model make any business sense? What we came to realize in talking to early customers and partners is that it’s easier (and safer) to not use the data at all than it is to risk a data breach or bad news headline. That isn’t a solution, and is exactly the problem we’re solving at PencilDATA.

The problem, put simply, is that we don’t know where our bananas are.

The lack of an end-to-end data responsibility toolchain means that data owners are deprived of capturing the true value of their data, and also means that criminals and other bad actors have plenty of opportunity for unchecked access — something we saw taken to a new level in 2017.

Unlike the banana industry, we haven't implemented a trusted model for tracing the end-to-end lifecycle of data. Instead, we rely on dozens of individual and disconnected systems, people, and processes that by all measures have room for improvement.

PencilDATA is solving the 'last mile' problem of getting valuable data into the hands of the teams that need it most, but what about the rest of that data's lifecycle? What happens before the data becomes "valuable" (or after)? Those are the times when data is most vulnerable, because organizations focus their energy on protecting their most valuable data, which might represent only 5% of the total data they own. Who is responsible for the other 95%?

Solving The Data Responsibility Problem – Our Proposal

We propose an inclusive, community-based open source initiative to define, build, and maintain a pragmatic framework for responsibly handling datasets throughout their lifecycle. We need at least the same level of end-to-end visibility for our datasets that’s afforded to a bunch of organic bananas.

The good news is that we're not starting from scratch. Many open source projects exist today which, if orchestrated around data responsibility, can come together and begin to solve this problem. Some examples include:

In addition, there’s a long list of vendors including (but certainly not limited to!) AWS, Cloudera, Dell, Docker, Google, IBM, Hortonworks, HP, Informatica, MapR, Microsoft, Oracle, Salesforce, et al that already provide many of the individual components and technologies in use across the industry that can help support the goal of end-to-end Data Responsibility. We welcome their valuable contributions and collaboration with us to solve this industry-wide problem.

[*Exciting new technologies like Blockchain alone won't magically solve the problem of end-to-end data responsibility, but blockchain does bring the accessible, transparent and immutable level of proof that's required if we're going to gain public trust in the results of our efforts.]

Next Steps

Join us to take the first steps together this year on an open source specification around metadata, formats, and implementation for Data Responsibility, starting with an oversight committee and governing board made up of the initiative’s sponsors.

Find us on Twitter to get involved: @Pencil_DATA or @valb00