
How New Chat Platforms Can Be Abused by Cybercriminals

7 Jun

Chat platforms such as Discord, Slack, and Telegram have become quite popular as office communication tools, with all three of the aforementioned examples, in particular, enjoying healthy patronage from businesses and organizations all over the world. One big reason for this is that these chat platforms allow their users to integrate their apps onto the platforms themselves through the use of their APIs. This factor, when applied to a work environment, cuts down on the time spent switching from app to app, thus resulting in a streamlined workflow and in increased efficiency. But one thing must be asked, especially with regard to that kind of feature: Can it be abused by cybercriminals? After all, we have seen many instances where legitimate services and applications are used to facilitate malicious cybercriminal efforts in one way or another, with IRC being one of the bigger examples, used by many cybercriminals in the past as command-and-control (C&C) infrastructure for botnets.

Turning Chat Platform APIs Into Command & Control Infrastructure

Our research has focused on analyzing whether these chat platforms’ APIs can be turned into C&C infrastructure, and on whether existing malware already exploits them. Through extensive monitoring, research, and creation of proof-of-concept code, we have been able to demonstrate that each chat platform’s API functionality can successfully be abused – turning the chat platforms into C&C servers that cybercriminals can use to make contact with infected or compromised systems.

API-abusing Malware Samples Found

Our extensive monitoring of the chat platforms has also revealed that cybercriminals are already abusing these chat platforms for malicious purposes. In Discord, we have found many instances of malware being hosted, including file injectors and even bitcoin miners. Telegram, meanwhile, has been found to be abused by certain variants of KillDisk as well as TeleCrypt, a strain of ransomware. As for Slack, we have not yet found any sign of malicious activity in the chat platform itself at the time of this writing.

What makes this particular security issue something for businesses to take note of is that there is currently no way to secure chat platforms from it without killing their functionality. Blocking the APIs of these chat platforms means rendering them useless, while monitoring network traffic for suspicious Discord/Slack/Telegram connections is practically futile as there is no discernible difference between those initiated by malware and those initiated by the user.

With this conundrum in mind, should businesses avoid these chat platforms entirely? The answer lies in businesses’ current state of security. If the network/endpoint security of a business using a chat platform is up to date, and the employees within that business keep to safe usage practices, then perhaps the potential risk may be worth the convenience and efficiency.

Best Practices for Users

  • Keep communications and credentials confidential. Do not reveal or share them with anyone else.
  • Never click on suspicious links, even those sent from your contacts.
  • Never download any suspicious files, even those sent from your contacts.
  • Comply rigorously with safe surfing or system usage habits.
  • Never use your chat service account for anything other than work purposes.
  • Chat traffic should be considered as no more “fully legitimate” than web traffic – you need to decide how to monitor it, limit it, or drop it completely.

Best Practices for Businesses

  • Enforce strict guidelines and safe usage habits among employees.
  • Inform employees and officers on typical cybercriminal scams, such as phishing scams and spam.
  • Ensure that IT personnel are briefed and educated about the threats that may arise from usage of chat platforms, and have them monitor for suspicious network activity.
  • Assess if the use of a chat platform is really that critical to day-to-day operations. If not, discontinue use immediately.

The complete technical details of our research can be found in our latest paper, How Cybercriminals Can Abuse Chat Program APIs as Command-and-Control Infrastructures.

Chat platform APIs abuse


The spectacles of a web server log file

14 Feb

Web server log files have existed for more than 20 years. All web servers of all kinds, from all vendors, since the time NCSA httpd was powering the web, produce log files, saving in real-time all accesses to web sites and APIs.

Yet, after the appearance of Google Analytics and similar services, and the recent rise of APM (Application Performance Monitoring) with sophisticated time-series databases that collect and analyze metrics at the application level, all these web server log files are mostly just filling our disks, rotated every night without any use whatsoever.

This is about to change!

I will show you how you can turn this “useless” log file into a powerful performance and health monitoring tool, capable of detecting, in real-time, the most common web server problems, such as:

  • too many redirects (i.e. oops! this should not redirect clients to itself)
  • too many bad requests (i.e. oops! a few files were not uploaded)
  • too many internal server errors (i.e. oops! this release crashes too much)
  • unreasonably too many requests (i.e. oops! we are under attack)
  • unreasonably few requests (i.e. oops! call the network guys)
  • unreasonably slow responses (i.e. oops! the database is slow again)
  • too few successful responses (i.e. oops! help us God!)

install netdata

If you haven’t already, now is probably a good time to install netdata.

netdata is a performance and health monitoring system for Linux, FreeBSD and macOS. netdata is real-time, meaning that everything it does is per second, so all the information presented is just a second behind.

If you install it on a system running a web server it will detect it and it will automatically present a series of charts, with information obtained from the web server API, like these (these do not come from the web server log file):

[image: netdata charts based on metrics collected by querying the nginx API (i.e. /stub_status)]

netdata supports apache, nginx, lighttpd and tomcat. To obtain real-time information from a web server API, the web server needs to expose it. For directions on configuring your web server, check /etc/netdata/python.d/. There is a file there for each web server.

tail the log!

netdata has a powerful web_log plugin, capable of incrementally parsing any number of web server log files. This plugin is automatically started with netdata and comes pre-configured for finding web server log files on popular distributions. Its configuration is at /etc/netdata/python.d/web_log.conf, like this:

nginx_netdata:                        # name the charts
  path: '/var/log/nginx/access.log'   # web server log file

You can add one such section, for each of your web server log files.

Keep in mind netdata runs as user netdata. So, make sure user netdata has access to the logs directory and can read the log file.

chart the log!

Once you have all log files configured and netdata restarted, for each log file you will get a section at the netdata dashboard, with the following charts.

responses by status

In this chart we tried to provide a meaningful status for all responses. So:

  • success counts all the valid responses (i.e. 1xx informational, 2xx successful and 304 not modified).
  • error counts 5xx internal server errors. These are very bad: they mean your web site or API is facing difficulties.
  • redirect counts 3xx responses, except 304. All 3xx are redirects, but 304 means “not modified” – it tells the browsers the content they already have is still valid and can be used as-is. So, we decided to account for it as a successful response.
  • bad counts 4xx bad requests that cannot be served.
  • other counts all the other, non-standard, types of responses.
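The grouping described in this list can be sketched in a few lines of Python. This is only an illustration of the rules as stated here, not the web_log plugin's actual code; the function name classify_status is hypothetical, and treating "bad" as the 4xx range follows the alarm descriptions later in this post.

```python
from collections import Counter


def classify_status(code: int) -> str:
    """Map an HTTP status code to one of the chart's five dimensions."""
    if 100 <= code < 300 or code == 304:
        return "success"   # 1xx, 2xx and 304 not modified
    if 300 <= code < 400:
        return "redirect"  # 3xx, except 304 which was handled above
    if 400 <= code < 500:
        return "bad"       # 4xx requests that cannot be served
    if 500 <= code < 600:
        return "error"     # 5xx internal server errors
    return "other"         # non-standard codes


# Hypothetical status codes taken from a handful of log lines.
codes = [200, 304, 301, 404, 500, 999]
totals = Counter(classify_status(c) for c in codes)
print(totals)
```

Checking 304 before the generic 3xx range is what keeps it out of the redirect dimension.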


responses by type

Then, we group all responses by code family, without interpreting their meaning.


responses by code

And here we show all the response codes in detail.


If your application is using hundreds of non-standard response codes, your browser may become slow while viewing this chart, so we have added a configuration option to disable this chart.


This is a nice view of the traffic the web server is receiving and is sending.

What is important to know for this chart is that the bandwidth used for each request and response is accounted for at the time the log line is written. Since netdata refreshes this chart every single second, you may see unrealistic spikes if the size of the requests or responses is too big. The reason is simple: a response may have needed 1 minute to be completed, but all the bandwidth used during that minute for the specific response will be accounted for at the second the log line is written.

As the legend on the chart suggests, you can use FireQoS to set up QoS on the web server ports and IPs to accurately measure the bandwidth the web server is using. Actually, there may be a few more reasons to install QoS on your servers.
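To make the spike effect concrete, here is a minimal Python sketch of per-second accounting keyed by the moment each log line is written. The function name and the sample values are hypothetical; this illustrates the idea only and is not how the plugin is implemented.

```python
from collections import defaultdict


def bandwidth_per_second(log_entries):
    """Sum response bytes per second, keyed by the second the log line was written."""
    per_second = defaultdict(int)
    for written_at, bytes_sent in log_entries:
        per_second[written_at] += bytes_sent
    return dict(per_second)


# (second the log line was written, bytes sent) -- hypothetical values.
# The last response took a full minute to transfer, but it is logged once, at second 70.
entries = [(10, 2_000), (10, 3_000), (70, 60_000_000)]
usage = bandwidth_per_second(entries)
print(usage)
```

All 60 MB land on second 70 and nothing is recorded for seconds 11 to 69, which is exactly the kind of spike the chart can show.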


Most web servers do not log the request size by default.
So, unless you have configured your web server to log the size of requests, the received dimension will be always zero.


netdata will also render the minimum, average and maximum time the web server needed to respond to requests.

Keep in mind that most web servers measure timings from the reception of the full request until the dispatch of the last byte of the response. So, they include network latencies of responses, but they do not include network latencies of requests.


Most web servers do not log timing information by default.
So, unless you have configured your web server to also log timings, this chart will not exist.

URL patterns

This is a very interesting chart. It is configured entirely by you.

netdata can map the URLs found in the log file into categories. You can define these categories, by providing names and regular expressions in web_log.conf.

So, this configuration:

nginx_netdata:                        # name the charts
  path: '/var/log/nginx/access.log'   # web server log file
  categories:
    badges      : '^/api/v1/badge\.svg'
    charts      : '^/api/v1/(data|chart|charts)'
    registry    : '^/api/v1/registry'
    alarms      : '^/api/v1/alarm'
    allmetrics  : '^/api/v1/allmetrics'
    api_other   : '^/api/'
    netdata_conf: '^/netdata.conf'
    api_old     : '^/(data|datasource|graph|list|all\.json)'

Produces the following chart. The categories section is matched in the order given. So, pay attention to the order you give your patterns.
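Why the order matters is easiest to see in code. The Python sketch below applies the patterns from the configuration above in sequence and stops at the first match. The function name categorize and the "other" fallback for unmatched URLs are assumptions made for this illustration, not something taken from the plugin.

```python
import re

# Ordered (name, pattern) pairs, copied from the configuration above.
CATEGORIES = [
    ("badges", r"^/api/v1/badge\.svg"),
    ("charts", r"^/api/v1/(data|chart|charts)"),
    ("registry", r"^/api/v1/registry"),
    ("alarms", r"^/api/v1/alarm"),
    ("allmetrics", r"^/api/v1/allmetrics"),
    ("api_other", r"^/api/"),
    ("netdata_conf", r"^/netdata.conf"),
    ("api_old", r"^/(data|datasource|graph|list|all\.json)"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in CATEGORIES]


def categorize(url: str) -> str:
    """Return the name of the first category whose pattern matches the URL."""
    for name, pattern in COMPILED:
        if pattern.search(url):
            return name
    return "other"  # assumed fallback for URLs that match nothing


for url in ["/api/v1/badge.svg?chart=x", "/api/v1/info", "/index.html"]:
    print(url, "->", categorize(url))
```

If api_other were listed first, its broad '^/api/' pattern would swallow every API request and the more specific categories would never be counted.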


HTTP methods

This chart breaks down requests by HTTP method used.


IP versions

This one provides requests per IP version used by the clients (IPv4, IPv6).
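As a rough illustration of how requests could be split by IP version, the following Python sketch uses the standard ipaddress module. The helper name ip_version and the sample addresses (taken from documentation ranges) are hypothetical; the plugin itself may detect the version differently.

```python
import ipaddress
from collections import Counter


def ip_version(address: str) -> str:
    """Return 'IPv4' or 'IPv6' for a client address string."""
    return f"IPv{ipaddress.ip_address(address).version}"


# Hypothetical client addresses, one per request.
clients = ["192.0.2.10", "2001:db8::1", "198.51.100.7"]
by_version = Counter(ip_version(c) for c in clients)
print(by_version)
```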


Unique clients

The last charts are about the unique IPs accessing your web server.

This one counts the unique IPs for each data collection iteration (i.e. unique clients per second).


And this one, counts the unique IPs, since the last netdata restart.


To provide this information, the web_log plugin keeps in memory all the IPs seen by the web server. Although this does not require much memory, if you have a web server with several million unique client IPs, we suggest disabling this chart.
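The two counters can be sketched with plain Python sets, which also shows where the memory goes: the all-time set only ever grows until a restart. The class name UniqueClients is hypothetical and this is a simplified illustration, not the plugin's code.

```python
class UniqueClients:
    """Track unique client IPs per iteration and since start-up."""

    def __init__(self):
        self.all_time = set()  # grows with every new IP seen

    def update(self, ips_this_iteration):
        """Return (unique IPs in this iteration, unique IPs since start)."""
        current = set(ips_this_iteration)
        self.all_time |= current
        return len(current), len(self.all_time)


tracker = UniqueClients()
first = tracker.update(["192.0.2.1", "192.0.2.2", "192.0.2.1"])
second = tracker.update(["192.0.2.2", "192.0.2.3"])
print(first, second)
```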

real-time alarms from the log!

The magic of netdata is that all metrics are collected per second, and all metrics can be used or correlated to provide real-time alarms. Out of the box, netdata automatically attaches the following alarms to all web_log charts (i.e. to all log files configured, individually):

| alarm | description | minimum requests | warning | critical |
| 1m_redirects | The ratio of HTTP redirects (3xx except 304) over all the requests, during the last minute. Detects if the site or the web API is suffering from too many or circular redirects. (i.e. oops! this should not redirect clients to itself) | 120/min | > 20% | > 30% |
| 1m_bad_requests | The ratio of HTTP bad requests (4xx) over all the requests, during the last minute. Detects if the site or the web API is receiving too many bad requests, including 404, not found. (i.e. oops! a few files were not uploaded) | 120/min | > 30% | > 50% |
| 1m_internal_errors | The ratio of HTTP internal server errors (5xx), over all the requests, during the last minute. Detects if the site is facing difficulties to serve requests. (i.e. oops! this release crashes too much) | 120/min | > 2% | > 5% |
| 5m_requests_ratio | The percentage of successful web requests of the last 5 minutes, compared with the previous 5 minutes. Detects if the site or the web API is suddenly getting too many or too few requests. (i.e. too many = oops! we are under attack) (i.e. too few = oops! call the network guys) | 120/5min | > double or < half | > 4x or < 1/4x |
| web_slow | The average time to respond to requests, over the last 1 minute, compared to the average of last 10 minutes. Detects if the site or the web API is suddenly a lot slower. (i.e. oops! the database is slow again) | 120/min | > 2x | > 4x |
| 1m_successful | The ratio of successful HTTP responses (1xx, 2xx, 304) over all the requests, during the last minute. Detects if the site or the web API is performing within limits. (i.e. oops! help us God!) | 120/min | < 85% | < 75% |

The minimum requests column states the minimum number of requests required for the alarm to be evaluated. We found that when the site is receiving requests above this rate, these alarms are pretty accurate (i.e. no false-positives).
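The role of the minimum can be sketched in Python using the 1m_redirects thresholds listed above (at least 120 requests per minute, warning above 20%, critical above 30%). The function name ratio_alarm and its return values are hypothetical; netdata defines its alarms in its own health configuration, which is not shown here.

```python
def ratio_alarm(matching, total, minimum=120, warning=20.0, critical=30.0):
    """Evaluate a ratio alarm over one minute of requests.

    Returns None when there are too few requests to judge,
    otherwise 'clear', 'warning' or 'critical'.
    """
    if total < minimum:
        return None  # below the minimum, the alarm is not evaluated
    percentage = 100.0 * matching / total
    if percentage > critical:
        return "critical"
    if percentage > warning:
        return "warning"
    return "clear"


# (redirects, total requests) in the last minute -- hypothetical values.
for redirects, total in [(10, 100), (10, 200), (50, 200), (70, 200)]:
    print(redirects, total, ratio_alarm(redirects, total))
```

Skipping the evaluation below the minimum is what keeps a quiet site from raising an alarm just because 2 of its 5 requests in a minute happened to be redirects.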

netdata alarms are user configurable. So, even web_log alarms can be adapted to your needs.



Cisco Sets Digital Network Architecture as its Platform of the Future

3 Mar

Cisco unveiled its Digital Network Architecture (DNA) for transforming business with the power of analytics driven by programmable networks, cloud applications, open APIs, and virtualization.  The Cisco DNA aims to extend the company’s data center-based, policy-driven Application Centric Infrastructure (ACI) technology throughout the entire network: from campus to branch, wired to wireless, core to edge.

Cisco DNA is built on five guiding principles:

  • Virtualize everything to give organizations freedom of choice to run any service anywhere, independent of the underlying platform – physical or virtual, on premise or in the cloud.
  • Designed for automation to make networks and services on those networks easy to deploy, manage and maintain – fundamentally changing the approach to network management.
  • Pervasive analytics to provide insights on the operation of the network, IT infrastructure and the business – information that only the network can provide.
  • Service management delivered from the cloud to unify policy and orchestration across the network – enabling the agility of cloud with the security and control of on premises solutions.
  • Open, extensible and programmable at every layer – integrating Cisco and third-party technology, open APIs and a developer platform, to support a rich ecosystem of network-enabled applications.

“The digital network is the platform for digital business,” said Rob Soderbery, SVP for Enterprise Products and Solutions, Cisco.  “Cisco DNA brings together virtualization, automation, analytics, cloud and programmability to build that platform.  The acronym for the Digital Networking Architecture – DNA – isn’t an accident. We’re fundamentally changing the DNA of networking technology.”

The first deliverables of Cisco DNA include:

DNA Automation:  APIC-Enterprise Module (APIC EM) Platform

  • APIC-EM Platform:  A new version of Cisco’s enterprise controller has been released. Cisco claims 100+ customer deployments running up to 4000 devices from a single instance.  The company is adding automation software that removes the need for staging for pre-configuration or truck roll-outs to remote locations. The Plug and Play agent sits on Cisco routers and switches and talks directly to the network controller. A new EasyQoS service enables the network to dynamically update network wide QoS settings based on application policy.
  • Cisco Intelligent WAN Automation Services: This service automates IWAN deployment and management, providing greater WAN deployment flexibility and allowing IT to quickly configure and deploy a full-service branch office with just 10 clicks.  IWAN automation eliminates configuration tasks for advanced networking features, and automatically enables Cisco best practices, application prioritization, path selection and caching to improve the user experience.
  • DNA Virtualization:  Evolved IOS-XE is a network operating system optimized for programmability, controller-based automation, and serviceability. The new OS provides open model-driven APIs for third party application development, software-defined management, application hosting, edge computing and abstraction from the physical infrastructure to enable virtualization.   It supports the Cisco Catalyst 3850/3650, ASR 1000 and ISR 4000 today, and will continue to be expanded across the Enterprise Network portfolio.

    Evolved Cisco IOS XE includes Enterprise Network Function Virtualization (Enterprise NFV) that decouples hardware from software and gives enterprises the freedom of choice to run any feature anywhere. This solution includes the full software stack – virtualization infrastructure software; virtualized network functions (VNFs) like routing, firewall, WAN Optimization, and WLAN Controller; and orchestration services – to enable branch office service virtualization.

  • DNA Cloud Service Management:  CMX Cloud provides business insights and personalized engagement using location and presence information from Cisco wireless infrastructure.  With CMX Cloud enterprises can provide easy Wi-Fi onboarding, gain access to aggregate customer behavior data, and improve customer engagement.

Why Storage-As-A-Service Is The Future Of IT

1 Apr

Selecting the right storage hardware can often be a no-win proposition for the IT professional. The endless cycle of storage tech refreshes and capacity upgrades puts IT planners and their administrators into an infinite loop of assessing and re-assessing their storage infrastructure requirements. Beyond the capital and operational costs and risks of buying and implementing new gear are also lost opportunity costs. After all, if IT is focused on storage management activities, they’re not squarely focused on business revenue generating activities. To break free from this vicious cycle, storage needs to be consumed like a utility.


Virtualization technology has contributed to the commoditization of server computational power, as server resources can now be acquired and allocated relatively effortlessly, on demand, both in the data center and in the cloud. The four walls of the data center environment are starting to blur as hybrid cloud computing enables businesses to burst application workloads anywhere at any time to meet demand. In short, server resources have effectively become a utility.

Likewise, dedicated storage infrastructure silos also need to break down to enable businesses to move more nimbly in an increasingly competitive global marketplace. Often, excess storage capacity is purchased to hedge against the possibility that application data will grow well beyond expectations. This tends to result in underutilized capacity and a higher total cost of storage ownership. The old ways of procuring, implementing and managing storage simply do not mesh with business time-to-market and cost-cutting efficiency objectives.

In fact, the sheer volume of “software-defined” (storage, network or data center) technologies is a clear example of how the industry is moving away from infrastructure silos in favor of a commoditized pool of centrally managed resources, whether they be CPU, network or storage, that deliver greater automation.

On-Demand Commoditization

Storage is also becoming increasingly commoditized. With a credit card, storage can be instantaneously provisioned by any one of a large number of cloud service providers (CSPs). Moreover, many of the past barriers to accessing these storage resources, like the need to re-code applications with a CSP’s API (application programming interface), can be quickly addressed through the deployment of a cloud gateway appliance.

These solutions make it simple for businesses to utilize cloud storage by providing a NAS front-end to connect existing applications with cloud storage on the back-end. All the necessary cloud APIs, like Amazon’s S3 API for example, are embedded within the appliance, obviating the need to re-code existing applications.

Hybrid Powered QoS

But while organizations are interested in increasing their agility and reducing costs, they may still be leery of utilizing cloud storage capacity. After all, how can you ensure that the quality-of-service in the cloud will be as good as local storage?

Interestingly, cloud gateway technologies allow businesses to implement a hybrid solution where local, high performance solid-state-disk (SSD) configured on an appliance is reserved for “hot” active data sets, while inactive data sets are seamlessly migrated to low-cost cloud storage for offsite protection. This provides organizations with the best of both worlds and with competition intensifying between CSPs, companies can benefit from even lower cloud storage costs as CSPs vie for their business.

Cloud Covered Resiliency

Furthermore, by consuming storage-as-a-service (SaaS) through a cloud gateway appliance, businesses obtain near instant offsite capabilities without making a large capital outlay for dedicated DR data center infrastructure. If data in the primary location gets corrupted or somehow becomes unavailable, information can simply be retrieved directly from the cloud through a cloud gateway appliance at the DR location.

Some cloud storage technologies combine storage, backup and DR into a single solution and thus eliminate the need for IT organizations to conduct nightly backups or to do data replication. Instead, businesses can store unlimited data snapshots across multiple geographies to dramatically enhance data resiliency. This spares IT personnel from the otherwise tedious and time consuming tasks of protecting data when storage assets are managed in-house. SaaS solutions offer a way out of this conundrum by effectively shrink-wrapping storage protection as part of the native offering.

SaaS Enabled Cloud

What’s more, once the data is stored in the cloud, it can potentially be used for bursting application workloads into the CSP’s facility. This can help augment internal data center server resources during peak seasonal business activity and/or it can be utilized to improve business recovery time objectives (RTOs) for mission critical business applications. In either case, these are additional strong use cases for leveraging SaaS technology to further enable an organization’s cloud strategy.

Cloud Lock-In Jailbreak

One area of concern for businesses, however, is cloud vendor “lock-in” and/or the long-term business viability of some cloud providers. The Nirvanix shutdown, for example, caught Nirvanix’s customers, as well as many industry experts, off guard; this was a well-funded CSP that had backing from several large IT industry firms. The ensuing scramble to migrate data out of the Nirvanix data centers before they shut their doors was a harrowing experience for many of their clients, so this is clearly a justifiable concern.

Interestingly, SaaS suppliers like Nasuni can rapidly migrate customer data out of a CSP data center and back to the customer’s premises or, alternatively, to a secondary CSP site when needed. Since they maintain the necessary bandwidth connections to CSPs and between CSP sites, they can readily move data en masse when the need arises. In short, Nasuni’s offering can help insulate customers from being completely isolated from their data, even in the worst of circumstances. Just as importantly, these capabilities help protect businesses from being locked in to a single provider, as data can be easily ported to a competing CSP on demand.

Cloud Lock-In Jailbreak

To prevent a business from being impacted by another unexpected cloud shutdown, SaaS solutions can be configured to mirror business data across two different CSPs for redundancy, to help mitigate the risk of a cloud provider outage. While relatively rare, cloud outages do occur, so if a business cannot tolerate any loss of access to their cloud stored data, this is a viable option.

SaaS providers like Nasuni can actually offset some of the costs associated with mirroring across CSPs since they function, in effect, like a cloud storage aggregator. Simply put, since they buy cloud storage capacity in large volumes, they can often obtain much better rates than customers could by negotiating directly with the CSPs themselves.


Managing IT infrastructure (especially storage) is simply not a core function for many businesses. The endless loop of evaluating storage solutions, going through the procurement process, decommissioning older systems and implementing newer technologies, along with all the daily care and feeding, does not add to the business bottom line. While storage is an essential resource, it is now available as a service, via the cloud at a much lower total cost of ownership.

Like infrastructure virtualization, SaaS is the wave of the future. It delivers a utility-like storage service that is based on the real-time demands of the business. No longer does storage have to be over-provisioned and under-utilized. Instead, like a true utility, businesses only pay for what they consume – not what they think they might consume some day in the future.

SaaS solutions can deliver the local high speed performance businesses need for their critical application infrastructure, while still enabling them to leverage the economies of scale of low-cost cloud storage capacity.

Furthermore, Nasuni’s offering allows organizations to build in the exact amount of data resiliency their business requires. Data can be stored with a single CSP or mirrored across multiple CSPs for redundancy or for extended geographical reach. The combined attributes of the offering allow business needs to be met while enabling IT to move on to bigger and better things.



The real promise of big data: It’s changing the whole way humans will solve problems

10 Feb

Current “big data” and “API-ification” trends can trace their roots to a definition Kant first coined in the 18th century. In his Critique of Pure Reason, Kant drew a dichotomy between analytic and synthetic truths.

An analytic truth was one that could be derived from a logical argument, given an underlying model or axiomatization of the objects the statement referred to. Given the rules of arithmetic we can say “2+2=4” without putting two of something next to two of something else and counting a total of four.

A synthetic truth, on the other hand, was a statement whose correctness could not be determined without access to empirical evidence or external data. Without empirical data, I can’t reason that adding five inbound links to my webpage will increase the number of unique visitors 32%.

In this vein, the rise of big data and the proliferation of programmatic interfaces to new fields and industries have shifted the manner in which we solve problems. Fundamentally, we’ve gone from creating novel analytic models and deducing new findings, to creating the infrastructure and capabilities to solve the same problems through synthetic means.

Until recently, we used analytical reasoning to drive scientific and technological advancements. Our emphasis was either 1) to create new axioms and models, or 2) to use pre-existing models to derive new statements and outcomes.

In mathematics, our greatest achievements were made when mathematicians had “aha!” moments that led to new axioms or new proofs derived from preexisting rules. In physics we focused on finding new laws, from which we derived new knowledge and knowhow. In computational sciences, we developed new models for computation from which we were able to derive new statements about the very nature of what was computable.

The relatively recent development of computer systems and networks has induced a shift from analytic to synthetic innovation.

For instance, how we seek to understand the “physics” of the web is very different from how we seek to understand the physics of quarks or strings. In web ranking, scientists don’t attempt to discover axioms on the connectivity of links and pages from which to then derive theorems for better search. Rather, they take a synthetic approach, collecting and synthesizing previous click streams and link data to predict what future users will want to see.

Likewise at Amazon, there are no “Laws of e-commerce” governing who buys what and how consumers act. Instead, we remove ourselves from the burden of fundamentally unearthing and understanding a structure (or even positing the existence of such a structure) and use data from previous events to optimize for future events.

Google and Amazon serve as early examples of the shift from analytic to synthetic problem solving because their products exist on top of data that exists in a digital medium. Everything from the creation of data, to the storage of data, and finally to the interfaces scientists use to interact with data are digitized and automated.

Early pioneers in data sciences and infrastructure developed high throughput and low latency architectures to distance themselves from hard-to-time “step function” driven analytic insights and instead produce gradual, but predictable synthetic innovation and insight.

Before we can apply synthetic methodologies to new fields, two infrastructural steps must occur:

1) the underlying data must exist in digital form and

2) the stack from the data to the scientist and back to the data must be automated.

That is, we must automate both the input and output processes.

Concerning the first, we’re currently seeing an aggressive pursuit of digitizing new datasets. An Innovation Endeavors’ company, Estimote, exemplifies this trend. Using Bluetooth 4.0, Estimote is now collecting user specific physical data in well-defined microenvironments. Applying this to commerce, they’re building Amazon-esque data for brick and mortar retailers.

Tangibly, we’re not far from a day when our smartphones automatically direct us, in store, to items we previously viewed online.

Similarly, every team in the NBA has adopted SportVU cameras to track the location of each player (and the ball) microsecond by microsecond. With this we’re already seeing the collapse of previous analytic models. A friend, Muthu Alagappan, recently received press coverage when he questioned and deconstructed our assumption in positing five different position-types. What data did we have to back up our assumption that basketball was inherently structured with five player types? Where did these assumptions come from? How correct were they? Similarly, the Houston Rockets have put traditional ball control ideology to rest in successfully launching record numbers of three-point attempts.

Finally, in economics, we’re no longer relying on flawed traditional microeconomic axioms to deduce macroeconomic theories and predictions. Instead we’re seeing econometrics play an ever-increasing role in the practice and study of economics.

Tangentially, the recent surge in digital currencies can be seen as a corollary to this trend. In effect, Bitcoin might represent the early innings of an entirely digitized financial system where the base financial nuggets that we interact with exist fundamentally in digital form.

We’re seeing great emphasis not only in collecting new data, but also in storing and automating the actionability of this data. In the Valley we joke about how the term “big data” is loosely thrown around. It may make more sense to view “big data” not in terms of data size or database type, but rather as a necessary infrastructural evolution as we shift from analytic to synthetic problem solving.

Big data isn’t meaningful alone; rather it’s a byproduct and a means to an end as we change how we solve problems.

The re-emergence of BioTech, or BioTech 2.0, is a great example of innovation in automating procedures on top of newly procured datasets. Companies like Transcriptic are building robotic, fully automated wet labs, while TeselaGen and Genome Compiler are providing CAD and CAM tools for biologists. We aren’t far from a day when biologists are fully removed from pipettes and traditional lab work. The next generation of biologists may well use programmatic interfaces and abstracted models as computational biology envelops the entirety of biology, turning what has traditionally been an analytic, truth-seeking expedition into a high-throughput, low-latency synthetic data science.

Fundamentally, we’re seeing a shift in how we approach problems. By removing ourselves from the intellectual and perhaps philosophical burden of positing structures and axioms, we no longer rely on step-function-driven analytical insights. Rather, we’re seeing widespread infrastructural change that accelerates the adoption of synthetic problem solving.

Traditionally these techniques were constrained to sub-domains of computer science – artificial intelligence and information retrieval come to mind as tangible examples – but as we digitize new data sets and build necessary automation on top of them, we can employ synthetic applications in entirely new fields.

Marc Andreessen famously argued, “Software is eating the world” in his 2011 essay. However, as we dig deeper and understand better the nature of software, APIs, and big data, it’s not software alone, but software combined with digital data sets and automated input and output mechanisms that will eat the world as data science, automation, and software join forces in transforming our problem solving capabilities – from analytic to synthetic.


Pragmatic RESTful API with ASP C#’s Web API

6 Jan

Recently I had to write an API for an existing ASP C# web application. It was an interesting experience, which I would love to share in the hope that it helps a few people and that I get some advice on a few aspects.

When designing the API architecture, I had to choose the approach (SOAP or REST), the message format (JSON or XML) and the framework (Web API or WCF).

After consulting and getting the green light from some of my more experienced workmates, I chose to go with XML over REST in Web API. After analyzing this comparison between Web API and WCF, I decided to go with Web API because it was designed purely with REST in mind and was ideal for what I intended to build.

Why Pragmatic REST
I have tried to read extensively about REST best practices from various sources, but there seem to be no official or widely recognized ones. Instead, most developers go with what works, is flexible, is robust and to a large extent conforms to Roy Fielding’s dissertation on REST.
I tried to conform to some of REST’s conventions in my design, while in some areas I veered off slightly.

Below is how I approached my small REST project, with emphasis on a few areas I found interesting.

Resource Definition
I used four resources, (api/CustomerVerification), (api/TransactionStatus), (api/MakeTransaction) and (api/ReverseTransaction), all accepting only POST requests.
Each was in its own controller with a Post method, i.e. public HttpResponseMessage Post(HttpRequestMessage request).

Data handling
Initially I tried out parameter binding, whereby an incoming request was bound to a corresponding model, i.e. public HttpResponseMessage Post(HttpRequestMessage request, CustomerInfoRequest cust). Here the Post action expects the incoming XML message body to be deserialized to type CustomerInfoRequest. (The message format is set in the API specification contract.)

public class CustomerInfoRequest{
     public string CustReference { get; set; }
}

To enable serializing and deserializing of incoming and outgoing messages between XML and the corresponding objects, I took advantage of System.Runtime.Serialization features.
For example, the CustomerInfoRequest model class used the System.Runtime.Serialization namespace to decorate the class with a DataContract attribute and its members with a DataMember attribute. The DataContract attribute makes the class serializable by DataContractSerializer, and the DataMember attribute specifies that the member is part of the data contract and is serialized with it.

[DataContract(Namespace = "")]
public class CustomerInfoRequest
{
     [DataMember(IsRequired = true)]
     public string CustReference { get; set; }
}

[DataContract(Namespace = "")] sets the XML namespace explicitly. If you leave out Namespace = "", a default namespace is generated from the CLR namespace of your model class (http://schemas.datacontract.org/2004/07/ followed by that namespace), which I think is ugly. Requests that do not carry this namespace will fail, so unless a specific namespace had to be used I preferred leaving the namespace empty (Namespace = "").

The issue I encountered with the above parameter binding method was that the DataContractSerializer cares a great deal about element ordering. By default, elements have to appear in alphabetical order, i.e. <name> must come after <address>, or an exception will be thrown. I ditched parameter binding and instead opted to deserialize the XML request to the corresponding model class using the less emotional XmlSerializer.

So our controller’s Post method now changed into:
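
A minimal sketch of that deserialization step follows; it is not the original controller code. The `Name` and `Address` members and the `RequestReader` helper are assumptions added for illustration, chosen to show that XmlSerializer accepts the elements in any order.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical model: only CustReference appears in the post; Name and
// Address are assumptions added to illustrate element ordering.
public class CustomerInfoRequest
{
    public string CustReference { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
}

// Hypothetical helper, not part of Web API.
public static class RequestReader
{
    // Deserializes an XML message body into T using XmlSerializer.
    // With no explicit Order set on the members, element order is not enforced.
    public static T FromXml<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

Inside a Post action, the XML string would typically come from the request body, for example via request.Content.ReadAsStringAsync(), before being passed to a helper like this one.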

Data validation
When creating the model classes, each property was decorated with the required data annotations from the System.ComponentModel.DataAnnotations namespace, i.e.

[StringLength(51, ErrorMessage = "The {0} must be at least {2} characters long.", MinimumLength = 4)]
public string CustReference { get; set; }

The data restrictions come from the API specification document created early on at project inception. All incoming requests are checked to ensure the request message matches the required specification. This can be done by using the ValidationContext class to validate the request model against the data annotations specified on each of the model members.
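
As a rough sketch of how such a check could look: the post names ParameterHelper.ValidateApiRequestData but does not show its body, so the implementation, the return type and the added [Required] annotation below are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Linq;

public class CustomerInfoRequest
{
    [Required] // assumption: the post only shows the StringLength annotation
    [StringLength(51, ErrorMessage = "The {0} must be at least {2} characters long.", MinimumLength = 4)]
    public string CustReference { get; set; }
}

public static class ParameterHelper
{
    // Assumed shape: returns the validation error messages for a model,
    // or an empty list when every annotation is satisfied.
    public static IList<string> ValidateApiRequestData(object model)
    {
        var context = new ValidationContext(model, null, null);
        var results = new List<ValidationResult>();

        // validateAllProperties: true checks every annotation, not only [Required].
        Validator.TryValidateObject(model, context, results, true);

        return results.Select(r => r.ErrorMessage).ToList();
    }
}
```

A controller could then reject the request whenever the returned list is non-empty and send the messages back to the caller.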

All requests are validated with the above method in the controller, i.e. ParameterHelper.ValidateApiRequestData(custReqestInfo);

There are many ways to implement REST API security, and in my opinion there is no agreed right way of doing it, at least for now. Hence different people implement API security in different ways. In my case I used both an API key and a hashed value sent as part of the request.


The API key uniquely identifies the requesting third party, while the HashedValue is created by hashing a concatenation of a private key (provided to the third party) and a few other values, such as the time, using the SHA512 algorithm.
For some requests, like (api/MakeTransaction), the HashedValue is unique to each request.
On top of the above, incoming requests are sent over HTTPS with IP blocking in place (only requests from recognized IP addresses are allowed).
For now I think this is reasonably secure, but I could be wrong.
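
The hashed-value check described above could be sketched roughly as follows, assuming SHA-512. The `RequestSigner` name, the choice of fields and their concatenation order are assumptions; the post does not specify them.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper illustrating the hashed-value scheme.
public static class RequestSigner
{
    // Hashes privateKey + payload + timestamp with SHA-512 and returns
    // lowercase hex. Field choice and ordering are assumptions.
    public static string ComputeHash(string privateKey, string payload, string timestamp)
    {
        byte[] input = Encoding.UTF8.GetBytes(privateKey + payload + timestamp);
        using (SHA512 sha = SHA512.Create())
        {
            byte[] digest = sha.ComputeHash(input);
            var hex = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest)
            {
                hex.Append(b.ToString("x2"));
            }
            return hex.ToString();
        }
    }

    // Recomputes the hash on the server and compares it with the received
    // value without stopping at the first mismatching character.
    public static bool IsValid(string receivedHash, string privateKey, string payload, string timestamp)
    {
        string expected = ComputeHash(privateKey, payload, timestamp);
        if (receivedHash == null || receivedHash.Length != expected.Length)
        {
            return false;
        }
        int diff = 0;
        for (int i = 0; i < expected.Length; i++)
        {
            diff |= expected[i] ^ receivedHash[i];
        }
        return diff == 0;
    }
}
```

Including a timestamp or transaction reference in the hashed input is what makes the value differ per request. A keyed construction such as HMACSHA512 is the more conventional choice for this kind of signature and may be worth considering instead of plain concatenation.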

Request & Response Logging
This was a bit tricky, but reading this article greatly helped. Briefly, the blog post describes the use of a message handler to process all incoming requests and outgoing responses. I also took advantage of ELMAH’s error logging features.
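
A message handler of this kind might look roughly like the sketch below. The `LoggingHandler` name and its in-memory `Entries` queue are assumptions (a real application would write to its own log store), and `StubInnerHandler` only stands in for the rest of the pipeline so the handler can be tried outside Web API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler that records each request and its response.
public class LoggingHandler : DelegatingHandler
{
    // In-memory log used only for illustration.
    public readonly ConcurrentQueue<string> Entries = new ConcurrentQueue<string>();

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Record the incoming request before it reaches the controller.
        string requestBody = request.Content == null
            ? string.Empty
            : await request.Content.ReadAsStringAsync();
        Entries.Enqueue("REQUEST " + request.Method + " " + request.RequestUri + " " + requestBody);

        // Hand over to the rest of the pipeline.
        HttpResponseMessage response = await base.SendAsync(request, cancellationToken);

        // Record the outgoing response on the way back.
        string responseBody = response.Content == null
            ? string.Empty
            : await response.Content.ReadAsStringAsync();
        Entries.Enqueue("RESPONSE " + (int)response.StatusCode + " " + responseBody);

        return response;
    }
}

// Stand-in for the controller pipeline; always answers 200 with a fixed body.
public class StubInnerHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent("<ok/>")
        };
        return Task.FromResult(response);
    }
}
```

In a Web API project, a handler like this is usually registered once at startup by adding it to the MessageHandlers collection of the HttpConfiguration, so that every request passes through it.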

Exception Handling
I created a master Exception class (ServiceException) through which formatted error output was sent back to the calling party.
All kinds of exceptions encountered in the application were caught and then rethrown as a custom exception of type ServiceException. This way all exceptions can be categorized and sent back with meaningful data.

There is a final try/catch in the controller class to catch every thrown ServiceException:

 catch (ServiceException e)
 { return Request.CreateResponse(HttpStatusCode.OK, e.FormatResponse());}
 catch (Exception e) //to handle any unhandled exceptions in the application
 {
      ServiceException pd = new ServiceException();
      pd.ExceptionMessage = e.Message;
      return Request.CreateResponse(HttpStatusCode.InternalServerError, pd.FormatUnhandledResponse());
 }
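
The catch-and-rethrow pattern described in this section could be sketched as below. The post only names ExceptionMessage, FormatResponse and FormatUnhandledResponse, so the ErrorCode property, the XML layout and the `ExceptionTranslator` helper are hypothetical.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical shape of the custom exception; ErrorCode and the XML
// layout of FormatResponse are assumptions.
public class ServiceException : Exception
{
    public string ErrorCode { get; set; }
    public string ExceptionMessage { get; set; }

    public ServiceException() { }

    public ServiceException(string errorCode, string message, Exception inner)
        : base(message, inner)
    {
        ErrorCode = errorCode;
        ExceptionMessage = message;
    }

    // Builds the error body with XElement so special characters are escaped.
    public string FormatResponse()
    {
        var error = new XElement("Error",
            new XElement("Code", ErrorCode),
            new XElement("Message", ExceptionMessage));
        return error.ToString(SaveOptions.DisableFormatting);
    }
}

// Hypothetical helper that categorizes any failure as a ServiceException.
public static class ExceptionTranslator
{
    public static T Run<T>(string errorCode, Func<T> operation)
    {
        try
        {
            return operation();
        }
        catch (ServiceException)
        {
            throw; // already categorized, pass it through unchanged
        }
        catch (Exception e)
        {
            // Keep the original exception as InnerException for logging.
            throw new ServiceException(errorCode, e.Message, e);
        }
    }
}
```

Wrapping lower-level calls this way means the controller's final try/catch only has to distinguish a ServiceException from anything that slipped through unwrapped.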

This post is mostly rudimentary but hopefully something good will come out of it.

