Wednesday, October 22, 2014

What qualifies a Project to be a Product?

Disclaimer:
This post is based upon a piece of a dream (not fiction). Any relation to any enterprise-grade product is purely coincidental.

---

Now, this is not a flame post against Proprietary Software. I'm a FOSS supporter, but with the understanding that for some businesses...
* paid 24x7 support is a lot more critical than quality
* there is a need to put trust in a product where the Vendor is bound by agreement to help
...which is perfectly fine.
Depending on business requirements and policies, different solutions need to be provided: some community-supported FOSS, and for many large corporations, Enterprise Software that has been backed for years.

Again, no, this post is not about software licenses.
This post is about some core values of a piece of software (FOSS or Proprietary) that make it eligible to be used at a worthy scale in an Organization that depends on it.
These also hold true for any user of that Software Project, but are more crucial for users whose economy depends on it.
And they are the ethical right of the Corporations which are paying big bucks for a piece of code sold to them on a high brand and big promises.

From here on I use "Products" for all the paid-for (some Enterprise-grade) Projects that I got to work on since college and whose underlying truth I got to experience.
---

So... What qualifies a Project to be a Product?

  • 2 old-skool fundamental mantras: loose coupling; high cohesion
    There are "Products" with a bunch of modules that have separate responsibilities. Correct approach. But when you start to build around them, sometimes they are not that independent in containing their own responsibilities.
    That's worse than not having your project modular at all; at least that way you encourage users to treat it as a black-box.
    A "Product" needs to be well modularized, embodying the age-old beautiful development practice of keeping your modules loosely coupled and strongly cohesive.
  • isolated from client specific details
    Anything and everything that depicts details of a client-specific implementation needs to be managed as explicitly provided configuration. There shouldn't be any need to find-and-replace hard-coded "config text" in source or set-up files of the "Product". Such details shouldn't have to be packaged into a customized version of the "Product's" standard set-up. (A tiny sketch of this idea follows after this list.)
  • generic out-of-the-box setup... tested and isolated
    The "Product" installer binaries shall be self-contained. They can be O.S.-distribution specific, which is fine.
    They should handle O.S.-level restrictions that may or may not be present. Say SELinux is enabled: your setup shall be able to initiate the changes required for the "Product" to function.
    Any dependency shall either be bundled in the installer package, or the installer shall utilize the targeted package manager to lay it down for itself.
    The installer shall be tested completely in the mode it is supposed to be used in. For example, if it sets up the machine remotely, then it shall be capable of handling all tasks remotely by itself and not depend on the user to place some files for it first or in-between.
  • no mesh between services required for initiation
    The "Product" might have distributed-architecture support, might have modular components collaborating.
    Now these distributed components need to be aware of each other to be able to collaborate. The components shall be robust enough to handle an unavailable dependency component and regain activity once it is available.
    This awareness can be maintained in a component dedicated to dynamic configuration query and update, where all component instances export details about themselves and gather information on others. The environment-preparation mechanism can populate the initial dynamic information, which on requirement can be updated in the collaboration-enabling component and gathered by others.
    Every component can persist the information required for it within itself as well, but that shouldn't depend on the other components. If component-F collaborates with component-A and requires component-A to mark some activities for it even to start itself, that's bad design.
  • don't promote deprecated technology in newer components
    Some "Products" have a long lifetime, depending on their vastness and/or critical nature. This might lead to certain obsolete technologies (like SOAP in year 2014) being used in newer components.
    You shouldn't start rebuilding an entire, perfectly working "Product" for that, yes. But neither shall you use that as an excuse for holding back all new development around it. It will cripple the "Product"'s chances of surviving in newer circumstances much faster if it can't use the power those circumstances require.
    Build a mid-layer contract API and seclude your new work from your legacy "Product". Then build new features over that mid-layer API.
    This will help you avoid the "Product" becoming bloatware just because new features can't be used inherently from the design itself. It will also help protect the sanity of the perfectly working "Product" from the (library, etc.) changes required for new features. And it will let you use more advanced, current best practices for all the newer work, not cripple it.
    If not the "Product" in its entirety, at least every individual component in itself shall follow a unified design/development/interface/platform strategy... it shouldn't have a bifurcation of ideals; if necessary, a new component shall be carved out and plugged in instead of corrupting the existing piece.
  • different style of code, different degree of documentation
    If your code-base is not very huge, beautifully written, clean and modular code can survive without documentation.
    If your code is not clean (don't judge it yourself; ask someone expert in the language/framework but unaware of the logic to guess), then have freaking documentation all over it.
    If your code-base is "really" huge, even if your code is mostly clean... have at least basic module-level documentation.
    So anyone using it knows how to handle the modules and build upon or around them. And anyone visiting the code-base many years later (if you think your "Product" is capable of that), with much-improved language features around, is still able to make sense of what was carved in stone on the cave walls.
  • building upon/around it shouldn't involve hack-y ways
    In small home-brewed solutions, unpack-replace-repack is still an arguably accepted solution.
    A freaking "Product" shall expose an elegant yet secure API interface to extend features. If it allows overriding of existing features, even those overrides shall be plugged in via that interface, causing the built-in feature to be subdued.
    It's not effortless, but it is a sane and secure method. The "Product" needs this embedded in its design.
  • for all kinds of data-sets involved, have a proper data-modification strategy
    Hand modification seems harmless at the development stage, sometimes even during testing. In production, if any kind of data modification done during setup or updates doesn't have a migration/rollback strategy around it, that's a danger-zone signal.
    Think proper DB migration scripts over (and not plain diffs of) the current version of scripts. It might seem obvious, as do all the other points, but there are a bunch of Products missing one or another variation of it.
  • take care of sensitive data involved, even during setup of the "Product"
    If your Product in any manner forces display or storage of sensitive data (usernames, passwords, machine names, network details, yada yada yada), danger zone again.
  • have at least some level of sanity tests for all logic flows
    Even in this age of software development, if the importance of software testing (acceptance, regression, security) needs to be advocated to you, it's a disaster.
    To state the obvious... with tests, any change can be tracked properly, all technologists have some level of idea of and trust in what's where doing what, and people picking it up much later can build over it without breaking anything.
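
To make the client-configuration point above a bit more concrete, here is a tiny, hedged sketch of the idea in shell; the file path, keys and values are hypothetical, not from any specific "Product":

deploy_product(){
  # everything client-specific comes from an explicit config file handed over at
  # deploy time; nothing is ever find-replaced inside shipped source/set-up files
  CONFIG_FILE="${PRODUCT_CONFIG:-/etc/myproduct/client.conf}"   # hypothetical path

  # client.conf holds plain KEY=VALUE pairs, e.g.
  #   CLIENT_DOMAIN=client.example.com
  #   DB_HOST=db.internal.example.com
  . "$CONFIG_FILE"

  # fail loudly if a mandatory client detail is missing, instead of shipping a baked-in default
  : "${CLIENT_DOMAIN:?CLIENT_DOMAIN must be set in $CONFIG_FILE}"
  : "${DB_HOST:?DB_HOST must be set in $CONFIG_FILE}"

  echo "configuring product for ${CLIENT_DOMAIN} with database ${DB_HOST}"
}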


and some more things, but those are for some other time.....

---

LONG QUESTION:
Could a piece of software sold by "so-called tech mogul" Organizations be so crappy that the only purpose it fills is vendor lock-in for years of licensing, and not even its creators know it's buggy to its core?
SHORT ANSWER:
Yes! So choose with care.

Thursday, September 18, 2014

base checklist: 10 points to decide whether to choose an OSS for Production or not

One question you get from skeptics (who are actually really important for quality checks) while discussing picking an OpenSource solution instead of a support-attached closed-source one: how do you trust it to be safe for a Production release?

The question actually also suits deciding which OSS to pick when there are several.

The question we are trying to answer here is... how to pick an OpenSource solution that will live long and prosper, and not turn into a rot that smells on any change in Production as updates roll out over a period of time.

First, just to mention again something almost every Technologist already understands: with closed-source software there is no guarantee, just "assured" support, to guarantee its safety or to support future technical growth. I don't wanna dwell on the dangers that brings in, 'cuz this is not the post for that. That's an entirely different exhausting list.

So, what to look for in OpenSource software that helps you decide it can be trusted for inclusion in a production release...


While weighing the inclusion of any big or small OpenSource utility into your Production list, the following checklist shall help:

1*) OSS has Licenses too
First of all, check if its License suits inclusion with the licensing of your project. For example, people have been seeking ways (and have somewhat succeeded) to get ZFS on GNU/Linux, where the license doesn't mix cleanly with the kernel's.

2*) Is the project active "enough"
The second quick check is seeing whether the project has been inactive for a dangerous period. Now, for every kind of project, a "dangerous period" differs widely; you'd have to depend on your own better judgment and a trusted community you know. For a library providing a certain algorithm, post-stable-release changes would be a lot slower. But for a webdev framework, with the current tradition... it'll be popping new minor releases every now and then.

Now a few things for which you'll need to read around a little....
Sources to recon the following attributes from: mailing lists, issue boards, IRC, Twitter streams, and maybe others depending on the project.

3*) How active and inclusive is its community
How well do they handle PullRequests and Issues raised on their project? This includes readiness of response and adapting to a better direction; both, but mainly the former.
How well do they handle risks and vulnerabilities reported, if any? The quickest patch is not the main measure; most important is accepting the issue and providing a workaround till the main issue gets resolved.

4*) A good core team matters (they need not be very popular)
Check who forms the core team maintaining that OSS. Some other projects of theirs, even if not popular, would give you an idea of how much and how well they maintain their projects.

5*) If Industry already loves it
Not a litmus test, though it strengthens community support and quality checks.
Look for who in the Industry is already using it mainstream, and also whether you like the software they have developed. Just shoot a tweet/mail to them... people are mostly helpful. Don't give up on humanity. ;)

6*) Need to scan it personally anyhow
Try it in a sandbox first; monitor that it's not spawning requests to domains it's not supposed to, and not creating any suspicious behavior you don't expect from it. (A rough sketch of such a scan follows after this checklist.)
Also check that it survives your production security lockdown; not all projects behave the same under restrictions.

7*) Send it on a marathon
Put it under performance tests yourselves. There might be pre-existing load-test results available, and they might even be accurate. But not all implementations suit all projects. Check it in a PoC of your implementation's behavior, with the expected concurrency and latency.

8*) Does it tailor-fit
If it actually provides what you desire without putting a hack around it, give it a chance. If not, confirm that the workaround suits the design and wouldn't break with the project philosophy of the maintainers, at least over the coming versions.

9*) How easy is it to resolve an issue
Are the project's community/developers active enough to help guide you around any problems faced?

10*) Do you love supporting FOSS
If yes, welcome to the world of awesomeness. Some mediocrity (not below that; then look for something else) on some of the points above should only drive you to strengthen the project. It's opensource; at the very least, technologists are not supposed to live with a problem if they face one.
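
For point 6, a rough sketch of what such a personal scan could look like inside a throwaway sandbox VM; "suspect-daemon", eth0 and the timings are placeholders, and a real review would of course go deeper:

sudo tcpdump -i eth0 -nn not port 22 -w /tmp/suspect.pcap &      # capture all non-SSH traffic
TCPDUMP_PID=$!
strace -f -e trace=network -o /tmp/suspect.strace ./suspect-daemon &   # log its socket calls
DAEMON_PID=$!

sleep 600                                      # exercise it the way production would
kill "$DAEMON_PID" "$TCPDUMP_PID"

# any domains/IPs it talked to that it wasn't supposed to?
grep -E 'connect|sendto' /tmp/suspect.strace | sort | uniq -c | sort -rn | head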

Monday, February 3, 2014

golang ~ get local changes into GOPATH without pushing them upstream

To get your local Golang repos sym-linked into your GOPATH, with your local changes available...

goenv_link(){
  if [ $# -ne 2 ]; then
    echo "Links up a local repo dir to its go-get location in GOPATH"
    echo "SYNTAX: goenv_link <repo-dir> <repo-url>   (or via alias: goenv_linkme <repo-url>)"
    return 1
  fi
  _REPO_DIR=$1
  _REPO_URL=$2

  _TMP_PWD=$PWD
  cd "$_REPO_DIR"

  if [ -d "${GOPATH}/src/${_REPO_URL}" ]; then
    echo "$_REPO_URL already exists at GOPATH $GOPATH"
    go get "${_REPO_URL}"
    cd "$_TMP_PWD"
    return 1
  fi
  # create the parent directory for the symlink target inside GOPATH/src
  _REPO_BASEDIR=$(dirname "${GOPATH}/src/${_REPO_URL}")
  if [ ! -d "${_REPO_BASEDIR}" ]; then
    mkdir -p "${_REPO_BASEDIR}"
  fi

  # symlink the local repo into GOPATH and fetch its dependencies
  ln -sf "${PWD}" "${GOPATH}/src/${_REPO_URL}"
  go get "${_REPO_URL}"

  cd "$_TMP_PWD"
}

alias goenv_linkme='goenv_link $PWD'
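
A quick usage sketch (the local path here is illustrative): run it from inside the clone you're hacking on, passing its go-get import path; the directory gets symlinked into $GOPATH/src and its dependencies fetched.

$ cd ~/projects/goshare
$ goenv_linkme "github.com/abhishekkr/goshare"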

---


Every now and then, working in my favorite new programming language Golang, I have inter-dependent changes among different packages. To confirm their as-required working state, I'd like the GOPATH to provide the compiled object with the local changes included.

The utility I've been using to push local package changes into the GOPATH-provided object is the following "goenv_alpha" bash function, provided as a shell-profile utility.

Say I've a golang project "github.com/abhishekkr/goshare" which utilizes "github.com/abhishekkr/goshare/httpd", "github.com/abhishekkr/goshare/zeromq" and a few more.

If I make some local changes at "{PROJECTS}/goshare" and "{PROJECTS}/goshare/httpd", then to push those into the GOPATH-provided package for testing, the following commands using the "goenv_alpha" shell-util below would do the job...

$ goenv_alpha "{PROJECTS}/goshare" "github.com/abhishekkr/goshare"
$ goenv_alpha "{PR..}/goshare/httpd" "github.com/abhishekkr/goshare/httpd"

These commands will ask whether you want to make a backup file of the currently existing version of the package object from GOPATH. You can give it any name... which will be asked for while restoring, or you can leave it empty to avoid creating a backup file.

~

goenv_alpha(){
  _TMP_PWD=$PWD
  if [ $# -ne 2 ]; then
    echo "Provide Alpha changes usable as any other go package."
    echo "SYNTAX: goenv_alpha <repo-dir> <repo-url>"
    return 1
  fi
  _REPO_DIR=$1
  _REPO_URL=$2
  cd "$_REPO_DIR"
  _PKG_PARENT_NAME=$(dirname "$PWD")
  _PKG_NAME=$(basename "$PWD")
  _PKG_NAME_IN_REPO=$(basename "$_REPO_URL")
  if [ "$_PKG_NAME_IN_REPO" != "$_PKG_NAME" ]; then
    echo "Path for creating alpha doesn't match the import 'url' for it."
    cd "$_TMP_PWD"
    return 1
  fi

  # build locally; '-work' prints the temporary WORK dir (on stderr) holding the new object
  go build -work . 2> "/tmp/$_PKG_NAME"
  _BUILD_PATH=$(sed 's/WORK=//' "/tmp/$_PKG_NAME")
  if [ ! -d "$_BUILD_PATH" ]; then
    echo "An error occurred while building, it's recorded at /tmp/$_PKG_NAME"
    cd "$_TMP_PWD"
    return 1
  fi
  rm -f "/tmp/$_PKG_NAME"

  : "${GOOS:=$(go env GOOS)}" ; : "${GOARCH:=$(go env GOARCH)}"   # in case these aren't exported
  _CURRENT_OBJECT_PATH="${GOPATH}/pkg/${GOOS}_${GOARCH}"
  _CURRENT_OBJECT="${_CURRENT_OBJECT_PATH}/${_REPO_URL}.a"
  _NEW_OBJECT="${_BUILD_PATH}/_${_PKG_PARENT_NAME}/${_PKG_NAME}.a"

  echo "Do you wanna backup current object? If yes enter a filename for it: "
  read GO_ALPHA_BACKUP
  if [ ! -z "$GO_ALPHA_BACKUP" ]; then
    mv "$_CURRENT_OBJECT" "${_CURRENT_OBJECT_PATH}/${_REPO_URL}/${GO_ALPHA_BACKUP}.backup"
  fi

  # drop the freshly built object in place of the GOPATH-provided one
  mv "$_NEW_OBJECT" "$_CURRENT_OBJECT"
  cd "$_TMP_PWD"
  echo "Alpha changes have been updated at ${_CURRENT_OBJECT}."
}

~

You can undo pushing the local-changes-inclusive package object if you created a backup file of the earlier existing one.

The following commands utilize the shell-util function "goenv_alpha_undo" provided below.

$ goenv_alpha_undo "{PROJECTS}/goshare" "github.com/abhishekkr/goshare"
$ goenv_alpha_undo "{PR..}/goshare/httpd" "github.com/abhishekkr/goshare/httpd"

This will list the names of the backup files present, if any; then you can provide the name of your chosen backup file and restore to that package state.

~
goenv_alpha_undo(){
  _TMP_PWD=$PWD
  if [ $# -ne 2 ]; then
    echo "Undo Alpha changes, restoring a backed-up go package object."
    echo "SYNTAX: goenv_alpha_undo <repo-dir> <repo-url>"
    return 1
  fi
  _REPO_DIR=$1
  _REPO_URL=$2
  cd "$_REPO_DIR"
  _PKG_NAME=$(basename "$PWD")
  _PKG_NAME_IN_REPO=$(basename "$_REPO_URL")
  if [ "$_PKG_NAME_IN_REPO" != "$_PKG_NAME" ]; then
    echo "Path for creating alpha doesn't match the import 'url' for it."
    cd "$_TMP_PWD"
    return 1
  fi

  : "${GOOS:=$(go env GOOS)}" ; : "${GOARCH:=$(go env GOARCH)}"   # in case these aren't exported
  _CURRENT_OBJECT_PATH="${GOPATH}/pkg/${GOOS}_${GOARCH}"
  _CURRENT_OBJECT="${_CURRENT_OBJECT_PATH}/${_REPO_URL}.a"

  echo "Available package backup files are:"
  ls -1 "${_CURRENT_OBJECT_PATH}/${_REPO_URL}" | grep '\.backup$'
  echo "Enter your backup filename for it: "
  read GO_ALPHA_BACKUP
  if [ -z "$GO_ALPHA_BACKUP" ]; then
    echo "No Backup file was entered."
    cd "$_TMP_PWD"
    return 1
  fi

  # restore the chosen backup over the current GOPATH object
  mv "${_CURRENT_OBJECT_PATH}/${_REPO_URL}/${GO_ALPHA_BACKUP}" "$_CURRENT_OBJECT"
  cd "$_TMP_PWD"
  echo "Alpha changes have been reverted with the provided backup file."
}
~

The full [WIP] shell-profile for golang utilities is at:
https://github.com/abhishekkr/tux-svc-mux/blob/master/shell_profile/a.golang.sh

Thursday, December 5, 2013

go get pkg ~ easy made easier for project dependency management

For the past some time I've been trying out ways to improve practices around the awesome capabilities of GoLang. One of those has been having a 'bundle install' (for ruby folks) or 'pip install -r requirements.txt' (for python folks) style capability... something that just refers to a text file kept as part of the source code and plainly fetches all the dependency paths mentioned in there (for all others).

It and some other bits can be referred here...
https://github.com/abhishekkr/tux-svc-mux/blob/master/shell_profile/a.golang.sh#L31

It's a shell (bash) function that can be added to your Shell/System Profile files and used...

go_get_pkg(){
  if [ $# -eq 0 ]; then
    if [ -f "$PWD/go-get-pkg.txt" ]; then
      PKG_LISTS=("$PWD/go-get-pkg.txt")
    else
      touch "$PWD/go-get-pkg.txt"
      echo "Created GoLang Package empty list $PWD/go-get-pkg.txt"
      echo "Start adding package paths as separate lines." && return 0
    fi
  else
    PKG_LISTS=("$@")
  fi
  for pkg_list in "${PKG_LISTS[@]}"; do
    # each line may carry extra 'go get' arguments (like build tags), so pass it via xargs
    cat "$pkg_list" | while read pkg_path; do
      echo "fetching golang package: go get ${pkg_path}"
      echo $pkg_path | xargs go get
    done
  done
}
---
What does it do?
If run without any parameters, it checks the current working directory for a file called 'go-get-pkg.txt'. If not found, it creates an empty file by that name (to be done at initialization of a project). If found, it iterates through each line and passes it directly to "go get ${line}". If run with parameters, each parameter is treated as the path to a file similar to 'go-get-pkg.txt', and the same action as explained previously is performed on each file.
Sample 'go-get-pkg.txt' file
-tags zmq_3_x github.com/alecthomas/gozmq
github.com/abhishekkr/levigoNS
github.com/abhishekkr/goshare
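
And a quick usage sketch (the file paths are illustrative):

$ cd my-go-project && go_get_pkg                       # reads ./go-get-pkg.txt, creating it if absent
$ go_get_pkg tools/go-get-pkg.txt more/pkg-list.txt    # or pass one or more list files explicitly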
---

Friday, November 15, 2013

systemd enabled lightweight NameSpace Containers ~ QuickStart Guide

systemd (for some time now) provides a powerful chroot alternative to linux users for creating quick and lightweight system containers using power of cgroups and socket activation.

There is a lot more to "systemd" than this, but that's for some other post. Until then you can explore it, starting here.

There is a utility "systemd-nspawn" provided by systemd which acts as a container manager. This is what can be used to easily spawn a new linux container and manage it. It has been updated with (systemd's amazing trademark feature) Socket Activation.

This enables any container to make the parent/host's systemd instance listen on different service ports for it. Only when those service ports receive a connection will the container spawn and handle it. Voila: resource utilization and scalability concepts.
More on this can be read in detail at: http://0pointer.de/blog/projects/socket-activated-containers.html

Here we'll see a way to quickly start using it via some custom-made commands.
All the script commands used here can be referred to from https://github.com/abhishekkr/tux-svc-mux/blob/master/shell_profile/a.virt.sh as well.

Just download and source the linked script in your shell, and the commands described here will be available...
And yes, your system also needs to be running systemd already.

Currently this just lets you create archlinux containers; it will soon support different containers as the script matures.

In case you don't have any created container already, or wanna create a new one...
$ nspawn-arch
To list the names of all created containers...
$ nspawn-ls
To stop a running container...
$ nspawn-stop
To start an already created container...
$ nspawn-start
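
If you're curious what such helpers boil down to underneath, here is a rough, hedged sketch using the raw tools; it assumes an Arch host with pacstrap available, the container path and name are illustrative, and exact flags can differ across systemd versions.

CONTAINER=archbox
CONTAINER_DIR="/srv/nspawn/${CONTAINER}"

sudo mkdir -p "${CONTAINER_DIR}"
sudo pacstrap -c -d "${CONTAINER_DIR}" base                        # bootstrap a minimal Arch rootfs
sudo systemd-nspawn -D "${CONTAINER_DIR}" -M "${CONTAINER}" -b     # boot it as a container

machinectl list                                                    # containers registered with systemd
sudo machinectl terminate "${CONTAINER}"                           # stop it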

---

Friday, July 26, 2013

Puppet ~ a beginners concept guide (Part 4) ~ Where is my Data

parts of the "Puppet ~ Beginner's Concept Guide" to read before ~
Part#1: intro to puppet, Part#2: intro to modules, and Part#3: modules much more.

Puppet
beginners concept guide (Part 4)


Where is my Data?

When I started my Puppet-ry, the examples I used to see had all configuration data buried inside the DSL code of manifests; people were trying to use inheritance to push down data. Then I got to see a design pattern in puppet manifests keeping a separate parameters manifest for configuration variables. Then came along External Data lookup via CSV files as a Puppet function. Then, with enhancements in puppet and other modules, came along more.

Below are a few usable-to-fine ways of utilizing separate data sources within your manifests.


Here, we will see usage styles of data for Puppet Manifests: params manifests, Extlookup CSV, Hiera, Plug-in Facts and PuppetDB.

params-manifest:


It is the very basic way of separating out data from your functionality code, and the preferred way for data whose value-set will keep growing in future. It keeps data separate from the code from the start. Once the requirement reaches a level where varied values are to be inferred based on environment/domain/fqdn/operatingsystem/[any-facter], it can be extracted into any of the preferred ways given below and just looked up here. That would avoid changing the main (sub)module code.
[ Gist-Set Externalize into Params Manifest: https://gist.github.com/3683955 ]
Say you are providing an httpd::git sub-module for the httpd module, placing a template-generated config file using data placed in params...
```

File: httpd/manifests/git.pp
it includes the params submodule to access the data

File: httpd/templates/mynode.conf.erb

File: httpd/manifests/params.pp
it actually is just another submodule to only handle data

Use it: run_puppet.sh

```
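
The 'run_puppet.sh' in the Gist is essentially a master-less apply of the module; a minimal sketch of such a script would be (the module path is illustrative):

#!/bin/bash
MODULEPATH="$PWD/modules"    # wherever the httpd module lives locally
puppet apply --modulepath="$MODULEPATH" -e "include httpd::git"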
_

extlookup-csv:


If you think your data would suit a (key,value) CSV format, it can be extracted into data files. Puppet needs to be told the location of the CSV files to be looked up for a key, and it fetches the value assigned to that key in those files.
The names given to these CSV files matter to Puppet while looking up the values from all the present CSV files. Puppet needs to be given a hierarchy order for these file-names to look for the key in, and the order can involve variable names.

For e.g., say you have a CSV named after HOSTNAME, one after ENVIRONMENT, and a common file, with the hierarchy specified in that respective order too. Then Puppet will first look for the queried key in the CSV for HOSTNAME; if not found, it looks it up in the ENVIRONMENT-named file, and after not finding it there it goes looking into the common file. If it doesn't find the key in any of those files, it returns the default value, if one is specified in the 'extlookup(key, default_value)' call. If there is no default value either, Puppet will raise an exception for having no value to return.

[ Gist-Set Externalize into Hierarchical CSV Files: https://gist.github.com/3684010 ]

It's the same example as for params, with a flavor of ExtData. Here you'll notice a 'common.csv' external data file providing a default set of values. Then there is also an 'env_testnode.csv' file overriding the only value that needed to change. Now, as in the 'site.pp' file the precedence of 'env_%{environment}' is higher than 'common', 'httpd::git' would look up all values first from 'env_testnode.csv' and, if not found there, would go to 'common.csv'. Hence it would end up overriding the 'httpd_git_url' value from 'env_testnode.csv'.
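To picture it without opening the Gist, a hedged sketch of the two data files; the directory and values are illustrative, only the 'httpd_git_url' key is from the example above:

mkdir -p /etc/puppet/extdata
cat > /etc/puppet/extdata/common.csv <<'EOF'
httpd_git_url,git://git.example.com/repo.git
EOF
cat > /etc/puppet/extdata/env_testnode.csv <<'EOF'
httpd_git_url,git://git-test.example.com/repo.git
EOF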

The extlookup() method used here is available as a Puppet Parser Function; you can read more in Part#5 Custom Puppet Plug-Ins on how to create your own functions.
_


hiera:


Hiera is a pluggable hierarchical data store for Puppet. It was started to provide better external data storage support than the Ext-lookup feature, with data formats other than CSV too.

This brings in the essence of ENC for data retrieval without having to write one.

Data look-up happens in a hierarchy provided by the configuration, with its own scope-resolution mechanism.

It enables Puppet to fetch data from varied external data sources using its different backends (like local files, redis, the http protocol), which can be added to if needed.
The 'http' backend in turn enables data-store support from any service (couchdb, riak, a web-app or so) that can provide data.

File "hiera.yaml" from Gist below is an example of hiera configuration to be placed in puppet's configuration directory. The highlights of this configuration are ":backends:", backend source and ":hierarchy:". Multiple backend can be used at same time, their order of listing mark their order of look-up. Hierarchy configures the order for data look-up by scope.

Then, depending on which backends you have added, you need to add their source/config to look up data at.
Here we can see configuration for using local "yaml" and "json" files, for looking up data from a Redis server (the example will set up datasets for redis usage) with authentication in place, and for looking up data from any "http" service with the hierarchy as the ":paths:" value.
You can even use GPG-protected data as a backend, but that is a bit messy to use.

Placing ".yaml" and ".json" from Gist at intended provider location.
The running "use_hiera.sh" would make you show the magic from this example on hiera.

[Gist-Set Using Hiera with Masterless Puppet set-up: https://gist.github.com/abhishekkr/6133012 ]
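
As a minimal, hedged illustration outside the Gist: a bare-bones hiera.yaml with only the yaml backend, plus a command-line lookup to verify the hierarchy; paths and values here are illustrative.

cat > /etc/puppet/hiera.yaml <<'EOF'
---
:backends:
  - yaml
:yaml:
  :datadir: /etc/puppet/hieradata
:hierarchy:
  - "env_%{environment}"
  - common
EOF

mkdir -p /etc/puppet/hieradata
echo 'httpd_git_url: git://git-test.example.com/repo.git' > /etc/puppet/hieradata/env_testnode.yaml

# query it the way puppet would, overriding the scope variable by hand
hiera httpd_git_url environment=testnode -c /etc/puppet/hiera.yaml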
_

plugin-facts:


Every system has its own set of information, facts (http://projects.puppetlabs.com/projects/facter), by default made available to puppet via facter. Puppet also enables DevOps people to set custom facts to be used in modules.
The power of these computed facts is that they can use full ruby power to fetch local/remote, plain/encrypted data over REST/Database/API/any available channel.
They require the power of Puppet Custom Plug-Ins (http://docs.puppetlabs.com/guides/custom_facts.html). The ruby file doing this goes at 'MODULE/lib/facter' and gets loaded with 'pluginsync=true' in action.
The way to set a fact in such ruby code is just...

my_value = 'all ruby code to compute it'   # whatever logic computes the value
Facter.add(mykey) do                       # mykey is the fact name modules will query
  setcode do
    my_value
  end
end

...all the rest of the code there needs to compute the value to be set, or even the key-set.

[Gist-Set Externalize Data receiving as Facter: https://gist.github.com/3684968 ]

The same 'httpd::git' example revamped to use a Custom Fact:
There is also another way to provide a fact to the Puppet Catalog: provide an environment variable named with the capitalized fact name prefixed by 'FACTER_', set to the value it's supposed to have.
For e.g. # FACTER_MYKEY=$my_value puppet apply --modulepath=$MODULEPATH -e "include httpd::git"
_

puppetdb:


It's a beautiful addition to the Puppet component set. Something that had been missing for long, and possibly the thing because of which I delayed this post by half a year.
It enables the 'storeconfigs' power without the Master, provides the support of a trusted DB for infrastructure-related data needs, and is thus the best suited of all.

To set up 'puppetdb' on a node, follow PuppetLabs' nice documentation.
To set up a decent example for masterless puppet mode, follow the given steps:

Place the 2 '.conf' files and 1 '.yaml' file in Puppet's configuration directory.
The shell script will prepare the node with the PuppetDB service for the masterless puppet usage scenario.

Setting 'storeconfigs' to 'puppetdb' in the Puppet config enables saving of exported resources to it. The 'reports' config there will push the puppet apply reports to the database.
The PuppetDB config makes Puppet aware of the host and port to connect to the database at.
The facts setting in routes.yaml enables PuppetDB to be used in masterless mode.


[Gist-Set Using PuppetDB with Masterless Puppet set-up: https://gist.github.com/abhishekkr/6114760 ]
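
For orientation, a hedged sketch of the three configuration files described above, written here as shell here-docs; the host, port and paths are illustrative and should be checked against the PuppetDB docs for your version.

cat > /etc/puppet/puppet.conf <<'EOF'
[main]
storeconfigs = true
storeconfigs_backend = puppetdb
reports = store,puppetdb
EOF

cat > /etc/puppet/puppetdb.conf <<'EOF'
[main]
server = localhost
port = 8081
EOF

cat > /etc/puppet/routes.yaml <<'EOF'
apply:
  facts:
    terminus: facter
    cache: puppetdb_apply
EOF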

Now running anything, say like...
puppet apply -e 'package{"vim": }'
...and beautifully, 'exported resources' will work like a charm using PuppetDB.
The accompanying puppet.conf will make reports get dumped to PuppetDB as well.

_


There's a fine article on the same by PuppetLabs...

Friday, May 31, 2013

Testing Chaos with Automated Configuration Management solutions


No noise making.

But let's be real: think of the count of community-contributed (or mysterious closed-and-sold 3rd-party) services, frameworks, libraries and modules put to use for managing your ultra-cool self-healing self-reliant scalable Infrastructure requirements. Now with so many cogs collaborating in the infra-machine, a check on their collaboration seems rather mandatory, like any other integration test for your in-house managed service.
After all, that was the key idea behind having automated configuration management itself.

Now, utilities like Puppet/Chef have been out there, accepted and used by dev & ops folks, for quite some time.
But the issue with the initially seen amateur testing styles is that they evolved from the non-matching frame of 'Product'-oriented unit/integration/performance testing. 'Product'-oriented testing focuses more on what happens inside the coded logic and less on how the user gets affected by the product.
Most of the initial tools released for testing logic developed in Chef/Puppet were RSpec/Cucumber-inspired product-testing pieces. Now for the major part of installing a package, restarting a service or pushing artifacts, these tests are almost not required, as the main functionality of, say, installing package_abc is already tested inside the framework being used.
So coding to "ask" to install package_abc and then testing whether it has been asked seems futile.

That's the shift. The logic developed for Infrastructure acts as a glue between all the other applications, created in-house and 3rd-party. Here, in Infrastructure feature development, there is more to test about the effect it has on its users (software/hardware) and less about internal changes (dependencies and dynamic content). Now the stuff in parentheses here means a lot more than it seems... let's get into the details of it.

Real usability of Testing is based on keeping the sanctity of WHAT needs to be tested WHERE.


Software/Hardware services that collaborate with the help of Automated Infrastructure logic need the major focus of testing. These services can vary from the
  • in-house 'Product', that is the central component you are developing,
  • 3rd-party services it collaborates with,
  • external services it utilizes for what it doesn't host itself,
  • the operating system that it supports, and Ops-knows what not.

Internal changes mainly revolve around
  • Resources/Dependencies getting called in the right order and grouped for a specific state.
  • Correct generation/purging of dynamic content; that content can itself range over
    • non-corrupt configuration files generated from a template
    • the format of configuration data sent from one Infra-component to another for reflected changes
    • dynamically creating/destroying service instances in the case of auto-scalable infrastructure


One can decide HOW on the basis of ease and efficiency.


Unit Tests work for the major portion of the 'Internal Changes' mentioned before; libraries like chefspec, rspec-chef and rspec-puppet are good enough for this. They can very well test dependency order and grouping management, as well as the effect of different data on non-corrupt configuration generation from templates.


Integration Tests in this perspective are of a bit more interesting and evolutionary nature. Here we have to ensure that the "glue" functionality we talked about for Software/Hardware services is working properly. These will confirm that every type of required machine role/state can be achieved flawlessly; call them 'State Generation Tests'. They also need to confirm the 'Reflected Changes Test' across Infra-components, as mentioned in internal changes.
Now utilities like test-kitchen, in collaboration with vagrant, docker, etc., help place them in your Continuous Integration pipeline. This will even help in testing the same service across multiple linux distros, if that's the plan to support.
The 'ServerSpec' library is also a nifty little piece for writing quick final-state check scripts.
Then the final set of Integration Testing is implemented in the form of Monitoring on all your managed/affecting Infrastructure components. This is the final and ever-running Integration Test.
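
As a flavor of what a 'final state check' can be before reaching for ServerSpec, a crude, hedged sketch in plain shell; the package, service and port are placeholders:

set -e
rpm -q httpd                                   # package actually installed?
systemctl is-active httpd                      # service actually running?
curl -fsS http://localhost:80/ > /dev/null     # and actually answering requests?
echo "state looks good"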


Performance Tests, yes, even they are required here. Tools like ChaosMonkey enable you to make your Infra self-healing and auto-scalable. There should also be load tests observing dynamic container counts and behavior, if auto-scalability is a desired functionality too.