Tuesday, October 23, 2012

federated backpack demystified

What is a federation?
Off the top of my head definition of a federation would be; "A federation is a collection of different and dispersed things that have agreed to (or have been forced to) work together, while also remaining independent within themselves". I also like the definition given from a google search on "federation".
fed·er·a·tion /fedəˈrāSHən/
Noun:
1. A group of states with a central government but independence in internal affairs.
2. An organization or group within which smaller divisions have some degree of internal autonomy.
Ok, so a federation is how things collaborate / work together for a greater good and how a number of things bubble up into a larger collective. Two terms that are have been used within the technology realm are federated search and federated databases. Both these fit well within building an understanding of the federated backpack.

Federated search is a favorite of mine due to a past project where we created federated legal search across eight different content sources. The idea being that when you do a legal search the results should come from different content sources, providing different views or legal positions or sets of information. You really want to get a holistic view when doing legal research and how better than looking into the consolidation of a number of different sources, each which reference a different set on content regarding the same search term. The challenge comes in because, most often, the different content sources don't organize information in the same way. They use different taxonomies, or identify different information elements as important, or the underlying data is stored in different ways. The heavy lifting of federated search is in reconciling these different data (or information) structures so that they can bubble up into a single, well structured, search result. Cool thing was we used open source for the solution. There was a bunch of additions we needed to make, the search engine we used was Lucene SOLR. All good, and the nice thing was it has great features to reconcile the different data structures.

Centralized vs Federated Databases
The federated databases are similar to federated search in that they bring together data from different sources, reconcile the differences and display query results as a single, well structured, result. The accompanying diagram compares centralized database (left-side) with a federated database (right-side). Centralization means that all the data is stored in one database within one model (or a single data structure) for all the data in the database. The federated database transparently maps all the different databases into one single database structure. The mapping will accommodate for any differences in data structure among the different databases. This mapping approach allows autonomy between the databases so they can alter their data structure to better suit their particular needs. Again, the heavy lifting comes in reconciling the data's structural differences and having the technology to provide a single, well structured query result.

Click here is you want a more detailed description of Centralized vs Federated databases.

So there you have it, two good examples of federations within the technology realm. In summary, a federation is the grouping of autonomous sets of data with common elements. When viewing into the federated data it looks to be one set of data where the processes and tools surrounding the federation reconcile the differences of the data sets.

badge is saved in OBI database.
What is the backpack? really.
Important Point 1: Currently the Mozilla digital backpack is a view into your badge data as stored in the Open Badges Infrastructure (OBI). The OBI currently runs on the Mozilla servers that are hosting http://beta.openbadges.org. So when you "store" your badges in your backpack, you are actually storing your badges in the database of the OBI and the backpack is providing the view into that data. Some of the data structures within the OBI are dedicated to the backpack, currently all the data is stored within the OBI database. When it comes to federation the storage location is an important distinction.
Important Point 2: The OBI is open source and can be hosted on pretty much any server that supports the technical requirements of the OBI. This means that many instances of the OBI can exist. Therefore, many backpacks can also exist. This is where the concept of federated backpacks begins.

As the Mozilla Open Badges Infrastructure (OBI) matures and proliferates it will mean badges will be stored in different physical servers and these servers can be distributed through-out the world. People will be storing their badges in these different servers, and be making reference (maybe) to these servers for display of their badges. The idea of the federated backpack allows a consolidated view of all the badges a person has earned, regardless of where the badges are stored.

Why is the federated backpack the holy grail of open badges?
Back in May of 2012 Chris McAvoy (Product Lead for Mozilla Open Badges Project) wrote a post that said that the federated backpack was the holy grail of the open badges infrastructure. So what is meant by the federated backpack being the holy grail of the open badges project. I'd mostly say cause it is the milestone where they could consider themselves finished (well not really). I believe the day when Mozilla could sever its responsibility from hosting an OBI instance is the beginning of when the OBI could truly be released into the public domain.This is also the day when other instances of the OBI are up and running and all these instances openly exchange information about the badges they contain. They become a federation of Open Badges Infrastructure (OBI) instances; or in other words, the federated backpack. This would also mean that Mozilla would no longer have to host an OBI instance and could focus more deeply on making the OBI code base rock solid and continue advocating for an open metadata standard for the digital badge.

Federated Databases (with a digital badges example)

The concept of federation is important when entering into discussions about distributed technology. I like the definition given from a google search on "federation".
fed·er·a·tion /fedəˈrāSHən/
Noun:
1. A group of states with a central government but independence in internal affairs.
2. An organization or group within which smaller divisions have some degree of internal autonomy.
So a federation is how things collaborate / work together for a greater good and how a number of things bubble up into a larger collective. So how does this apply to the idea of federated databases. Federated databases provide technologies to make a collection of databases look like a single database. This is a simplified description that will do for the scope of this post. A good way to describe database federation is to compare it to a centralized database.
Centralized vs Federated databases
 The above diagram puts images of the centralized and federated database next to each other. The left side of the image represents the centralized database where the right side is the federated database. Databases store data and with the centralized model all the data is stored in the single centralized database. With the federated model the data is stored in a collection of related and autonomous databases. Each database within the federated approach may not have the exact same structure. This is both a blessing and a curse. It provides the ability for each database to have a different structure to meet their specific information needs, but it also makes it difficult to merge all the databases into a single common structure. All the databases within a federation have similar elements which provide the ability to link (or map) all the data together. Therefore, providing a single view of data; a federation of data.

Note 1: the amount of "centralized" data stored to link the federated databases together can vary. In some situations there is minimal amount of centralized data storage and all the databases are linked via mix of technology, good design and well managed metadata. The range of differences in how federated databases are implemented is well described in this technical paper on "Federated Database Systems".

Example: a federation of digital badges
Currently there are a number of emerging digital badge systems. Each of these different systems is designed to serve their particular badge issuer needs. Each has both similar and different attributes for what is a digital badge. A basic comparison of their similarities and differences would assist in describing a federation of digital badges. The following five sites issue digital badges, and each of them store user data regarding their earned badges in a database hosted on their respective servers;
The following differences and similarities come from a shallow analysis of these different badge systems. They serve as an example for this discussion on federated databases.

Data Differences:
  • badge criteria - the implementation of badge criteria can vary. It varies from a simple url location (mozilla), a collection of other badges or accomplishments (khan), and a dynamic criteria based on live contribution data (stackoverflow).
  • badge evidence - can also vary in its implementation, some of the evidence will follow the format required / specified by the criteria. While other evidence formats include a variety of different media and online contributions.
With these differences each badge system cannot be implemented in exactly the same database because they each use different data types and / or data structures. To overcome these differences when building the federation the data fields need to be mapped to a common data structure so they can be viewed as a single common set of data fields. This information is transformed into a common structure for the data federation. When this mapping occurs something is usually lost due to a systems specific data structure being mapped to a common data structure.

Data Similarities: 
  • badge name - the title of the badge
  • badge description - the description of the badge
  • issuer url - the internet url of the badge issuer
  • badge image - the url of the image used
  • earner identifier - the unique identifier of the badge earner
  • earner email - the earner email address
All the badge systems have these common (or similar) attributes. They are stored in each badge issuers database under different field names, but the data structures are similar enough that they could be merged together into one database without needing to transform the data.

To create a federated database of these different badge systems all the data would be merged. The similar data fields could be easily merged for the format is the same and would require no transformation. The different data fields could also be merged with some transformations, though it is likely some detail would be lost having the differences conform to the similar structure. If all goes well from a merge and data transformation perspective you would end up with a single view into all the different badge systems.

Note 2: Federated systems can have different amounts of merge and transform. In some situation the data is copied and moved into a new database that contains the federated data. In other situations technology sits on top of all the different databases and the merge and transform occurs in real-time and no data is moved or copied.

Note 3: This is a simplified discussion of federated databases. There are many design and technical details that have been simplified for this discussion. Feel free to email me if you would like to engage in a deeper discussion about database federation.

    Wednesday, October 10, 2012

    credentials, population and education

    So after some research, some thought and a weeks reflection I believe I have come up with the data set I need to begin my discussion about where, I believe, the focus of badges should be. The data I require comes from the follow three sources;
    1. tertiary education levels - I want to know the education levels being achieved / persued throughout the world.
    2. global population - I want to determine the approximate number of people being educated beyond a grade 10 level.
    3. credentialing - how saturated with credentialing the different segments from the above two data sources have become.
    I believe that the current set of digital badges projects (open or otherwise) are focused on populations that already have adequate credentialing systems. And for digital badging to be successful they may want to focus on the areas where little or no credentialing currently exists, yet also have access to the internet. More on this to come... stay tuned.

    Monday, October 01, 2012

    Linkages to other badges

    Over the last few weeks I have been diving into other badge systems to figure out a way to consolidate badges from other existing (non-obi compliant) systems into the Mozilla Open Badges Backpack. This has really forced me to think about the metadata that makes up the Open Badge. Some of the other badge systems I have looked at include;
    What I find interesting is that it is not immediately clear how all these badge systems will map to the current OBI specification for a badge. And until we have a standard metadata specification for the badge, we don't really have an "open" badge. I have great faith that the open community, Mozilla and the open badges team will work toward an open metadata standard for all (or should that be ALL) badges. I also believe that their work at implementing webmaker badges will improve the current open badge metadata standard greatly.
    WebMaker Badges show a system design where badges are linked, clustered and the detail of a cumulative badge.
    IMHO, a number of attributes regarding a badge need to be added or brought into the discussion about badges. Fortunately, I am not the first to think about this. A page within the open badge github was put in place six months ago to foster such a discussion (it remains empty, but it is there). Three of the attributes I can see originating from the new webmaker "ecosystem" of badges are what I will name; linked, clustered and the detail of a cumulative badge. A little further below in this post is how I see these three, and a couple of my additions, need to be considered within the metadata of the badge. But first, I think it important to consider who is assigning this relationship.

    Who creates a badges relationships?
    I believe two different people (or groups) create the badge relationships; the issuer and the earner.
    • The issuer creates a badge and its metadata detailing the badges criteria, origin, contact, etc. The issuer also knows how the badge exists in relationship to other badges. Does the badge stand alone as representing a skill or knowledge all on its own? or does the badge exist with other badges to be a part of a curriculum, learning objectives, a learning journey or a cluster of skills and knowledge?
    • The earner, who has little control over a badges metadata, will also create relationships between badges. Is the earner creating their own learning journey or clustering badges to represent a unique set of skills and knowledge. I believe one of the strengths of badges is how the earner is given greater control of their learning by allowing them to create a personalized curriculum.The earner needs to be able to organize their own badges. Fortunately this can be done in the Mozilla badges backpack.
    This distinction is important when considering the required additions to the badge metadata specification. Once a badge is issued its attributes cannot be altered for (given current thinking) this would invalidate the badge. I agree with this assertion. What this means is that if an earner wants to set relationships among their earned badges this needs to be done in the earners badge backpack and not alter the metadata of the badge itself. I believe the issuer has many reasons to set the relationships within the metadata of the badge, as the issuer understands how the badges fits within learning and related learning. Look to the above webmaker badges for an example of this.

    Badge relationships:
    Currently I see four different kinds of badge relationships. All of these apply to the issuer where only a subset apply to the earner. Please do not confuse a badge with curriculum or assessment, they represent mastery of a skill or knowledge.
    Some possible additional json attributes
    1. journey (or linked) - a set of badges that represent a learning journey. They are related through subject matter or knowledge domain. Badges are issued and some build upon each other, others are gates (you need this badge or set of badges to move on to earning other badges). They may be the previous or next badge in a learning journey.
    2. clustered (or grouped) - badges can be clustered to represent a domain of knowledge or the complete mastery of a skill.
    3. master-detail - a set of badges may be sub-badges to a master badge. The idea being someone needs to collect a set of badges to be awarded the overall competency badge. (the webmaker badges seem this way). This badge relationship differs from clustered in there is no higher level badge or overall competency badge with the clustered relationship.
    4. dynamic - some badges change or are updated through time. This is very apparent with engagement type badges (see stackoverflow) where the more someone contributes the greater a badge is given value for the specific earner.
    Next steps?
    I believe the Mozilla badge needs to incorporate these (or similar) additional attributes into the badges metadata specification. Not all may be required as a part of the specification as having them baked into the badge may not be the best place for them. But I believe these relationships need to be considered in the design of the badge, its metadata, the role of the backpack and its features, and what features should be part of the displayer.

    To assist with this endeavour I will continue in mapping existing non-OBI compliant badge systems to determine the gap between emerging badge systems and what the current Mozilla Open Badge can support. I will continue my journey toward building a badge consolidation system with an alpha release at some unannounced date in the future (I've got a family to support so paid work will be my priority, duhhh).