Concept Computing

What is a thesaurus?

A compendium of synonyms and related terms.

A thesaurus lists words in groups of synonyms and related concepts.

A thesaurus organizes terms based on concepts and relationships between them. Relationships commonly expressed in a thesaurus include hierarchy, equivalence, and associative (or related). These relationships are generally represented by the notation BT (broader term), NT (narrower term), SY (synonym), and RT (associative or related). Associative relationships may be more granular in some schemes.

For example, the Unified Medical Language System (UMLS) from the National Library of Medicine has defined over 40 relationships across more than 80 vocabularies, many of which are associative in nature. Preferred terms for indexing and retrieval are identified. Entry terms (or non-preferred terms) point to the preferred terms that are to be used for each concept.
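
To make the structure concrete, here is a minimal Python sketch of how thesaurus relationships (BT, NT, RT) and preferred versus entry terms might be represented. The concepts and terms in it are illustrative placeholders, not actual UMLS content.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """One thesaurus concept: a preferred term plus its relationships."""
    preferred_term: str
    entry_terms: list[str] = field(default_factory=list)  # non-preferred synonyms (SY)
    broader: list[str] = field(default_factory=list)       # BT
    narrower: list[str] = field(default_factory=list)      # NT
    related: list[str] = field(default_factory=list)       # RT

# Illustrative entries only; not actual UMLS content.
THESAURUS = {
    "myocardial infarction": Concept(
        preferred_term="myocardial infarction",
        entry_terms=["heart attack", "mi"],
        broader=["heart disease"],
        related=["coronary thrombosis"],
    ),
    "heart disease": Concept(
        preferred_term="heart disease",
        narrower=["myocardial infarction"],
    ),
}

def resolve(term: str) -> Concept | None:
    """Map an entry (non-preferred) term or a preferred term to its concept."""
    term = term.lower()
    for concept in THESAURUS.values():
        if term == concept.preferred_term or term in concept.entry_terms:
            return concept
    return None

print(resolve("heart attack").preferred_term)  # -> myocardial infarction
```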

What is a taxonomy?

A hierarchical or associative ordering of terms.

Examples of types of taxonomy

A taxonomy is a hierarchical or associative ordering of terms representing categories. A taxonomy takes the form of a tree or a graph in the mathematical sense. A taxonomy typically has minimal (leaf) nodes, representing the lowest or most specific categories, which contain no sub-categories, as well as a top-most or maximal node (or lattice), representing the most general category.
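
As a simple illustration, a taxonomy can be sketched as a parent-to-children mapping with one maximal (root) node and several minimal (leaf) nodes. The category names below are hypothetical.

```python
# A minimal taxonomy as a parent -> children mapping (a tree in the mathematical
# sense). The category names are hypothetical.
TAXONOMY = {
    "product": ["electronics", "clothing"],             # maximal (root) node
    "electronics": ["phone", "laptop"],
    "clothing": ["shirt", "shoe"],
    "phone": [], "laptop": [], "shirt": [], "shoe": [],  # minimal (leaf) nodes
}

def broader_categories(node: str, tree: dict[str, list[str]]) -> list[str]:
    """Walk upward from a category to the root, returning its broader categories."""
    parents = {child: parent for parent, children in tree.items() for child in children}
    path = []
    while node in parents:
        node = parents[node]
        path.append(node)
    return path

print(broader_categories("phone", TAXONOMY))  # -> ['electronics', 'product']
```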

What are folk taxonomies?

A category hierarchy with 5-6 levels that has its most cognitively basic categories in the middle.

Source: George Lakoff

In folk taxonomies, categories are not merely organized in a hierarchy from the most general to the most specific, but are also organized so that the categories that are most cognitively basic are “in the middle” of a general-to-specific hierarchy. Generalization proceeds upward from the basic level and specialization proceeds down.

A basic level category sits somewhere in the middle of a hierarchy and is cognitively basic. It is the level that is learned earliest, usually has a short name, and is used frequently. It is the highest level at which a single mental image can reflect the category. Also, there is no definitive basic level for a hierarchy – it depends on the audience. Most of our knowledge is organized around basic level categories.

What is the Watson Ecosystem?

IBM launches cognitive computing cloud platform.

Cognitive computing is going mainstream

IBM is taking Watson and cognitive computing into the mainstream

The Watson Ecosystem empowers development of “Powered by IBM Watson” applications. Partners are building a community of organizations who share a vision for shaping the future of their industry through the power of cognitive computing. IBM’s cognitive computing cloud platform will help drive innovation and creative solutions to some of life’s most challenging problems. The ecosystem combines business partners’ experience, offerings, domain knowledge and presence with IBM’s technology, tools, brand, and marketing.

IBM offers a single source for developers to conceive and produce their Powered by Watson applications:

  • Watson Developer Cloud — will offer the technology, tools and APIs to ISVs for self-service training, development, and testing of their cognitive application. The Developer Cloud is expected to help jump-start and accelerate creation of Powered by IBM Watson applications.
  • Content Store — will bring together unique and varying sources of data, including general knowledge, industry specific content, and subject matter expertise to inform, educate, and help create an actionable experience for the user. The store is intended to be a clearinghouse of information presenting a unique opportunity for content providers to engage a new channel and bring their data to life in a whole new way.
  • Network — Staffing and talent organizations with access to in-demand skills like linguistics, natural language processing, machine learning, user experience design, and analytics will help bridge any skill gaps to facilitate the delivery of cognitive applications. These talent hubs and their respective agents are expected to work directly with members of the Ecosystem on a fee or project basis.

How does cognitive computing differ from earlier artificial intelligence (AI)?

Cognitive computing systems learn and interact naturally with people to extend what either humans or machines could do on their own. In traditional AI, humans are not part of the equation. In cognitive computing, humans and machines work together. Rather than being programmed to anticipate every possible answer or action needed to perform a function or set of tasks, cognitive computing systems are trained using artificial intelligence (AI) and machine learning algorithms to sense, predict, infer and, in some ways, think.

Cognitive computing systems get better over time as they build knowledge and learn a domain – its language and terminology, its processes and its preferred methods of interacting. Unlike expert systems of the past, which required rules to be hard-coded into a system by a human expert, cognitive computers can process natural language and unstructured data and learn by experience, much in the same way humans do. While they will have deep domain expertise, instead of replacing human experts, cognitive computers will act as decision support systems and help people make better decisions based on the best available data, whether in healthcare, finance or customer service.

Smart solutions demand strong design thinking

IBM unveils new Design Studio to transform the way we interact with software and emerging technologies

The era of smart systems and cognitive computing is upon us. IBM’s product design studio in Austin, Texas will focus on how a new era of software will be designed, developed and consumed by organizations around the globe.

In addition to actively educating existing IBM team leads from engineering, design, and product management on new approaches to design, IBM is recruiting design experts and is engaging with leading design schools across the country to bring designers on board, including the d.school (Institute of Design at Stanford University), Rhode Island School of Design, Carnegie Mellon University, North Carolina State University, and Savannah College of Art & Design. Leading skill sets at the IBM Design Studio include visual designers, graphic artists, user experience designers, design developers (including mobile developers), and industrial designers.

It’s not magic

Don Estes

Don Estes is an IT management and technical consultant with special expertise in large scale legacy modernization projects.

An automated modernization project, also referred to as a “conversion”, “migration”, “legacy revitalization” or “legacy renewal” project, is inherently different from most projects in which IT professionals will participate during their careers, and in several different ways. When this specialized type of project goes awry, it is almost always from a failure to appreciate these differences and to allow for them in the project plan.

Properly controlled, an automated modernization project should be the least risky of any major project, but a failure to implement the proper controls can make it a very risky project indeed.  Automated modernization projects obtain their substantial cost savings and short delivery schedules by extracting highly leveraged results from the automation.  However, it is easy to forget that a lever has two arms, and – improperly implemented – you can find leverage working against you rather than for you in your project.

When there is residual value in a legacy application, an automated modernization project can extract and use that value in a highly cost-effective manner. Of course, in some cases this is futile, but in many if not most projects it has significant technical and financial merit. There are three important technical strategies:

  1. When the business rules expressed in a legacy system still fit the business process, but there is a problem with the software infrastructure (e.g., database, “green screen” interface, language, hardware platform, etc.), there is usually a fast, cheap and low-risk way to deal with it: applying technology to renovate the code base so that it supports the target configuration.
  2. When legacy systems partially fit the current business process but need significant functional expansion or modification, a re-engineering approach may make more sense. This way the original system is reproduced identically in totally new technology, then re-factored according to agile principles to meet the new requirements. Though counterintuitive to some, this approach is faster, cheaper and lower risk than taking a blank sheet of paper and starting over – because at every point in the project you have a fully functional system.
  3. When maintenance costs are high in a legacy application, it is possible to logically restructure the application to reduce the effort of maintenance programming. This is usually very cost-effective. Depending on how bad the code is, maintenance cost reductions of as much as 40% are possible; this approach has the best results for the worst systems.

Anyone considering a modernization in isolation, and particularly anyone considering a modernization versus a replacement, should carefully weigh the risks. In the projects we have seen, the success rate is very high even for large projects, far higher than with the replacement approach. It is our firm conviction that if the issues discussed in this essay are adequately taken into account in modernization projects, the success rate will be 100%.

For more information, see Don’s essay on automated modernization: It’s Not Magic

Governance, Risk and Compliance

Playing Jazz in the GRC Club

John Coyne is a preeminent innovator in technology for financial services. He holds patents in transactional AI, object-oriented, and semantic-based systems. As a global lead for Governance, Risk and Compliance (GRC), John architects innovative transformations of financial services businesses.

Some problems have such importance to business, and are so complex and burdensome, that if you can solve them, even in part, huge benefits can result. This is the case with regulation. Compliance consumes multiple trillions of dollars in cost and labor. Regulation is growing faster than the economy. For large companies, this nets out to hundreds of millions of dollars of non-value-added expense yearly. What if it were possible to reduce the burden and cost of regulation by 50-90 percent?

Playing Jazz in the GRC Club

In this book John Coyne and Thei Geurts describe the underlying principles, actionable framework, and solution patterns for shrinking compliance costs and burden. They outline how Semantic GRC approaches have the potential to turn governance, risk and compliance from a costly cul-de-sac into a proactive and profit-enhancing business outcome.

Why is visualization important?

Patterns provide a 60% faster way to locate, navigate, and grasp meanings.

Examples of information visualization. Source: VisualComplexity

Information visualization technologies can enable most users to locate specific information they are looking for as much as 60 percent faster than with standard navigation methods.

Visualization techniques exploit multiple dimensions, e.g.:

  • 1D — Links, keywords lists, audio.
  • 2D — Taxonomies, facets, thesauri, trees, tables, charts, maps, diagrams, graphs, schematics, typography, images
  • 2.5D — Layers, overlays, builds, multi-spaces, 2D animation, 2D navigation in time
  • 3D/4D — 3-dimensional models, characters, scenes, 3D animation, virtual worlds, synthetic worlds, and reality browsing.

What is visual language?

Words, images and shapes, tightly integrated into communication units.

Source: Robert Horn

Visual language is the tight integration of words, images, and shapes to produce a unified communication. It is a tool for creative problem solving, problem analysis, and a way of conveying ideas and communicating about the complexities of our technology and social institutions.

Visual language can be displayed on different media and in communication units of different sizes. Visual language is being created by the merger of vocabularies from many different fields, as shown in the diagram above, from Robert Horn.

As the world increases in complexity, as the speed at which we need to solve business and social problems increases, as it becomes increasingly critical to have the “big picture” as well as multiple levels of detail immediately accessible, visual language will become more and more prevalent in our lives.

What’s coming next are semantic, knowledge-enabled tools for visual language. Computers will cease being mere electronic pencils, and be used to author, manage, and generate shared executable knowledge by means of patterns expressed through visual language.

Semantic Verses

Semantic capabilities for desktop and mobile apps.

I had an interesting conversation with Dr. Walid Saba about the semantic search, enrichment, summarization, and recommendation capabilities that he and his colleagues have been developing at Magnet. As he describes it, the basic issues they are addressing are these:

Why is the retrieval of semantically/topically relevant information difficult?

Two reasons:

  • Bad (semantic) precision — A document might mention a phrase several times, although the document is not at all about that topic.
  • Bad (semantic) recall — A document might never mention a phrase explicitly, but it is essentially about a semantically/topically related topic.

The essential problem is determining what a certain document is (semantically, or topically) about, regardless of the actual words being used. To do this, we must go from words to meanings.

First, we must perform word-sense disambiguation with very high accuracy. This involves recognizing named entities.

Second, using some concept algebra, we must build topics (compound meanings) from simpler ones, for example: “Android phones”, “Apple devices”, “Amazon.com’s web site”, “text message”.

Third, we must go from topics to key topics. We understand what a document is essentially about when we can determine the set of key topics. Semantic Verses does this by identifying a potentially infinite set of topics, as opposed to a pre-engineered ontology of a finite set of topics. This enables semantically comparing topics written using completely different sets of words across languages and across media.
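
To illustrate the general idea (this is a toy sketch, not the actual Semantic Verses implementation), the Python example below normalizes surface phrases to topics and treats a document's key topics, rather than its exact words, as the signal of what it is about. The phrase table and documents are invented for the example.

```python
from collections import Counter

# A toy normalization table standing in for word-sense disambiguation and concept
# algebra; a real system would use context, named-entity recognition, and a far
# richer (potentially unbounded) space of topics. All entries are invented.
PHRASE_TO_TOPIC = {
    "android phone": "android_phone",
    "android handset": "android_phone",
    "apple device": "apple_device",
    "iphone": "apple_device",
    "text message": "text_message",
    "sms": "text_message",
}

def topics(text: str) -> Counter:
    """Map surface phrases in the text to normalized topics (compound meanings)."""
    text = text.lower()
    found = Counter()
    for phrase, topic in PHRASE_TO_TOPIC.items():
        hits = text.count(phrase)
        if hits:
            found[topic] += hits
    return found

def key_topics(text: str, top_n: int = 2) -> list[str]:
    """Keep only the most salient topics as a crude signal of what the text is about."""
    return [topic for topic, _ in topics(text).most_common(top_n)]

doc_a = "Sending an SMS from an Android handset."
doc_b = "How to send a text message on an Android phone."
# Different words, same key topics, so the two documents are topically similar.
print(key_topics(doc_a))  # -> ['android_phone', 'text_message']
print(key_topics(doc_b))  # -> ['android_phone', 'text_message']
```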

As highlighted in the figure above, Dr. Saba and his team are developing a number of tools for individuals and businesses to tap these semantic capabilities. These include plug-ins for browsers; plug-ins for MS Word and PowerPoint; and tools for blogs, as well as an API.

The underlying semantic (concept computing) engine runs quickly on a single node: for example, 10 queries per second against a database of 10 million documents, and processing of 50 documents per second. No training is required to get started. Its dynamic index has a small footprint; adding to it happens in real time and does not require re-clustering or re-indexing.

Dr. Walid Saba

Seven examples of the business value of ontologies

Each year at the National Institute of Standards and Technology, the Ontolog Forum brings together a community of researchers, educators, and practitioners to discuss the role of ontologies in next generation solutions. This presentation highlights seven case examples showing how ontologies deliver business value.

The goal of language understanding

John F. Sowa

John Sowa is an American computer scientist, an expert in artificial intelligence and computer design, and the inventor of conceptual graphs. Over the past several years he has been developing a series of slides that surveys key problems and challenges relating to the current state of language understanding by computer. You can download The Goal of Language Understanding (November 15, 2013 version) here. The following topics are from the summary.

1. Problems and Challenges

Early hopes for artificial intelligence have not been realized. The task of understanding language as well as people do has proved to be far more difficult than anyone had thought. Research in all areas of cognitive science has uncovered more complexities in language than current theories can explain. A three-year-old child is better able to understand and generate language than any current computer system.

Questions:

  • Have we been using the right theories, tools, and techniques?
  • Why haven’t these tools worked as well as we had hoped?
  • What other methods might be more promising?
  • What can research in neuroscience and psycholinguistics tell us?
  • Can it suggest better ways of designing intelligent systems?

2. Psycholinguistics and Neuroscience

Brain areas involved in language processing

Language is a late development in evolutionary time. Systems of perception and action were highly developed long before some early hominin began to talk. People and higher mammals use the mechanisms of perception and action as the basis for mental models and reasoning. Language understanding and generation use those mechanisms.

Logic and mathematics are based on abstractions from language that use the same systems of perception and action. Language can express logic, but it does not depend on logic. Language is situated, embodied, distributed, and dynamic.

3. Semantics of Natural Languages

Human language is based on the way people think about everything they see, hear, feel, and do. And thinking is intimately integrated with perception and action. The semantics and pragmatics of a language are:

  • Situated in time and space,
  • Distributed in the brains of every speaker of the language,
  • Dynamically generated and interpreted in terms of a constantly developing and changing context,
  • Embodied and supported by the sensory and motor organs.

These points summarize current views by psycholinguists. Philosophers and logicians have debated other issues: e.g., NL as a formal logic; a sharp dichotomy between NL and logic; a continuum between NL and logic.

4. Ludwig Wittgenstein

Wittgenstein is considered one of the greatest philosophers of the 20th century. He wrote his first book under the influence of Frege and Russell. That book had an enormous influence on analytic philosophy, formal ontology, and the formal semantics of natural languages.

But Wittgenstein retired from philosophy to teach elementary school in an Austrian mountain village. In 1929, Russell and others persuaded him to return to Cambridge University, where he taught philosophy. During the 1930s, he began to rethink and criticize the foundations of his earlier book, including many ideas he had adopted from Frege and Russell.

5. Dynamics of Language and Reasoning

Natural languages adapt to the ever-changing phenomena of the world, the progress in science, and the social interactions of life. No computer system is as flexible as a human being in learning and responding to the dynamic aspects of language.

Three strategies for natural language processing (NLP):

  1. Neat: Define formal grammars with model-theoretic semantics that treat NL as a version of logic. Wittgenstein pioneered this strategy in his first book and became the sharpest critic of its limitations.
  2. Scruffy: Use heuristics to implement practical applications. Schank was the strongest proponent of this approach in the 1970s and ’80s.
  3. Mixed: Develop a framework that can use a mixture of neat and scruffy methods for specific applications.

NLP requires a dynamic foundation that can efficiently relate and integrate a wide range of neat, scruffy, and mixed methods.

6. Analogy and Case-Based Reasoning

Induction, Abduction, Deduction, and Action

Analogy is based on the same kind of pattern matching as perception:

  • Associative retrieval by matching patterns.
  • Approximate pattern matching for analogies and metaphors.
  • Precise pattern matching for logic and mathematics.

Analogies can support informal, case-based reasoning:

  • Long-term memory can store large numbers of previous experiences.
  • Any new case can be matched to similar cases in long-term memory.
  • Close matches are ranked by a measure of semantic distance.
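
As a minimal sketch of this kind of case-based retrieval, the toy Python example below stores previous cases as feature sets and ranks them against a new case by a crude semantic distance (here, Jaccard distance). The cases and features are hypothetical.

```python
# Previous cases stored as feature sets; a new case is matched against them and
# ranked by a crude semantic distance (Jaccard distance over shared features).
# The cases and features are hypothetical.
CASE_LIBRARY = {
    "engine won't start, battery dead": {"engine", "no_start", "battery", "electrical"},
    "engine won't start, empty tank": {"engine", "no_start", "fuel"},
    "brakes squeal at low speed": {"brakes", "noise", "low_speed"},
}

def distance(a: set[str], b: set[str]) -> float:
    """Jaccard distance: 0.0 means identical feature sets, 1.0 means nothing shared."""
    union = a | b
    return (1.0 - len(a & b) / len(union)) if union else 0.0

def closest_cases(new_case: set[str], top_n: int = 2) -> list[tuple[str, float]]:
    """Rank stored cases by semantic distance to the new case (closest first)."""
    ranked = sorted(CASE_LIBRARY.items(), key=lambda item: distance(new_case, item[1]))
    return [(name, round(distance(new_case, feats), 2)) for name, feats in ranked[:top_n]]

new_case = {"engine", "no_start", "electrical"}
print(closest_cases(new_case))
# -> [("engine won't start, battery dead", 0.25), ("engine won't start, empty tank", 0.5)]
```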

Formal reasoning is based on a disciplined use of analogy:

  • Induction: Generalize multiple cases to create rules or axioms.
  • Deduction: Match (unify) a new case with part of some rule or axiom.
  • Abduction: Form a hypothesis based on aspects of similar cases.

7. Learning by Reading

Perfect understanding of natural language is an elusive goal:

  • Even native speakers don’t understand every text in their language.
  • Without human bodies and feelings, computer models will always be imperfect approximations to human thought.

For technical subjects, computer models can be quite good:

  • Subjects that are already formalized, such as mathematics and computer programs, are ideal for computer systems.
  • Physics is harder, because the applications require visualization.
  • Poetry and jokes are the hardest to understand.

But NLP systems can learn background knowledge by reading:

  • Start with a small, underspecified ontology of the subject.
  • Use some lexical semantics, especially for the verbs.
  • Read texts to improve the ontology and the lexical semantics.
  • The primary role for human tutors is to detect and correct errors.

The Process of Language Understanding

People relate patterns in language to patterns in mental models. Simulating exactly what people do is impossible today:

  • Nobody knows the details of how the brain works.
  • Even with a good theory of the brain, the total amount of detail would overwhelm the fastest supercomputers.
  • A faithful simulation would also require a detailed model of the body with all its mechanisms of perception, feelings, and action.

But efficient approximations to human patterns are possible:

  • Graphs can specify good approximations to continuous models.
  • They can serve as the logical notation for a dynamic model theory.
  • And they can support a high-speed associative memory.

This engineering approach is influenced by, but is not identical to, the cognitive organization and processing in the human brain.
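
As a toy illustration of graphs serving as a logical notation with an associative memory (far simpler than Sowa's conceptual graph formalism), the Python sketch below represents each graph as a set of (concept, relation, concept) triples and recalls the stored graphs that share the most triples with a query pattern. All names and relations are invented.

```python
# Each graph is a set of (concept, relation, concept) triples; "recall" returns
# the stored graphs that share the most triples with a query pattern, a very
# rough stand-in for a high-speed associative memory. All content is invented.
MEMORY = {
    "cat on mat": {("Cat", "agent_of", "Sit"), ("Sit", "location", "Mat")},
    "dog on rug": {("Dog", "agent_of", "Lie"), ("Lie", "location", "Rug")},
    "cat eats fish": {("Cat", "agent_of", "Eat"), ("Eat", "theme", "Fish")},
}

def recall(pattern: set[tuple[str, str, str]]) -> list[str]:
    """Return names of stored graphs ranked by the number of shared triples."""
    scores = {name: len(graph & pattern) for name, graph in MEMORY.items()}
    return [name for name, score in sorted(scores.items(), key=lambda kv: -kv[1]) if score > 0]

query = {("Cat", "agent_of", "Sit"), ("Sit", "location", "Mat")}
print(recall(query))  # -> ['cat on mat']
```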