Knowledge Science

What is pattern recognition?

Techniques that distinguish signal from noise.

Sources: Gary Larson, Barbara Catania and Anna Maddalena

Pattern recognition techniques distinguish signal from noise through statistical analysis, Bayesian analysis, classification, cluster analysis, and analysis of texture and edges. They apply to sensors, data, imagery, sound, speech, and language.

Automated classification tools distinguish, characterize and categorize data based on a set of observed features. For example, one might determine whether a particular mushroom is “poisonous” or “edible” based on its color, size, and gill size. Classifiers can be trained automatically from a set of examples through supervised learning. Classification rules discriminate between different contents of a document or partitions of a database based on various attributes within the repository.
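As a minimal sketch of training a classifier from examples, the code below labels a mushroom by majority vote among its nearest training examples. The nearest-neighbor method is only one simple form of supervised learning, chosen here for brevity. The feature values, the `TRAINING` data, and the function names are hypothetical and are not real mycological data.

```python
from collections import Counter

# Hypothetical training examples: (color, size, gill_size) -> label.
# The values are invented for illustration only.
TRAINING = [
    (("red", "small", "narrow"), "poisonous"),
    (("red", "large", "narrow"), "poisonous"),
    (("white", "small", "narrow"), "poisonous"),
    (("brown", "large", "broad"), "edible"),
    (("brown", "small", "broad"), "edible"),
    (("white", "large", "broad"), "edible"),
]


def hamming(a, b):
    """Count the features on which two examples disagree."""
    return sum(x != y for x, y in zip(a, b))


def classify(features, training=TRAINING, k=3):
    """Label a new example by majority vote among its k nearest training examples."""
    nearest = sorted(training, key=lambda example: hamming(features, example[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Calling `classify(("brown", "large", "broad"))` compares the new mushroom against every stored example and returns the label held by most of its three closest matches.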

Statistical learning techniques construct quantitative models of an entity based on surface features drawn from a large corpus of examples. In the domain of natural language, for example, statistics of language usage (e.g., word trigram frequencies) are compiled from large collections of input documents and are used to categorize or make predictions about new text.
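The trigram idea can be illustrated with a small sketch: compile word trigram counts per category, then assign new text to the category whose counts it overlaps most. The two-sentence corpus, the category names, and the overlap score are assumptions made for illustration; a practical system would use far more text and a proper probabilistic model.

```python
from collections import Counter


def trigrams(text):
    """Return the word trigrams in a text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


def train(corpus_by_category):
    """Compile trigram counts for each category from its example documents."""
    return {
        category: Counter(t for doc in docs for t in trigrams(doc))
        for category, docs in corpus_by_category.items()
    }


def categorize(text, model):
    """Pick the category whose trigram counts overlap most with the new text."""
    scores = {
        category: sum(counts[t] for t in trigrams(text))
        for category, counts in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical miniature corpus; real corpora are orders of magnitude larger.
model = train({
    "weather": ["the rain will fall on the plain today"],
    "finance": ["the stock price will fall on weak earnings"],
})
```

Because the model is nothing more than counts drawn from its training text, moving to a new domain means compiling new counts from a new corpus.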

Statistical techniques can have high precision within a domain at the cost of generality across domains. Systems trained through statistical learning do not require human-engineered domain modeling. However, they require access to large corpora of examples and a retraining step for each new domain of interest.

What is a thesaurus?

A compendium of synonyms and related terms.

A thesaurus lists words in groups of synonyms and related concepts.

A thesaurus organizes terms based on concepts and relationships between them. Relationships commonly expressed in a thesaurus include hierarchy, equivalence, and associative (or related). These relationships are generally represented by the notation BT (broader term), NT (narrower term), SY (synonym), and RT (associative or related). Associative relationships may be more granular in some schemes.

For example, the Unified Medical Language System (UMLS) from the National Library of Medicine has defined over 40 relationships across more than 80 vocabularies, many of which are associative in nature. Preferred terms for indexing and retrieval are identified. Entry terms (or non-preferred terms) point to the preferred terms that are to be used for each concept.
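A rough sketch of how such a thesaurus might be stored and used for retrieval follows. It keeps the BT, NT, SY, and RT relationships per preferred term, maps entry terms to preferred terms, and expands a query with narrower terms. The terms, the `THESAURUS` structure, and the function names are hypothetical and do not reflect the UMLS data model.

```python
# Hypothetical thesaurus fragment keyed by preferred term.
THESAURUS = {
    "dogs": {"BT": ["mammals"], "NT": ["terriers"], "SY": ["canines"], "RT": ["kennels"]},
    "mammals": {"BT": ["animals"], "NT": ["dogs"], "SY": [], "RT": []},
}


def build_entry_index(thesaurus):
    """Map every term, preferred or entry (non-preferred), to its preferred term."""
    index = {}
    for preferred, relations in thesaurus.items():
        index[preferred] = preferred
        for synonym in relations["SY"]:
            index[synonym] = preferred
    return index


def expand_query(term, thesaurus, index):
    """Resolve a term to its preferred form and add narrower terms for retrieval."""
    preferred = index.get(term, term)
    relations = thesaurus.get(preferred, {})
    return [preferred] + relations.get("NT", [])
```

With this layout, a search for the entry term "canines" is redirected to the preferred term "dogs" and can also pick up documents indexed under the narrower term "terriers".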

What is taxonomy?

A hierarchical or associative ordering of terms.

Examples of types of taxonomy

A taxonomy is a hierarchical or associative ordering of terms representing categories. A taxonomy takes the form of a tree or a graph in the mathematical sense. It typically has minimal nodes, representing the lowest or most specific categories, which contain no sub-categories, as well as a top-most or maximal node or lattice, representing the most general category.
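To make the graph view concrete, here is a small sketch that stores a taxonomy as a mapping from each category to its direct sub-categories and then finds the minimal nodes, the maximal node, and the more general categories above a given node. The categories and function names are illustrative assumptions.

```python
# Hypothetical taxonomy: each category maps to its direct sub-categories.
TAXONOMY = {
    "animal": ["mammal", "bird"],
    "mammal": ["dog", "cat"],
    "bird": ["sparrow"],
    "dog": [],
    "cat": [],
    "sparrow": [],
}


def minimal_nodes(taxonomy):
    """Most specific categories: nodes with no sub-categories."""
    return sorted(node for node, children in taxonomy.items() if not children)


def maximal_nodes(taxonomy):
    """Most general categories: nodes that are nobody's sub-category."""
    all_children = {c for children in taxonomy.values() for c in children}
    return sorted(node for node in taxonomy if node not in all_children)


def ancestors(taxonomy, node):
    """Return every category more general than the given node."""
    found = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for parent, children in taxonomy.items():
            if current in children and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found
```

The same functions work when a category has more than one parent, which is the case where the taxonomy is a graph and not a strict tree.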

What are folk taxonomies?

A category hierarchy with 5-6 levels that has its most cognitively basic categories in the middle.

Source: George Lakoff

In folk taxonomies, categories are not merely organized in a hierarchy from the most general to the most specific, but are also organized so that the categories that are most cognitively basic are “in the middle” of a general-to-specific hierarchy. Generalization proceeds upward from the basic level and specialization proceeds down.

A basic level category is somewhere in the middle of a hierarchy and is cognitively basic. It is the level that is learned earliest, usually has a short name, and is used frequently. It is the highest level at which a single mental image can reflect the category. There is no definitive basic level for a hierarchy; it depends on the audience. Most of our knowledge is organized around basic level categories.

What is the Watson Ecosystem?

IBM launches cognitive computing cloud platform.

Cognitive computing is going mainstream

IBM is taking Watson and cognitive computing into the mainstream

The Watson Ecosystem empowers development of “Powered by IBM Watson” applications. Partners are building a community of organizations who share a vision for shaping the future of their industry through the power of cognitive computing. IBM’s cognitive computing cloud platform will help drive innovation and creative solutions to some of life’s most challenging problems. The ecosystem combines business partners’ experience, offerings, domain knowledge and presence with IBM’s technology, tools, brand, and marketing.

IBM offers a single source for developers to conceive and produce their Powered by Watson applications:

  • Watson Developer Cloud — will offer the technology, tools and APIs to ISVs for self-service training, development, and testing of their cognitive application. The Developer Cloud is expected to help jump-start and accelerate creation of Powered by IBM Watson applications.
  • Content Store — will bring together unique and varying sources of data, including general knowledge, industry specific content, and subject matter expertise to inform, educate, and help create an actionable experience for the user. The store is intended to be a clearinghouse of information presenting a unique opportunity for content providers to engage a new channel and bring their data to life in a whole new way.
  • Network — Staffing and talent organizations with access to in-demand skills like linguistics, natural language processing, machine learning, user experience design, and analytics will help bridge any skill gaps to facilitate the delivery of cognitive applications. These talent hubs and their respective agents are expected to work directly with members of the Ecosystem on a fee or project basis.

How does cognitive computing differ from earlier artificial intelligence (AI)?

Cognitive computing systems learn and interact naturally with people to extend what either humans or machines could do on their own. In traditional AI, humans are not part of the equation. In cognitive computing, humans and machines work together. Rather than being programmed to anticipate every possible answer or action needed to perform a function or set of tasks, cognitive computing systems are trained using artificial intelligence (AI) and machine learning algorithms to sense, predict, infer and, in some ways, think.

Cognitive computing systems get better over time as they build knowledge and learn a domain: its language and terminology, its processes, and its preferred methods of interacting. Unlike expert systems of the past, which required rules to be hard-coded into a system by a human expert, cognitive computers can process natural language and unstructured data and learn by experience, much in the same way humans do. While they will have deep domain expertise, cognitive computers will not replace human experts. Instead, they will act as decision support systems that help experts make better decisions based on the best available data, whether in healthcare, finance, or customer service.

Smart solutions demand strong design thinking

IBM unveils new Design Studio to transform the way we interact with software and emerging technologies

The era of smart systems and cognitive computing is upon us. IBM’s product design studio in Austin, Texas will focus on how a new era of software will be designed, developed and consumed by organizations around the globe.

In addition to actively educating existing IBM team leads from engineering, design, and product management on new approaches to design, IBM is recruiting design experts and is engaging with leading design schools across the country to bring designers on board, including the Institute of Design at Stanford University, Rhode Island School of Design, Carnegie Mellon University, North Carolina State University, and Savannah College of Art & Design. Leading skill sets at the IBM Design Studio include visual designers, graphic artists, user experience designers, design developers (including mobile developers), and industrial designers.

Why is visualization important?

Patterns provide a 60% faster way to locate, navigate, and grasp meanings.

Examples of information visualization. Source: VisualComplexity

Information visualization technologies can enable most users to locate specific information they are looking for as much as 60 percent faster than with standard navigation methods.

Visualization techniques exploit multiple dimensions, e.g.:

  • 1D — Links, keyword lists, audio
  • 2D — Taxonomies, facets, thesauri, trees, tables, charts, maps, diagrams, graphs, schematics, typography, images
  • 2.5D — Layers, overlays, builds, multi-spaces, 2D animation, 2D navigation in time
  • 3D/4D — 3-dimensional models, characters, scenes, 3D animation, virtual worlds, synthetic worlds, and reality browsing.

What is visual language?

Words, images and shapes, tightly integrated into communication units.

Source: Robert Horn

Visual language is the tight integration of words, images, and shapes to produce a unified communication. It is a tool for creative problem solving, problem analysis, and a way of conveying ideas and communicating about the complexities of our technology and social institutions.

Visual language can be displayed on different media and in communication units of different sizes. Visual language is being created by the merger of vocabularies from many different fields, as shown in the diagram above, from Robert Horn.

As the world increases in complexity, as the speed at which we need to solve business and social problems increases, as it becomes increasingly critical to have the “big picture” as well as multiple levels of detail immediately accessible, visual language will become more and more prevalent in our lives.

What’s coming next are semantic, knowledge-enabled tools for visual language. Computers will cease being mere electronic pencils, and be used to author, manage, and generate shared executable knowledge by means of patterns expressed through visual language.

The goal of language understanding


John F. Sowa

John Sowa is an American computer scientist, an expert in artificial intelligence and computer design, and the inventor of conceptual graphs. Over the past several years he has been developing a series of slides that give an overview of key problems and challenges relating to the current state of language understanding by computer. You can download The Goal of Language Understanding (November 15, 2013 version) here. The following topics are from the summary.

1. Problems and Challenges

Early hopes for artificial intelligence have not been realized. The task of understanding language as well as people do has proved to be far more difficult than anyone had thought. Research in all areas of cognitive science has uncovered more complexities in language than current theories can explain. A three-year-old child is better able to understand and generate language than any current computer system.


  • Have we been using the right theories, tools, and techniques?
  • Why haven’t these tools worked as well as we had hoped?
  • What other methods might be more promising?
  • What can research in neuroscience and psycholinguistics tell us?
  • Can it suggest better ways of designing intelligent systems?

2. Psycholinguistics and Neuroscience

Brain areas involved in language processing

Language is a late development in evolutionary time. Systems of perception and action were highly developed long before some early hominin began to talk. People and higher mammals use the mechanisms of perception and action as the basis for mental models and reasoning. Language understanding and generation use those mechanisms.

Logic and mathematics are based on abstractions from language that use the same systems of perception and action. Language can express logic, but it does not depend on logic. Language is situated, embodied, distributed, and dynamic.

3. Semantics of Natural Languages

Human language is based on the way people think about everything they see, hear, feel, and do. And thinking is intimately integrated with perception and action. The semantics and pragmatics of a language are:

  • Situated in time and space,
  • Distributed in the brains of every speaker of the language,
  • Dynamically generated and interpreted in terms of a constantly developing and changing context,
  • Embodied and supported by the sensory and motor organs.

These points summarize current views by psycholinguists. Philosophers and logicians have debated other issues, e.g., natural language (NL) as a formal logic, a sharp dichotomy between NL and logic, or a continuum between NL and logic.

4. Ludwig Wittgenstein

Considered one of the greatest philosophers of the 20th century. Wrote his first book under the influence of Frege and Russell. That book had an enormous influence on analytic philosophy, formal ontology, and formal semantics of natural languages.

But Wittgenstein retired from philosophy to teach elementary school in an Austrian mountain village. In 1929, Russell and others persuaded him to return to Cambridge University, where he taught philosophy. During the 1930s, he began to rethink and criticize the foundations of his earlier book, including many ideas he had adopted from Frege and Russell.

5. Dynamics of Language and Reasoning

Natural languages adapt to the ever-changing phenomena of the world, the progress in science, and the social interactions of life. No computer system is as flexible as a human being in learning and responding to the dynamic aspects of language.

Three strategies for natural language processing (NLP):

  1. Neat: Define formal grammars with model-theoretic semantics that treat NL as a version of logic. Wittgenstein pioneered this strategy in his first book and became the sharpest critic of its limitations.
  2. Scruffy: Use heuristics to implement practical applications. Schank was the strongest proponent of this approach in the 1970s and ’80s.
  3. Mixed: Develop a framework that can use a mixture of neat and scruffy methods for specific applications.

NLP requires a dynamic foundation that can efficiently relate and integrate a wide range of neat, scruffy, and mixed methods.

6. Analogy and Case-Based Reasoning

Induction, Abduction, Deduction, and Action

Based on the same kind of pattern matching as perception:

  • Associative retrieval by matching patterns.
  • Approximate pattern matching for analogies and metaphors.
  • Precise pattern matching for logic and mathematics.

Analogies can support informal, case-based reasoning:

  • Long-term memory can store large numbers of previous experiences.
  • Any new case can be matched to similar cases in long-term memory.
  • Close matches are ranked by a measure of semantic distance.
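The retrieval step in this list can be sketched as follows. The slides do not say how semantic distance is measured, so the code uses Jaccard distance over feature sets purely as a stand-in. The stored cases, their features, and the function names are hypothetical.

```python
def jaccard_distance(a, b):
    """Stand-in for semantic distance: 1 minus the overlap of two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical long-term memory of previous cases, each described by features.
MEMORY = {
    "flat tire": {"car", "wheel", "air", "repair"},
    "dead battery": {"car", "electric", "start", "repair"},
    "leaky faucet": {"water", "pipe", "repair"},
}


def retrieve(new_case, memory, top_n=2):
    """Rank stored cases by distance to the new case and return the closest names."""
    ranked = sorted(
        memory.items(),
        key=lambda item: jaccard_distance(new_case, item[1]),
    )
    return [name for name, _ in ranked[:top_n]]
```

A new case described by `{"car", "wheel", "repair"}` shares three of four features with the "flat tire" case, so that case ranks first and would be the first analogy offered.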

Formal reasoning is based on a disciplined use of analogy:

  • Induction: Generalize multiple cases to create rules or axioms.
  • Deduction: Match (unify) a new case with part of some rule or axiom.
  • Abduction: Form a hypothesis based on aspects of similar cases.

7. Learning by Reading

Perfect understanding of natural language is an elusive goal:

  • Even native speakers don’t understand every text in their language.
  • Without human bodies and feelings, computer models will always be imperfect approximations to human thought.

For technical subjects, computer models can be quite good:

  • Subjects that are already formalized, such as mathematics and computer programs, are ideal for computer systems.
  • Physics is harder, because the applications require visualization.
  • Poetry and jokes are the hardest to understand.

But NLP systems can learn background knowledge by reading:

  • Start with a small, underspecified ontology of the subject.
  • Use some lexical semantics, especially for the verbs.
  • Read texts to improve the ontology and the lexical semantics.
  • The primary role for human tutors is to detect and correct errors.

The Process of Language Understanding

People relate patterns in language to patterns in mental models. Simulating exactly what people do is impossible today:

  • Nobody knows the details of how the brain works.
  • Even with a good theory of the brain, the total amount of detail would overwhelm the fastest supercomputers.
  • A faithful simulation would also require a detailed model of the body with all its mechanisms of perception, feelings, and action.

But efficient approximations to human patterns are possible:

  • Graphs can specify good approximations to continuous models.
  • They can serve as the logical notation for a dynamic model theory.
  • And they can support a high-speed associative memory.

This engineering approach is influenced by, but is not identical to, the cognitive organization and processing in the human brain.

Search to knowing

The spectrum of knowledge representation and reasoning

More expressive knowledge representation enables more powerful reasoning

More expressive knowledge representation powers greater reasoning capability.

Not all knowledge representation is the same. This figure shows a spectrum of executable knowledge representation and reasoning capabilities. As the rigor and expressive power of the semantics and knowledge representation increases, so does the value of the reasoning capacity it enables.

From bottom to top, the amount, kinds, complexity, and expressive power of knowledge representation increase. From left to right, reasoning capabilities advance from:

(a) Information recovery based on linguistic and statistical methods, to
(b) Discovery of unexpected relevant information and associations through mining, to
(c) Intelligence based on correlation of data sources, connecting the dots, and putting information into context, to
(d) Question answering ranging from simple factoids to complex decision support, to
(e) Smart behaviors including robust adaptive and autonomous action.

Moving from lower left to upper right, the diagram depicts a spectrum of progressively more capable forms of knowledge representation together with standards and formalisms used to express metadata, associations, models, contexts, and modes of reasoning. More expressive forms of metadata and semantic modeling encompass the simpler forms, and extend their capabilities. In the following topics, we discuss different forms of knowledge representation, then the types of reasoning capabilities they enable.

What is knowledge representation?

Knowledge representation is the application of theory, values, logic, and ontology to the task of constructing computable patterns of some domain.

The future is n-ary concept encoding.

The future of knowledge representation is n-ary concept encoding.

Knowledge is “captured and preserved” when it is transformed into a perceptible and manipulable system of representation.

Systems of knowledge representation differ in their fidelity, intuitiveness, complexity, and rigor. The computational theory of knowledge predicts that ultimate economies and efficiencies can be achieved through variable-length n-ary concept coding and pattern reasoning resulting in designs that are linear and proportional to knowledge measure.

“Semantic networks” (entity-relationship) are the most powerful and general form for knowledge representation. They model knowledge as a nodal mesh of mental concepts and physical entities (boxes, circles, etc.) tied by constraining relationships (arrows, directed lines). Relationships describe “constraints” on concepts including: (a) logical constraints — prepositions of direction or proximity, action verbs connecting subject to object, etc., and (b) reality constraints — linking concepts to their time, image, attributes, or perceptible measures.
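One common way to hold such a network in a program is as a list of (subject, relationship, object) triples that can be queried by pattern. The sketch below assumes that encoding. The triples, relationship names, and the `match` function are hypothetical and are not drawn from the source.

```python
# Hypothetical semantic network stored as (subject, relationship, object) triples.
TRIPLES = [
    ("cat", "is_a", "mammal"),          # links a concept to a broader concept
    ("cat", "sits_on", "mat"),          # action verb connecting subject to object
    ("mat", "located_in", "kitchen"),   # relationship of proximity
    ("cat", "has_color", "black"),      # links a concept to an attribute
]


def match(triples, subject=None, relation=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        (s, r, o)
        for s, r, o in triples
        if subject in (None, s) and relation in (None, r) and obj in (None, o)
    ]
```

For instance, `match(TRIPLES, subject="cat")` returns every relationship that constrains the concept "cat", while `match(TRIPLES, obj="kitchen")` finds whatever is tied to the kitchen.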

Physical knowledge is information, or the a posteriori constraints of spatial-temporal reality. It includes sense data and measurements, observed or recorded independently, and often dependent on the time, place, or conditions of observation. Information representations include numbers and units, tables of measurement, statistics, databases, language, drawings, and photographic images.

Metaphysical knowledge is rational structure, or the a priori constraint of mental concepts and perceived relationships, dictated by axiology, accepted theory, logic, and conditioned expectation. It is expressed as truth, correctness, and self-consistency, and is usually independent of time, place, or a particular reality. Representations include computer programs, rules, E-R diagrams, language, symbols, formulas, algorithms, recipes, and ontologies.

Axiology trumps logic

What is value?

The measure of the worth or desirability of something.
The foundation of meaning.

Value is the foundation of meaning. It is the measure of the worth or desirability (positive or negative) of something, and of how well something conforms to its concept or intension. Value formation and value-based reasoning are fundamental to all areas of human endeavor. Theories embody values. The axiom of value is based on “concept fulfillment.”

Most areas of human reasoning require application of multiple theories; resolution of conflicts, uncertainties, competing values, and analysis of trade-offs. For example, questions of guilt or innocence require judgment of far more than logical truth or falsity.

Axiology is the branch of philosophy that studies value and value theory. It deals with things like honesty, truthfulness, objectivity, novelty, originality, “progress,” and people's satisfaction. The word ‘axiology’ is derived from two Greek roots, ‘axios’ (worth or value) and ‘logos’ (logic or theory). It means the theory of value, and concerns the process of understanding values and valuation.