MILLS•DAVIS weblog

Semantic Verses

Semantic capabilities for desktop and mobile apps.

I had an interesting conversation with Dr. Walid Saba about the semantic search, enrichment, summarization, and recommendation capabilities that he and his colleagues have been developing at Magnet. As he describes it, the basic issues they are addressing are these:

Why is the retrieval of semantically/topically relevant information difficult?

Two reasons:

  • Bad (semantic) precision — A document might mention a phrase several times even though the document is not at all about that topic.
  • Bad (semantic) recall — A document might never mention a phrase explicitly, yet be essentially about a semantically/topically related topic.

The essential problem is determining what a certain document is (semantically, or topically) about, regardless of the actual words being used. To do this, we must go from words to meanings.

First, we must perform word-sense disambiguation with very high accuracy. This involves recognizing named entities.
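
The article does not detail how Magnet's high-accuracy disambiguation works. As a rough illustration of the task itself, the classic Lesk algorithm in NLTK picks the WordNet sense whose dictionary gloss best overlaps the surrounding context:

    # Word-sense disambiguation sketch using NLTK's Lesk algorithm.
    # Illustrative only, and far less accurate than what the article
    # describes. Requires nltk plus the "punkt" and "wordnet" data.
    from nltk.tokenize import word_tokenize
    from nltk.wsd import lesk

    sentence = "He deposited the check at the bank before crossing the river."
    tokens = word_tokenize(sentence)

    sense = lesk(tokens, "bank")           # choose a WordNet synset for "bank"
    print(sense, "-", sense.definition())  # the selected sense and its gloss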

Second, using some concept algebra, we must build topics (compound meanings) from simpler ones, for example: “Android phones”, “Apple devices”, “Amazon.com’s web site”, “text message”.
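
The concept algebra itself is not spelled out in the article, but one way to picture the composition step is typed concepts combined into compound topics. The Concept and compose names below are hypothetical, not Magnet's API:

    # Illustrative composition of compound topics from simpler concepts.
    # Concept and compose are invented names, not an actual Magnet API.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Concept:
        label: str   # surface label, e.g. "Android"
        kind: str    # coarse semantic type, e.g. "OS" or "Device"

    def compose(modifier: Concept, head: Concept) -> Concept:
        """Build a compound meaning such as 'Android phones'."""
        return Concept(f"{modifier.label} {head.label}", head.kind)

    android = Concept("Android", "OS")
    phones = Concept("phones", "Device")
    print(compose(android, phones))  # Concept(label='Android phones', kind='Device')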

Third, we must go from topics to key topics. We understand what a document is essentially about when we can determine its set of key topics. Semantic Verses does this by identifying a potentially infinite set of topics, as opposed to a pre-engineered ontology with a finite set of topics. This makes it possible to compare topics semantically even when they are expressed with completely different words, across languages and across media.
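
Semantic Verses' own similarity measure is not described here. As a crude stand-in, WordNet path similarity already shows how two topics can match on meaning even when the surface words differ:

    # Comparing topics by meaning rather than by word overlap, using WordNet
    # path similarity as a stand-in for Semantic Verses' own measure.
    # Requires nltk with the WordNet corpus downloaded.
    from nltk.corpus import wordnet as wn

    car = wn.synset("car.n.01")
    automobile = wn.synset("automobile.n.01")  # different word, same concept
    boat = wn.synset("boat.n.01")

    print(car.path_similarity(automobile))  # 1.0: identical meaning
    print(car.path_similarity(boat))        # lower: related vehicle concepts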

As highlighted in the figure above, Dr. Saba and his team are developing a number of tools for individuals and businesses to tap these semantic capabilities, including plug-ins for browsers, plug-ins for MS Word and PowerPoint, and tools for blogs, as well as an API.

The underlying semantic (concept computing) engine runs quickly on a single node, e.g., 10 queries per second against a database of 10 million documents. No training is required to get started. A single node can process 50 documents per second. Its dynamic index has a small footprint, and adding to it happens in real time without re-clustering or re-indexing.

Dr. Walid Saba

Seven examples of the business value of ontologies

Each year at the National Institute of Standards and Technology, the Ontolog Forum brings together a community of researchers, educators, and practitioners to discuss the role of ontologies in next-generation solutions. This presentation highlights seven case examples showing how ontologies deliver business value.

 

The goal of language understanding

John F. Sowa

John Sowa is an American computer scientist, an expert in artificial intelligence and computer design, and the inventor of conceptual graphs. Over the past several years he has been developing a series of slides that survey key problems and challenges relating to the current state of language understanding by computer. You can download The Goal of Language Understanding (November 15, 2013 version) here. The following topics are from the summary.

1. Problems and Challenges

Early hopes for artificial intelligence have not been realized. The task of understanding language as well as people do has proved to be far more difficult than anyone had thought. Research in all areas of cognitive science has uncovered more complexities in language than current theories can explain. A three-year-old child is better able to understand and generate language than any current computer system.

Questions:

  • Have we been using the right theories, tools, and techniques?
  • Why haven’t these tools worked as well as we had hoped?
  • What other methods might be more promising?
  • What can research in neuroscience and psycholinguistics tell us?
  • Can it suggest better ways of designing intelligent systems?

2. Psycholinguistics and Neuroscience

Brain areas involved in language processing

Language is a late development in evolutionary time. Systems of perception and action were highly developed long before some early hominin began to talk. People and higher mammals use the mechanisms of perception and action as the basis for mental models and reasoning. Language understanding and generation use those mechanisms.

Logic and mathematics are based on abstractions from language that use the same systems of perception and action. Language can express logic, but it does not depend on logic. Language is situated, embodied, distributed, and dynamic.

3. Semantics of Natural Languages

Human language is based on the way people think about everything they see, hear, feel, and do. And thinking is intimately integrated with perception and action. The semantics and pragmatics of a language are:

  • Situated in time and space,
  • Distributed in the brains of every speaker of the language,
  • Dynamically generated and interpreted in terms of a constantly developing and changing context,
  • Embodied and supported by the sensory and motor organs.

These points summarize current views by psycholinguists. Philosophers and logicians have debated other issues: e.g., NL as a formal logic; a sharp dichotomy between NL and logic; a continuum between NL and logic.

4. Ludwig Wittgenstein

Wittgenstein is considered one of the greatest philosophers of the 20th century. He wrote his first book under the influence of Frege and Russell. That book had an enormous influence on analytic philosophy, formal ontology, and the formal semantics of natural languages.

But Wittgenstein retired from philosophy to teach elementary school in an Austrian mountain village. In 1929, Russell and others persuaded him to return to Cambridge University, where he taught philosophy. During the 1930s, he began to rethink and criticize the foundations of his earlier book, including many ideas he had adopted from Frege and Russell.

5. Dynamics of Language and Reasoning

Natural languages adapt to the ever-changing phenomena of the world, the progress in science, and the social interactions of life. No computer system is as flexible as a human being in learning and responding to the dynamic aspects of language.

Three strategies for natural language processing (NLP):

  1. Neat: Define formal grammars with model-theoretic semantics that treat NL as a version of logic. Wittgenstein pioneered this strategy in his first book and became the sharpest critic of its limitations.
  2. Scruffy: Use heuristics to implement practical applications. Schank was the strongest proponent of this approach in the 1970s and ’80s.
  3. Mixed: Develop a framework that can use a mixture of neat and scruffy methods for specific applications.

NLP requires a dynamic foundation that can efficiently relate and integrate a wide range of neat, scruffy, and mixed methods.

6. Analogy and Case-Based Reasoning

Induction, Abduction, Deduction, and Action

Analogy is based on the same kind of pattern matching as perception:

  • Associative retrieval by matching patterns.
  • Approximate pattern matching for analogies and metaphors.
  • Precise pattern matching for logic and mathematics.

Analogies can support informal, case-based reasoning, as the sketch after this list suggests:

  • Long-term memory can store large numbers of previous experiences.
  • Any new case can be matched to similar cases in long-term memory.
  • Close matches are ranked by a measure of semantic distance.
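
Here is a minimal sketch of that retrieve-and-rank loop. All data and names below are invented, and cosine distance over feature vectors stands in for whatever semantic-distance measure a real system would use:

    # Minimal case-based retrieval: rank stored cases by a distance measure
    # standing in for "semantic distance". Data and names are invented.
    import math

    def cosine_distance(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return 1.0 - dot / (norm_a * norm_b)

    # Long-term memory: previously solved cases as (features, solution).
    memory = [
        ([1.0, 0.0, 0.5], "case A"),
        ([0.9, 0.1, 0.4], "case B"),
        ([0.0, 1.0, 0.0], "case C"),
    ]

    new_case = [1.0, 0.05, 0.45]
    ranked = sorted(memory, key=lambda m: cosine_distance(new_case, m[0]))
    print([solution for _, solution in ranked])  # closest cases first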

Formal reasoning is based on a disciplined use of analogy (see the unification sketch after this list):

  • Induction: Generalize multiple cases to create rules or axioms.
  • Deduction: Match (unify) a new case with part of some rule or axiom.
  • Abduction: Form a hypothesis based on aspects of similar cases.
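
To make the deduction step, matching (unifying) a new case with part of a rule, concrete: here is a toy unifier over tuple patterns with variables, a standard textbook device rather than Sowa's notation:

    # Toy unification: match a new case against a rule pattern, binding
    # variables (strings starting with "?"). A textbook device, not Sowa's
    # own notation.
    def unify(pattern, fact, bindings=None):
        bindings = dict(bindings or {})
        if isinstance(pattern, str) and pattern.startswith("?"):
            if pattern in bindings:
                return bindings if bindings[pattern] == fact else None
            bindings[pattern] = fact
            return bindings
        if (isinstance(pattern, tuple) and isinstance(fact, tuple)
                and len(pattern) == len(fact)):
            for p, f in zip(pattern, fact):
                bindings = unify(p, f, bindings)
                if bindings is None:
                    return None
            return bindings
        return bindings if pattern == fact else None

    # Rule pattern (isa ?x mammal) unified with new case (isa whale mammal):
    print(unify(("isa", "?x", "mammal"), ("isa", "whale", "mammal")))
    # {'?x': 'whale'}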

7. Learning by Reading

Perfect understanding of natural language is an elusive goal:

  • Even native speakers don’t understand every text in their language.
  • Without human bodies and feelings, computer models will always be imperfect approximations to human thought.

For technical subjects, computer models can be quite good:

  • Subjects that are already formalized, such as mathematics and computer programs, are ideal for computer systems.
  • Physics is harder, because the applications require visualization.
  • Poetry and jokes are the hardest to understand.

But NLP systems can learn background knowledge by reading (see the sketch after this list):

  • Start with a small, underspecified ontology of the subject.
  • Use some lexical semantics, especially for the verbs.
  • Read texts to improve the ontology and the lexical semantics.
  • The primary role for human tutors is to detect and correct errors.
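
A cartoon of that reading loop, with every name invented for illustration: start from a tiny, underspecified ontology, extract naive subject-verb-object triples, and let a stand-in tutor reject bad additions:

    # Cartoon of "learning by reading": grow a tiny ontology from naive
    # subject-verb-object extraction. Every name is invented for illustration.
    ontology = {("planet", "orbits", "star")}  # small, underspecified start

    def extract_triple(sentence):
        """Toy extraction: assumes a bare 'subject verb object.' sentence."""
        words = sentence.lower().rstrip(".").split()
        return tuple(words) if len(words) == 3 else None

    def tutor_approves(triple):
        """Stand-in for the human tutor who detects and corrects errors."""
        return triple is not None and triple[1] in {"orbit", "orbits"}

    for text in ["Mars orbits Sun.", "Colorless ideas sleep."]:
        triple = extract_triple(text)
        if tutor_approves(triple):
            ontology.add(triple)

    print(ontology)  # the second sentence was rejected by the tutor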

The Process of Language Understanding

People relate patterns in language to patterns in mental models. Simulating exactly what people do is impossible today:

  • Nobody knows the details of how the brain works.
  • Even with a good theory of the brain, the total amount of detail would overwhelm the fastest supercomputers.
  • A faithful simulation would also require a detailed model of the body with all its mechanisms of perception, feelings, and action.

But efficient approximations to human patterns are possible (see the sketch after this list):

  • Graphs can specify good approximations to continuous models.
  • They can serve as the logical notation for a dynamic model theory.
  • And they can support a high-speed associative memory.
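
One way to picture graph patterns backing an associative memory is subgraph matching over a labeled knowledge graph. This is my illustration using an off-the-shelf library, not Sowa's implementation, and the tiny graph is invented:

    # Graphs as a logical notation with associative (pattern-match) retrieval,
    # sketched with networkx subgraph isomorphism.
    import networkx as nx
    from networkx.algorithms import isomorphism

    # Knowledge graph for "a cat sits on a mat".
    kg = nx.DiGraph()
    kg.add_edge("cat", "sit", label="agent")
    kg.add_edge("sit", "mat", label="location")

    # Query pattern: "something sits somewhere".
    query = nx.DiGraph()
    query.add_edge("x", "sit", label="agent")
    query.add_edge("sit", "y", label="location")

    matcher = isomorphism.DiGraphMatcher(
        kg, query, edge_match=lambda e1, e2: e1["label"] == e2["label"]
    )
    print(matcher.subgraph_is_isomorphic())  # True: the pattern is retrieved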

This engineering approach is influenced by, but is not identical to, the cognitive organization and processing in the human brain.

Understanding our digital universe

Expanding Universe

Transformation is and always has been a pervasive property of our universe.


How is the Digital Universe expanding?

Dr. Michael Brodie has an interesting take on trends we are struggling with this decade.

Check out this tutorial: it articulates key themes of our rapidly expanding “Digital Universe.” The talk summary reads as follows:

“This is a remarkable time in human history. Our real world is rapidly becoming digital and our digital worlds are rapidly becoming real. Ubiquitous digital worlds such as online shopping and auctions, stock and equity trading systems, electronic banking, social networks, and the e-‘s (e-government, e-health, e-commerce, e-business) contribute to our rapidly expanding Digital Universe that is as fascinating in the 21st Century as the physical universe was in the 20th Century. Digital worlds have an enormous, and far from understood, impact on our real world and vice versa. The growth, adoption, and power of these digital worlds and the amazing opportunities and threats that they offer suggest a forthcoming Digital Industrial Revolution. The Digital Industrial Revolution, accelerated by the Web, will have far-reaching effects, as did the Industrial Revolution, accelerated by the printing press. Both revolutions unleashed natural social, economic, and political forces and both flattened the world through transparency and openness. But the Digital Industrial Revolution, because of the phenomena surrounding the Web and its interactions with society, is occurring at lightning speed with profound impacts on society, the economy, politics, and more.

Our Digital Universe is leading to fundamental changes in human endeavors – how people interact, how science and business is conducted, and how governments operate, leading in turn to planned and unforeseen consequences such as universal and instantaneous access to information and other resources, globalization of enterprises and industries, as well as economic and social crises, and threats to security and civil liberties. No longer do computer systems provide back-office, administrative support; they are emerging as platforms for digital ecosystems of automated and human agents that operate real business, social, and government processes; thus creating digital worlds that are an integral part of our real world. Yet we build them with little understanding of these digital worlds or their impacts on our real world. Stated simply, the Web is unleashing natural social, economic, political, and other forces – for good and for ill.

This talk explores our expanding Digital Universe that has been emerging slowly for half a century but has reached a tipping point due to the convergence of technical and world trends such as the Web and its continuously astounding adoption. We investigate key contributors to this remarkable time of change and transformation. Digital worlds are being used to transform social, business, scientific, and government activities creating the potential that we can redefine our world. But how do we redefine our world? Where do we start? We look at the need for fundamentally new methods to understand our digital worlds and their actual and potential interactions with and impacts on our world; and for the conception, design, development, and use of digital worlds (previously called “applications”, presumably of computing) and the real and digital worlds with which they interact. Since the problems being addressed are real, so is the problem solving. No more “boffins in the back room”. The related problem solving methods must be holistic, multi-disciplinary, and collaborative, and must facilitate problem solving across technical, social, and other domains to develop secure, realistic, and robust digital worlds. The need for such methods is illustrated with a healthcare information system failure costing £12.4 billion and a corresponding success due largely to its multi-disciplinary life cycle. We examine examples of these methods by applying Jarvis’s Google Rules to failing real worlds and their growing digital counterparts. The emergence of our Digital Universe and its impact on and potential for our world raises the challenge to aspire to the principles of Web Science to work collaboratively across relevant disciplines to create digital worlds that contribute to improving our world.”


 

The power and limitations of relational database technology in the age of information ecosystems

Dr. Michael Brodie

Dr. Michael Brodie is concerned with the Big Picture, including business, economic, social, application, and technical aspects of information ecosystems, core technologies, and integration. He has served as Chief Scientist of a Fortune 20 company, an Advisory Board member of leading national and international research organizations, and an invited speaker and lecturer. Dr. Brodie researches and analyzes challenges and opportunities in advanced technology, architecture, and methodologies for Information Technology strategies. The following is a link to a presentation in which Dr. Brodie examines trends in data management, integration at scale, and information ecosystems.

How do humans encode thoughts, represent knowledge, and share meanings?

Using patterns and language.

Saul Steinberg — Labyrinth

Patterns are knowledge units. A pattern is a compact, semantically rich representation of raw data. Semantic richness is the knowledge a pattern reveals that is hidden in the huge quantity of data it represents. Compactness comes from the correlations among the data and from a synthetic, high-level description of its characteristics. An image is one example.

Language is a system of signs, symbols, gestures, and rules used in communicating. Meaning is something that is conveyed or signified. Humans have plenty of experience encoding thoughts and meanings using language in one form or another… Our proficiency varies. We tend to be better at some kinds of language, and not so good at others.

Five forms of human language

Human endeavors often combine different skills and expertise, e.g., making a movie, designing and constructing a building, or coordinating the response to an emergency. The following table gives examples of five forms of human language: natural, visual, formal, behavioral, and sensory.

Industrial giants placing big bets on smart technologies and concept computing

GE’s vision of the industrial internet

In the fall of 2012, General Electric came out with a study predicting huge economic growth resulting from the Industrial Internet. The two authors are GE’s top strategist and chief economist. It’s a serious report.

Here is the thesis: mechanization of work over the past 200 years has resulted in a 50X increase in worker productivity. The next stage is the integration of machines with computing and the Internet. The result, they predict, will be tens of trillions of dollars in economic expansion and improved quality of life worldwide.

Industrial internet fuels global economic expansion.

Here are two slides from the GE report.

The one on the left identifies three key elements of the industrial internet. The implication is that patterns of work will change and that industrial products and processes will gain a cradle-to-sunset life history.

The diagram on the right projects the value of the industrial internet as potential performance gains across five economic sectors. This is a minimal projection, the power of 1 percent, but we are still talking billions of dollars. GE’s overall projection for industrial-internet-fueled economic expansion to 2030 is closer to $40 trillion.

Agent Smith (from The Matrix) helping GE promote smart technologies.

During 2013, GE began taking its industrial internet thesis to the street. Their recent TV commercials bring back Agent Smith from The Matrix. This scene is about the interconnection and intelligent interaction of machines, software, and healthcare professionals to deliver improved outcomes for patients — a waiting room becomes just a room.

One version of the ad ends with Agent Smith offering a child a choice of lollipops — a red one or a blue one.

How is knowledge different from content?

Content is merely an expression of knowledge.

Source: Pieter Bruegel, Tower of Babel

Content is not knowledge. Knowledge can be separated from content, just as content can be separated from format. The difference between content and knowledge is the difference between form and substance. Content is always specific. Knowledge is always generic and structural. The same knowledge can take many forms. Content is a language-, audience-, media-, and situation-specific rendition of knowledge.
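
To make the form/substance distinction concrete (my own illustration, not the author's example), the same structural fact can be rendered as two different pieces of content:

    # One structural fact rendered as two different content forms.
    # My own example, not from the article.
    import json

    knowledge = {"subject": "water", "relation": "boils_at", "object": "100 °C"}

    prose = f"{knowledge['subject'].capitalize()} boils at {knowledge['object']}."
    feed = json.dumps(knowledge, ensure_ascii=False)

    print(prose)  # Water boils at 100 °C.
    print(feed)   # {"subject": "water", "relation": "boils_at", "object": "100 °C"}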

Search to knowing

The spectrum of knowledge representation and reasoning

More expressive knowledge representation enables more powerful reasoning.

Not all knowledge representation is the same. This figure shows a spectrum of executable knowledge representation and reasoning capabilities. As the rigor and expressive power of the semantics and knowledge representation increases, so does the value of the reasoning capacity it enables.

From bottom to top, the amount, kinds, complexity, and expressive power of knowledge representation increase. From left to right, reasoning capabilities advance from:
(a) Information recovery based on linguistic and statistical methods, to
(b) Discovery of unexpected relevant information and associations through mining, to
(c) Intelligence based on correlation of data sources, connecting the dots, and putting information into context, to
(d) Question answering ranging from simple factoids to complex decision support, to
(e) Smart behaviors, including robust adaptive and autonomous action.

Moving from lower left to upper right, the diagram depicts a spectrum of progressively more capable forms of knowledge representation together with standards and formalisms used to express metadata, associations, models, contexts, and modes of reasoning. More expressive forms of metadata and semantic modeling encompass the simpler forms and extend their capabilities. In the following topics, we discuss different forms of knowledge representation, then the types of reasoning capabilities they enable.

What is knowledge representation?

Knowledge representation is the application of theory, values, logic, and ontology to the task of constructing computable patterns of some domain.

The future of knowledge representation is n-ary concept encoding.

Knowledge is “captured and preserved” when it is transformed into a perceptible and manipulable system of representation.

Systems of knowledge representation differ in their fidelity, intuitiveness, complexity, and rigor. The computational theory of knowledge predicts that ultimate economies and efficiencies can be achieved through variable-length n-ary concept coding and pattern reasoning, resulting in designs that are linear and proportional to knowledge measure.

“Semantic networks” (entity-relationship) are the most powerful and general form of knowledge representation. They model knowledge as a nodal mesh of mental concepts and physical entities (boxes, circles, etc.) tied together by constraining relationships (arrows, directed lines). Relationships describe “constraints” on concepts, including: (a) logical constraints — prepositions of direction or proximity, action verbs connecting subject to object, etc.; and (b) reality constraints — linking concepts to their time, image, attributes, or perceptible measures.
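
A bare-bones sketch of such a network, with concepts as nodes and constraining relationships as labeled, directed links (the specific facts and the helper below are invented for illustration):

    # Bare-bones semantic network: concepts as nodes, constraining
    # relationships as labeled directed links. Facts are invented.
    semantic_net = {
        ("cat", "isa", "mammal"),         # logical constraint: taxonomy
        ("cat", "chases", "mouse"),       # action verb linking subject to object
        ("mouse", "near", "hole"),        # proximity constraint
        ("sighting", "at_time", "dawn"),  # reality constraint: time
    }

    def related(concept, net):
        """All relationships that mention a concept."""
        return [t for t in net if concept in (t[0], t[2])]

    print(related("cat", semantic_net))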

Physical knowledge is Information, or the a posteriori constraints of spatial-temporal reality. It includes sense data and measurements, observed or recorded independently — often dependent on the time, place, or conditions observed. Information representations include numbers and units, tables of measurement, statistics, databases, language, drawings, and photographic images.

Metaphysical knowledge is rational structure, or the a priori constraint of mental concepts and perceived relationships, dictated by axiology, accepted theory, logic, and conditioned expectation — expressed as truth, correctness, and self-consistency — and usually independent of time, place, or a particular reality. Representations include computer programs, rules, E-R diagrams, language, symbols, formulas, algorithms, recipes, and ontologies.