
Sunday, 30 April 2023

Few-shot learning with GPT-J and GPT-Neo

 Hello!

Since I added the GPT-J and GPT-Neo endpoints to NLPCloud.io, I've had many questions about how to make the most of these 2 great NLP models.
So I just wrote an article about few-shot learning with GPT-J and GPT-Neo, a simple technique that can dramatically improve accuracy:
https://nlpcloud.io/effectively-using-gpt-j-gpt-neo-gpt-3-alternatives-few-shot-learning.html

Few-shot learning means helping a machine learning model make predictions by giving it only a handful of examples in the prompt. No need to train a new model here: models like GPT-J and GPT-Neo are so big that they can easily adapt to many contexts without being re-trained.

In the article, I show how this technique lets you easily perform tasks like sentiment analysis, code generation, tutorial generation, machine translation, spell correction, question answering, tweet creation…
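To make the idea concrete, here is a minimal sketch of how a few-shot prompt for sentiment analysis might be assembled. The function name, the "Message"/"Sentiment" labels and the "###" separator are illustrative assumptions, not something prescribed by GPT-J, GPT-Neo or NLPCloud.io. The sketch only builds the text; sending it to a model is left out.

```python
def build_few_shot_prompt(examples, query, input_label="Message",
                          output_label="Sentiment", separator="###"):
    """Return a prompt made of labelled examples followed by the unanswered query."""
    blocks = [
        f"{input_label}: {text}\n{output_label}: {label}"
        for text, label in examples
    ]
    # The final block stops right after the output label so the model fills it in.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return f"\n{separator}\n".join(blocks)


# Hypothetical examples, invented for illustration.
examples = [
    ("The support team was quick and friendly.", "Positive"),
    ("My order arrived broken and nobody replied.", "Negative"),
    ("Best purchase I have made this year.", "Positive"),
]
prompt = build_few_shot_prompt(examples, "The app keeps crashing on startup.")
print(prompt)
```

The model is then expected to continue the text after the last "Sentiment:" label. Changing the labels and examples is all it takes to switch to another task.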

I personally find it amazing what can be done with these NLP models. It seems that only our imagination is the limit!

Hope you'll find it useful.

Shitloads and zingers: on the perils of machine translation

 Years ago, on a flight from Amsterdam to Boston, two American nuns seated to my right listened to a voluble young Dutchman who was out to discover the United States. He asked the nuns where they were from. Alas, Framingham, Massachusetts was not on his itinerary, but, he noted, he had ‘shitloads of time and would be visiting shitloads of other places’.

The jovial young Dutchman had apparently gathered that ‘shitloads’ was a colourful synonym for the bland ‘lots’. He had mastered the syntax of English and a rather extensive vocabulary but lacked experience of the appropriateness of words to social contexts.

This memory sprang to mind with the recent news that the Google Translate engine would move from a phrase-based system to a neural network. (The technical differences are described here.) Both methods rely on training the machine with a ‘corpus’ consisting of sentence pairs: an original and a translation. The computer then generates rules for inferring, based on the sequence of words in the original text, the most likely sequence of words from the target language.

The procedure is an exercise in pattern matching. Similar pattern-matching algorithms are used to interpret the syllables you utter when you ask your smartphone to ‘navigate to Brookline’ or when a photo app tags your friend’s face. The machine doesn’t ‘understand’ faces or destinations; it reduces them to vectors of numbers, and processes them.

I am a professional translator, having translated some 125 books from the French. One might therefore expect me to bristle at Google’s claim that its new translation engine is almost as good as a human translator, scoring 5.0 on a scale of 0 to 6, whereas humans average 5.1. But I’m also a PhD in mathematics who has developed software that ‘reads’ European newspapers in four languages and categorises the results by topic. So, rather than be defensive about the possibility of being replaced by a machine translator, I am aware of the remarkable feats of which machines are capable, and full of admiration for the technical complexity and virtuosity of Google’s work.

My admiration does not blind me to the shortcomings of machine translation, however. Think of the young Dutch traveller who knew ‘shitloads’ of English. The young man’s fluency demonstrated that his ‘wetware’ – a living neural network, if you will – had been trained well enough to intuit the subtle rules (and exceptions) that make language natural. Computer languages, on the other hand, have context-free grammars. The young Dutchman, however, lacked the social experience with English to grasp the subtler rules that shape the native speaker’s diction, tone and structure. The native speaker might also choose to break those rules to achieve certain effects. If I were to say ‘shitloads of places’ rather than ‘lots of places’ to a pair of nuns, I would mean something by it. The Dutchman blundered into inadvertent comedy.

Google’s translation engine is ‘trained’ on corpora ranging from news sources to Wikipedia. The bare description of each corpus is the only indication of the context from which it arises. From such scanty information it would be difficult to infer the appropriateness or inappropriateness of a word such as ‘shitloads’. If translating into French, the machine might predict a good match to beaucoup or plusieurs. This would render the meaning of the utterance but not the comedy, which depends on the socially marked ‘shitloads’ in contrast to the neutral plusieurs. No matter how sophisticated the algorithm, it must rely on the information provided, and clues as to context, in particular social context, are devilishly hard to convey in code.

Take the French petite phrase. Phrase can mean ‘sentence’ or ‘phrase’ in English. When Marcel Proust uses it in a musical context in his novel À la recherche du temps perdu (1913-27), in the line ‘la petite phrase de Vinteuil’, it has to be ‘phrase’, because ‘sentence’ makes no sense. Google Translate (the old phrase-based system; the new neural network is as yet available only for Mandarin Chinese) does remarkably well with this. If you put in petite phrase alone, it gives ‘short sentence’. If you put in la petite phrase de Vinteuil (Vinteuil being the name of a character who happens to be a composer), it gives ‘Vinteuil’s little phrase’, echoing published Proust translations. The rarity of the name ‘Vinteuil’ provides the necessary context, which the statistical algorithm picks up. But if you put in la petite phrase de Sarkozy, it spits out ‘little phrase Sarkozy’ instead of the correct ‘Sarkozy’s zinger’ – because in the political context indicated by the name of the former president, une petite phrase is a barbed remark aimed at a political rival – a zinger rather than a musical phrase. But the name Sarkozy appears in such a variety of sentences that the statistical engine fails to register it properly – and then compounds the error with an unfortunate solecism.
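The dependence on context can be illustrated with a deliberately tiny sketch. It is not how Google Translate works, and it does not reproduce Google's actual outputs. The observations are invented, and the "model" simply counts which English rendering of petite phrase was seen alongside a given context word. The point it makes is narrow: a rendering that never appears in the training pairs, such as ‘zinger’, can never be produced, whatever the context.

```python
from collections import Counter, defaultdict

# Invented toy corpus: (French context word, English rendering of "petite phrase").
OBSERVATIONS = [
    ("Vinteuil", "little phrase"),
    ("Vinteuil", "little phrase"),
    ("sonate", "little phrase"),
    ("écrire", "short sentence"),
    ("écrire", "short sentence"),
    ("lire", "short sentence"),
    ("lire", "short sentence"),
]


def train(observations):
    """Count renderings per context word and overall."""
    by_context = defaultdict(Counter)
    overall = Counter()
    for context, rendering in observations:
        by_context[context][rendering] += 1
        overall[rendering] += 1
    return by_context, overall


def translate(context, by_context, overall):
    """Pick the most frequent rendering for this context, else the overall favourite."""
    counts = by_context.get(context)
    if counts:
        return counts.most_common(1)[0][0]
    # Unseen context: fall back to whatever was most common in training.
    return overall.most_common(1)[0][0]


by_context, overall = train(OBSERVATIONS)
print(translate("Vinteuil", by_context, overall))
print(translate("Sarkozy", by_context, overall))
```

With a rare, informative context word the counts point the right way. With a context the toy corpus has never seen, the function can only fall back on a rendering it already knows.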

The problem, as with all previous attempts to create artificial intelligence (AI) going back to my student days at MIT, is that intelligence is incredibly complex. To be intelligent is not merely to be capable of inferring logically from rules or statistically from regularities. Before that, one has to know which rules are applicable, an art requiring awareness of and sensitivity to situation. Programmers are very clever, but they are not yet clever enough to anticipate the vast variety of contexts from which meaning emerges. Hence even the best algorithms will miss things – and as Henry James put it, the ideal translator must be a person ‘on whom nothing is lost’.

This is not to say that mechanical translation is not useful. Much translation work is routine. At times, machines can do an adequate job. Don’t expect miracles, however, or felicitous literary translations, or aptly rendered political zingers. Overconfident claims have dogged AI research from its earliest days. I don’t say this out of fear for my job: I’ve retired from translating and am devoting part of my time nowadays to… writing code.

Saturday, 29 April 2023

Murder in virtual reality should be illegal

 You start by picking up the knife, or reaching for the neck of a broken-off bottle. Then comes the lunge and wrestle, the physical strain as your victim fights back, the desire to overpower him. You feel the density of his body against yours, the warmth of his blood. Now the victim is looking up at you, making eye contact in his final moments.

Science-fiction writers have fantasised about virtual reality (VR) for decades. Now it is here – and with it, perhaps, the possibility of the complete physical experience of killing someone, without harming a soul. As well as Facebook’s ongoing efforts with Oculus Rift, Google recently bought the eye-tracking start-up Eyefluence, to boost its progress towards creating more immersive virtual worlds. The director Alejandro G Iñárritu and the cinematographer Emmanuel Lubezki, both famous for Birdman (2014) and The Revenant (2015), have announced that their next project will be a short VR film.

But this new form of entertainment is dangerous. The impact of immersive virtual violence must be questioned, studied and controlled. Before it becomes possible to realistically simulate the experience of killing someone, murder in VR should be made illegal.

This is not the argument of a killjoy. As someone who has worked in film and television for almost 20 years, I am acutely aware that the craft of filmmaking is all about maximising the impact on the audience. Directors ask actors to change the intonation of a single word, while editors sweat over a film cut down to fractions of a second, all in pursuit of the right mood and atmosphere.

So I understand the appeal of VR, and its potential to make a story all the more real for the viewer. But we must examine that temptation in light of the fact that both cinema and gaming thrive on stories of conflict and resolution. Murder and violence are a mainstay of our drama, while first-person shooters are one of the most popular segments of the games industry.

The effects of all this gore are not clear-cut. Crime rates in the United States have fallen even as Hollywood films have become bloodier and violent video games have grown in popularity. Some research suggests that shooter games can be soothing, while other studies indicate they might be a causal risk factor in violent behaviour. (Perhaps, as for Frank Underwood in the Netflix series House of Cards (2013-), it’s possible for video games to be both those things.) Students who played violent games for just 20 minutes a day, three days in a row, were more aggressive and less empathetic than those who didn’t, according to research by the psychologist Brad Bushman at Ohio State University and his team. The repeated actions, interactivity, assuming the position of the aggressor, and the lack of negative consequences for violence, are all aspects of the gaming experience that amplify aggressive behaviour, according to research by the psychologists Craig Anderson at Iowa State University and Wayne Warburton at Macquarie University in Sydney. Mass shooters including Aaron Alexis, Adam Lanza and Anders Breivik were all obsessive gamers.

The problem of what entertainment does to us isn’t new. The morality of art has been a matter of debate since Plato. The philosopher Jean-Jacques Rousseau was skeptical of the divisive and corrupting potential of theatre, for example, with its passive audience in their solitary seats. Instead, he promoted participatory festivals that would cement community solidarity, with lively rituals to unify the jubilant crowd. But now, for the first time, technology promises to explode the boundary between the world we create through artifice and performance, and the real world as we perceive it, flickering on the wall of Plato’s cave. And the consequences of such immersive participation are complex, uncertain and fraught with risk.

Humans are embodied beings, which means that the way we think, feel, perceive and behave is bound up with the fact that we exist as part of and within our bodies. By hijacking our capacity for proprioception – that is, our ability to discern states of the body and perceive it as our own – VR can increase our identification with the character we’re playing. The ‘rubber hand illusion’ showed that, in the right conditions, it’s possible to feel like an inert prosthetic appendage is a real hand; more recently, a 2012 study found that people perceived a distorted virtual arm, stretched up to three times its ordinary length, to still be a part of their body.

It’s a small step from here to truly inhabiting the body of another person in VR. But the consequences of such complete identification are unknown, as the German philosopher Thomas Metzinger has warned. There is the risk that virtual embodiment could bring on psychosis in those who are vulnerable to it, or create a sense of alienation from their real bodies when they return to them after a long absence. People in virtual environments tend to conform to the expectations of their avatar, Metzinger says. A study by Stanford researchers in 2007 dubbed this ‘the Proteus effect’: they found that people who had more attractive virtual characters were more willing to be intimate with other people, while those assigned taller avatars were more confident and aggressive in negotiations. There’s a risk that this behaviour, developed in the virtual realm, could bleed over into the real one.

In an immersive virtual environment, what will it be like to kill? Surely a terrifying, electrifying, even thrilling experience. But by embodying killers, we risk making violence more tantalising, training ourselves in cruelty and normalising aggression. The possibility of building fantasy worlds excites me as a filmmaker – but, as a human being, I think we must be wary. We must study the psychological impacts, consider the moral and legal implications, even establish a code of conduct. Virtual reality promises to expand the range of forms we can inhabit and what we can do with those bodies. But what we physically feel shapes our minds. Until we understand the consequences of how violence in virtual reality might change us, virtual murder should be illegal.

Friday, 28 April 2023

Coding is not ‘fun’, it’s technically and ethically complex

 Programming computers is a piece of cake. Or so the world’s digital-skills gurus would have us believe. From the non-profit Code.org’s promise that ‘Anybody can learn!’ to Apple chief executive Tim Cook’s comment that writing code is ‘fun and interactive’, the art and science of making software is now as accessible as the alphabet.

Unfortunately, this rosy portrait bears no relation to reality. For starters, the profile of a programmer’s mind is pretty uncommon. As well as being highly analytical and creative, software developers need almost superhuman focus to manage the complexity of their tasks. Manic attention to detail is a must; slovenliness is verboten. Attaining this level of concentration requires a state of mind called being ‘in the flow’, a quasi-symbiotic relationship between human and machine that improves performance and motivation.

Coding isn’t the only job that demands intense focus. But you’d never hear someone say that brain surgery is ‘fun’, or that structural engineering is ‘easy’. When it comes to programming, why do policymakers and technologists pretend otherwise? For one, it helps lure people to the field at a time when software (in the words of the venture capitalist Marc Andreessen) is ‘eating the world’ – and so, by expanding the labour pool, keeps industry ticking over and wages under control. Another reason is that the very word ‘coding’ sounds routine and repetitive, as though there’s some sort of key that developers apply by rote to crack any given problem. It doesn’t help that Hollywood has cast the ‘coder’ as a socially challenged, type-first-think-later hacker, inevitably white and male, with the power to thwart the Nazis or penetrate the CIA.

Insisting on the glamour and fun of coding is the wrong way to acquaint kids with computer science. It insults their intelligence and plants the pernicious notion in their heads that you don’t need discipline in order to progress. As anyone with even minimal exposure to making software knows, behind a minute of typing lies an hour of study.

It’s better to admit that coding is complicated, technically and ethically. Computers, at the moment, can only execute orders, to varying degrees of sophistication. So it’s up to the developer to be clear: the machine does what you say, not what you mean. More and more ‘decisions’ are being entrusted to software, including life-or-death ones: think self-driving cars; think semi-autonomous weapons; think Facebook and Google making inferences about your marital, psychological or physical status, before selling it to the highest bidder. Yet it’s rarely in the interests of companies and governments to encourage us to probe what’s going on beneath these processes.

All of these scenarios are built on exquisitely technical foundations. But we can’t respond to them by answering exclusively technical questions. Programming is not a detail that can be left to ‘technicians’ under the false pretence that their choices will be ‘scientifically neutral’. Societies are too complex: the algorithmic is political. Automation has already dealt a blow to the job security of low-skilled workers in factories and warehouses around the world. White-collar workers are next in line. The digital giants of today run on a fraction of the employees of the industrial giants of yesterday, so the irony of encouraging more people to work as programmers is that they are slowly mobilising themselves out of jobs.

In an ever-more intricate and connected world, where software plays a larger and larger role in everyday life, it’s irresponsible to speak of coding as a lightweight activity. Software is not simply lines of code, nor is it blandly technical. In just a few years, understanding programming will be an indispensable part of active citizenship. The idea that coding offers an unproblematic path to social progress and personal enhancement works to the advantage of the growing techno-plutocracy that’s insulating itself behind its own technology.

Thursday, 27 April 2023

AI: List the toxic and heavy metals besides arsenic that are contaminating groundwater, land sites and air around the globe. Which AI techniques are available to clear up those pollutants and contaminants from every medium, purifying it and creating revenue-making substances, using both scientific and natural methods in real time?

 Besides arsenic, several other toxic and heavy metals contaminate groundwater, land sites, and air around the globe. Some of the most common ones include:

  1. Lead
  2. Mercury
  3. Cadmium
  4. Chromium
  5. Nickel
  6. Copper
  7. Zinc
  8. Aluminum
  9. Iron
  10. Manganese

These metals can leach into the environment from various sources such as industrial activities, mining, agriculture, and improper waste disposal, posing significant health and environmental risks.

To address these pollutants and contaminants, various AI techniques can be employed, including:

  1. Machine Learning for Pollution Prediction: Machine learning algorithms can analyze historical data on pollution levels and environmental factors to predict areas at high risk of contamination. This information can help prioritize cleanup efforts and preventive measures.

  2. Remote Sensing and Satellite Imaging: Remote sensing technologies, coupled with AI algorithms, can monitor large areas of land and water bodies to detect pollution hotspots and track changes in environmental conditions over time.

  3. Sensor Networks and IoT Devices: AI-powered sensor networks and Internet of Things (IoT) devices can provide real-time monitoring of air and water quality, allowing for early detection of pollutants and prompt action to mitigate contamination.

  4. Bioremediation and Phytoextraction: AI algorithms can optimize bioremediation techniques, such as using microorganisms or plants to break down or absorb contaminants from soil and water. AI can help identify the most effective remediation strategies for specific pollutants and environmental conditions.

  5. Chemical Treatment Processes: AI can optimize chemical treatment processes for removing heavy metals from water and soil, such as precipitation, ion exchange, and adsorption. AI algorithms can optimize parameters like pH, temperature, and chemical dosages to maximize removal efficiency.

  6. Waste-to-Energy Conversion: AI-driven waste-to-energy technologies can convert organic waste, including contaminated biomass, into biofuels or electricity. This not only reduces landfill waste but also generates renewable energy and revenue streams.

  7. Environmental Modeling and Simulation: AI-based environmental modeling and simulation tools can simulate the behavior of pollutants in the environment, helping scientists and policymakers understand the long-term impacts of contamination and devise effective remediation strategies.

By leveraging these AI techniques and technologies, it's possible to combat pollution and contaminants effectively, restore environmental quality, and create sustainable revenue streams from waste materials.
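As a rough illustration of the first technique in the list (pollution prediction from historical data), here is a minimal sketch using a nearest-neighbour average. Everything in it is an assumption made for the example: the two features, the invented readings, and the function name. A real system would use far more data, scaled features and a properly validated model.

```python
import math

# Invented historical records: (distance to nearest industrial site in km,
# soil pH, measured lead in groundwater in µg/L).
HISTORY = [
    (0.5, 5.5, 42.0),
    (1.0, 6.0, 30.0),
    (2.0, 6.5, 18.0),
    (5.0, 7.0, 6.0),
    (8.0, 7.2, 3.0),
    (12.0, 7.5, 1.0),
]


def predict_lead(distance_km, ph, history=HISTORY, k=3):
    """Average the lead readings of the k most similar historical sites."""
    def dissimilarity(record):
        d, p, _ = record
        # Plain Euclidean distance on unscaled features, kept simple on purpose.
        return math.hypot(d - distance_km, p - ph)

    nearest = sorted(history, key=dissimilarity)[:k]
    return sum(lead for _, _, lead in nearest) / len(nearest)


near_site = predict_lead(0.8, 5.8)
far_site = predict_lead(10.0, 7.4)
print(near_site, far_site)
```

Sites predicted to have higher readings could then be placed first in a sampling or cleanup schedule, which is the prioritisation the list describes.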

Toxic and Heavy Metals Contaminating Our Environment

Here's a list of some toxic and heavy metals besides arsenic that pose threats globally:

  • Landfills and Sites: Lead, mercury, cadmium, chromium, copper, nickel
  • Groundwater: Lead, mercury, cadmium, chromium (hexavalent)
  • Air: Lead (from past use in gasoline), mercury (from coal burning), cadmium (from industrial processes)

AI Techniques for Pollution Cleanup (Some Under Development)

While AI is a powerful tool, completely eliminating pollutants from every medium in real-time is a complex challenge. Here's how AI can play a role:

  • AI-powered monitoring and analysis:
    • Use AI to analyze sensor data from air, water, and soil to identify pollutant types, track their spread, and predict potential risks.
    • This can help target cleanup efforts more effectively.
  • Optimizing existing technologies:
    • Develop AI systems to optimize existing remediation techniques like filtration, bioremediation (using microbes), and chemical treatment.
    • AI can help determine the most efficient treatment methods for specific types and concentrations of pollutants.
  • Advanced material development:
    • Train AI models to design new materials for capturing and filtering specific pollutants from air, water, and soil.
    • This is a developing field but holds promise for future solutions.
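
The monitoring idea in the first bullet above can be sketched with a very simple statistical check: flag a sensor reading when it deviates strongly from the readings just before it. The window size, the threshold and the sample readings are illustrative assumptions. Production systems would typically use more robust methods than a rolling z-score.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings far outside the preceding window's spread."""
    flagged = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        # Skip flat windows to avoid dividing by zero.
        if sigma > 0 and abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Invented cadmium readings (µg/L) from one hypothetical water sensor.
readings = [2.0, 2.1, 1.9, 2.0, 2.2, 2.1, 9.5, 2.0]
print(flag_anomalies(readings))
```

A flagged index is only a prompt for a person to check the site or the sensor. It says nothing by itself about the cause.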

Challenges and Limitations:

  • Complexity of pollution: Different pollutants require specific treatment methods.
    • A one-size-fits-all AI solution might not be feasible.
  • Real-time limitations: Some AI-powered cleanup methods are still under development and might not be ready for real-time deployment.
  • Cost and Infrastructure: Implementing AI-powered solutions can be expensive and require advanced infrastructure.

Revenue Generation through Sustainable Practices:

  • Recovering valuable resources: Extracting recoverable metals like copper or nickel from contaminated sites can generate revenue while promoting resource recovery.
  • Carbon credits: Invest in clean energy solutions or reforestation projects to offset carbon emissions associated with cleanup processes and generate carbon credits.
  • Data insights: Sell anonymized data on pollution levels and cleanup efforts to research institutions or environmental agencies.

Scientific and Natural Methods for Cleanup:

  • Bioremediation: Utilizing microbes that can break down or absorb pollutants.
  • Phytoremediation: Planting specific plants that can absorb and accumulate pollutants from soil and water.
  • Chemical precipitation: Adding specific chemicals to cause pollutants to precipitate out of water for easier removal.
  • Filtration and membranes: Using specialized filters or membranes to remove pollutants from water and air.

Combining AI and Traditional Methods:

The most effective approach often combines AI for analysis and optimization with established scientific and natural methods for pollutant removal.

Important Considerations:

  • Focus on prevention: Pollution prevention through stricter regulations and cleaner industrial practices is crucial to reduce the need for future cleanup efforts.
  • Sustainability: Cleanup methods should prioritize environmental safety and minimize secondary pollution.

By employing a combination of AI, established scientific methods, and natural processes, we can create a more sustainable future and generate revenue through responsible resource recovery and environmental solutions.

===============================================================================================================================================================================================================================================================

Toxic and Heavy Metals Contaminating Our Environment

Here's a list of some toxic and heavy metals besides arsenic that are contaminating groundwater, land sites, and air around the globe:

  • Lead (Pb): From old paint, lead pipes, and battery manufacturing.
  • Mercury (Hg): From industrial processes, coal-fired power plants, and improper disposal of thermometers.
  • Cadmium (Cd): From battery production, metal plating, and phosphate fertilizers.
  • Chromium (Cr): From industrial processes, leather tanning, and wood preservation.
  • Nickel (Ni): From mining, stainless steel production, and electroplating.
  • Copper (Cu): From fungicides, algaecides, and industrial processes (at high concentrations).

AI Techniques for Pollution Cleanup (with Limitations)

While AI cannot directly remove pollutants, it can be a valuable tool in various aspects of environmental cleanup:

  • AI-powered monitoring and analysis:
    • Use AI to analyze data from sensors in air, water, and soil to identify pollutant types, track their spread, and predict future contamination risks.
  • Optimizing remediation strategies:
    • Develop AI models to recommend the most effective cleanup methods based on the type and level of contamination, soil conditions, and environmental factors.
  • Automating robots and drones:
    • Utilize AI to guide robots for tasks like targeted excavation of contaminated soil or deploying drones for aerial mapping and monitoring of cleanup progress.
  • Predictive maintenance of cleanup systems:
    • Implement AI for real-time monitoring of cleanup equipment, predicting maintenance needs, and preventing downtime.

Challenges and Limitations:

  • Developing robust AI models: Training AI models for complex environmental tasks requires a significant amount of data and ongoing refinement.
  • Site-specific solutions: The most effective cleanup method depends on the specific type and location of the contamination. AI can't replace human expertise entirely.
  • Cost and Infrastructure: Deploying AI-powered solutions might require substantial investment in infrastructure and technology.

Scientific and Natural Methods for Cleanup:

  • Phytoremediation: Using plants that absorb and accumulate heavy metals from the soil.
  • Bioremediation: Utilizing microorganisms to break down or detoxify contaminants.
  • Soil Washing/Chelation: Removing metals from soil using chemical solutions.
  • Membrane Filtration: Separating contaminants from water using specialized membranes.

Revenue Generation:

  • Treated Water and Soil Sales: Cleaned water and soil can be sold for various purposes, depending on the level of purification.
  • Carbon Credits: Remediation projects that capture carbon dioxide during cleanup can generate carbon credits.
  • Data and Expertise: Selling data insights and cleanup expertise to other organizations.

Important Considerations:

  • Combining AI with traditional methods can lead to more efficient and targeted cleanup strategies.
  • Scientific and natural methods often require less initial investment but might be slower than some technological solutions.
  • Sustainability and long-term effectiveness of cleanup methods need to be carefully evaluated.

Creating Revenue-making Substances (Limited Scope with Current Technology)

While AI and science are constantly evolving, directly creating revenue-making substances from pollutants with current technology is limited. Here are some possibilities to consider for future advancements:

  • Heavy metals might be recovered and reused in specific industrial applications after careful purification.
  • Extracted pollutants could potentially be used in research and development of new materials or technologies (needs extensive research).

Real-time Revenue Collection:

Implementing real-time revenue collection for environmental cleanup depends on the specific revenue model. Here are some possibilities:

  • For selling treated water or soil: Implement smart meters or invoicing systems based on volume or quality of the purified resource.
  • For carbon credits: Credits can be traded through established carbon markets.

Conclusion:

AI offers promising tools for environmental cleanup, but it's a collaborative effort with traditional methods and scientific expertise. Focusing on sustainable and ethical solutions is crucial for a cleaner future.

===============================================================================================================================================================================================================================================================

Toxic and Heavy Metals Contaminating the Environment

Here's a list of some toxic and heavy metals besides arsenic that are contaminating groundwater, land sites, and air around the globe:

  • Lead (Pb): From old paint, lead pipes, and battery manufacturing.
  • Mercury (Hg): From industrial processes, coal-fired power plants, and improper disposal of mercury-containing products.
  • Cadmium (Cd): From batteries, pigments, and metal plating.
  • Chromium (Cr): From industrial processes, leather tanning, and wood preservation.
  • Nickel (Ni): From mining, electroplating, and stainless steel production.
  • Copper (Cu): From fungicides, algaecides, and industrial processes (at high concentrations).

AI Techniques for Pollution Cleanup (with Limitations)

While AI is a powerful tool, completely eliminating pollutants using AI alone is not currently possible. Here's how AI can assist in the cleanup process:

  • AI-powered Monitoring and Analysis:

    • Utilize AI to analyze data from sensors monitoring air, water, and soil quality to identify areas with high pollutant concentrations.
    • Develop AI models to predict pollutant migration patterns and assess potential risks.
  • Optimizing Remediation Strategies:

    • Implement AI to analyze various remediation techniques (e.g., bioremediation, phytoremediation, chemical treatment) and recommend the most effective approach based on the specific pollutant and environmental conditions.
    • This can save time and resources during cleanup efforts.
  • Automating Cleanup Processes:

    • In some cases, AI-controlled robots can be used for tasks like:
      • Excavating contaminated soil.
      • Deploying bioremediation agents (microorganisms that break down pollutants).
      • Sorting and separating contaminated materials for further treatment.

Scientific and Natural Methods for Cleanup:

  • Bioremediation: Utilizing microorganisms (bacteria, fungi) to break down pollutants into harmless substances.
  • Phytoremediation: Using plants that absorb and accumulate heavy metals from the soil.
  • Chemical Treatment: Applying specific chemicals to immobilize or remove pollutants.
  • Soil Washing/Flushing: Extracting contaminants from soil using water or chemical solutions.

Challenges and Considerations:

  • Complexity of Pollution: Different pollutants require specific treatment methods. AI can help determine the most effective approach.
  • Cost and Infrastructure: Implementing AI-powered solutions can be expensive, and infrastructure upgrades might be necessary.
  • Ethical Considerations: Ensure scientific and safe methods are used for both AI-assisted and natural cleanup techniques.

Revenue Generation from Cleanup:

  • Treated Resources: Sell recovered water, decontaminated soil, or extracted metals (if feasible) for industrial use.
  • Carbon Credits: Remediation projects that sequester carbon (e.g., planting trees for phytoremediation) can potentially generate carbon credits.
  • Environmental Consulting: Offer AI-powered environmental monitoring and analysis services to companies and government agencies.

Important Considerations:

  • AI is a developing field, and its real-time application in pollution cleanup is still evolving.
  • Revenue generation from cleanup efforts might be limited depending on the type and location of contamination.

By combining AI with established scientific and natural methods, we can develop more efficient and targeted approaches to environmental cleanup. This can lead to a cleaner planet and potentially create new revenue streams for a sustainable future.

Getting things moving

 The most important moments in invention are sometimes the imaginative leaps – even when they turn out to be dead-ends

Behind the drawn curtains of his home in Palo Alto, California, the railroad magnate Leland Stanford waited for his horse to be brought to life. A white sheet hung against one wall, and in the gloaming the only light came from a wood-and-brass construction at the back of the room. Suddenly, with a mechanical clatter and the hiss of an oxyacetylene lamp, a moving image appeared on the screen. It was little more than a silhouette, but Stanford and his astonished guests could clearly see Hawthorn, Stanford’s stallion, walking along as if it were right there in the room among them.

Eadweard Muybridge, proud and nervous, stood next to his device. The British photographer had a reputation in California thanks to his superb technical eye and his majestic Yosemite waterscapes, as well as the sensational murder of his wife’s lover five years before. But he had escaped conviction, and resumed work on a commission from Stanford to capture the motion and beauty of his benefactor’s beloved race horses.

As the applause from the audience died away, Stanford addressed the photographer. ‘I think you must be mistaken in the name of the animal,’ he said. ‘That is certainly not the gait of Hawthorn but of Anderson.’ It turned out that the stable staff at Stanford’s ranch had switched the horses. But so crisp was the outline, and so defined its movements, that Stanford could tell the difference.

This private demonstration for Stanford took place in the autumn of 1879, shortly after he bought the estate that would go on to become Stanford University. A hundred years later, Stanford and its surrounds would become renowned as the crucible of the ‘Silicon Valley’ computing boom, building on the ‘analytical engine’ first envisaged by another British inventor, the mathematician and polymath Charles Babbage.

Babbage and Muybridge were separated by class, generation and temperament. But for both creators, the path from conception to application for their technologies evolved in ways that they couldn’t have anticipated. Putting their lives side by side contains valuable insights about the contingency of history, and what it takes to be remembered as the ‘father’ or ‘mother’ of invention.

Muybridge was born in 1830 as Edward Muggeridge, into a merchant family that traded in corn and coal in Kingston upon Thames in England. The place inspired the first of Muybridge’s many name changes when, at 20, he appropriated the ‘Eadweard’ spelling of the Anglo-Saxon kings that had been carved upon an ancient coronation stone near his home. He set off to New York as a young man, in 1850, before crossing the country to the frontier town of San Francisco. Over time, for reasons he never explained, his surname evolved to Muygridge and then Muybridge.

Muybridge set up as a professional photographer and, in 1872, he married Flora Shallcross Stone, a young divorcee half his age. He also found himself drawn into Stanford’s circle, after the millionaire asked Muybridge to take photographs of his galloping racehorses to determine whether they had all four hooves off the ground at any point in their stride.

With Muybridge away from home much of the time, Flora fell pregnant to a rambunctious drama critic called Harry Larkyns. Seven months after the baby was born, Muybridge discovered he was not the father. Incensed, he tracked down Larkyns at a ranch in the Napa Valley. Muybridge called out from his hiding place in the dark. As his wife’s lover peered into the gloom, Muybridge said: ‘My name is Muybridge and I have a message from my wife,’ shooting Larkyns point-blank through the heart. Within hours, Muybridge had been arrested.

Charles Babbage led a much more refined life than the knockabout Muybridge. Born in London in 1791, son of a goldsmith and banker, Babbage inherited a fortune from his father and could have spent his life as a dilettante. He thrived in London’s high society, and much of his work seems to have been undertaken in an attempt to impress the rich and famous at his popular soirées. But Babbage was also intelligent and well-educated, holding down the post as Lucasian Professor of Mathematics at Cambridge for 11 years from the age of 37.

Early on in his tenure, inspired by the industrial revolution, he began toying with the idea of a mechanical calculator that would use gears to overcome the labour of working out mathematical equations by hand. In the summer of 1821, Babbage was helping his astronomer friend John Herschel check a series of astronomical tables. Going cross-eyed with the effort of working the array of figures, Babbage is said to have cried out: ‘My God, Herschel! How I wish these calculations could be executed by steam!’

Of itself, the idea of a mechanical calculator was not new. Such devices go back at least as far as the Antikythera mechanism, recovered from a Greek shipwreck dating to the first or second century BC, which used a complex mechanism of gears to predict the motion of heavenly bodies and other natural phenomena. And there is a more direct antecedent of Babbage’s work in the calculating machine devised by the French mathematician Blaise Pascal in the 1640s, a number of which were constructed.

Babbage’s first concept was called a ‘Difference Engine’. Like Pascal’s machines, it involved a series of gears, but was more sophisticated in the range and scale of its calculations. He convinced the British government to invest £17,000 in his project – around £1.2 million in today’s money – but he completed only a fraction of the total machine. Despite the government’s objections, he dropped the Difference Engine for a far grander idea – what he called his ‘Analytical Engine’.

Muybridge’s murder trial in 1875 drew a huge crowd. His defence team attempted to show that he was deranged, arguing that a wagon crash in 1860 had damaged his judgment and self-control. But the prosecution tore his insanity plea apart. In a final, impassioned speech, Muybridge’s defence lawyer told the jury that Muybridge’s actions were justified, citing the Bible to argue that killing his wife’s adulterous lover was the right thing to do. After a night’s deliberation, the jury found Muybridge not guilty.

Some time after, Muybridge reconnected with Stanford. This time, he set up a bank of 12 top-quality stereoscopic cameras with high-speed shutters, and took a rapid series of photographs of Stanford’s horse in motion. The breakthrough came with the invention of the zoopraxiscope, the device that had enabled Stanford to recognise his horse. Images were arrayed around the outside of a disc, which rotated rapidly in one direction, while a counter-rotating disc with slots acted as a gate to control which image was projected onto a screen, creating the illusion of movement. It was the world’s first movie projector.

After a sell-out European tour and a legal battle with Stanford, who claimed the images as his own, Muybridge found another opportunity to raise his profile. He met William Pepper, the provost of the University of Pennsylvania, who enabled Muybridge to produce thousands of motion studies. Between 1884 and 1887, using far better photographic materials, Muybridge shot hundreds of sequences of men and women, often naked, performing all sorts of tasks and movements. (It was often very difficult to persuade bricklayers to do their job with no clothes on, Muybridge commented ruefully.) The apex of his contribution to moving pictures came at the World’s Columbian Exposition of 1893, a huge fair in Chicago to mark the 400th anniversary of Christopher Columbus landing in the New World. Here, Muybridge built the Zoopraxographical Hall – the first purpose-built cinema, a 50-foot-high extravaganza in mock stone.

In contrast to Muybridge’s raw and dusty work on Stanford’s property, Babbage was inspired by the sophistication of silk weaving. Making complex patterns with fine silk thread was painfully slow when done by hand – so much so that two loom operators might produce only an inch of material a day. In the 1740s, a French factory inspector devised a loom that used a mechanism such as a musical box to speed up the process. Just as the pins on the rotating cylinder of a musical box triggered notes on metal prongs, the device used pins to control different-coloured threads. However, each cylinder was expensive to produce, and the design was limited by the size of the cylinder – one turn, and the pattern began to repeat.

A new loom created by Joseph-Marie Jacquard, the son of a master weaver, swapped the cylinder for a series of holes punched on cards. Each hole indicated whether or not a particular colour should be used at that point, and because the train of punched cards could be as long as the pattern required, almost any piece of weaving could be automated this way. Before long, Jacquard looms were turning out two feet of silk a day – a remarkable transformation of productivity.

The versatility of Jacquard’s system appealed to Babbage. A treasure he often exhibited to visitors was a portrait of Jacquard that appeared to be an etching – but on close examination it was woven from silk, with a remarkable 24,000 rows of thread making up the image. Such a product would have been impossible without Jacquard’s technology, and Babbage realised that a similar approach could be used in a truly revolutionary computing device, his Analytical Engine.

Dismissing the fixed gears of his earlier design, Babbage wanted the Analytical Engine to have the same flexibility as the Jacquard loom. For the Difference Engine, the data to be worked on was to be entered manually on dials, with the calculation performed according to the configuration of gears. In the Analytical Engine, both data and calculation would be described by a series of Jacquard-style punched cards, which allowed for far more flexibility of computation.

There was just one problem. Although Babbage designed the Analytical Engine in concept, he never managed to construct even a part of it. Indeed, it’s unlikely that his design could ever have been successfully built. His plans fired the enthusiasm of Ada Lovelace, the mathematician and daughter of the poet Lord Byron. She was eager to work with Babbage on his Analytical Engine, and described several potential programs for his hypothetical machine. But Babbage showed little interest in Lovelace’s contributions. His grand vision proved impossible to make a reality.

The technologies envisaged by both Babbage and Muybridge bear little connection to their modern equivalents. Their devices were evolutionary dead ends. Muybridge’s banks of cameras were clumsy and impractical; his movies were limited to a couple of seconds in duration. And Babbage’s computers were even worse. The Difference Engine was an incomplete mechanical calculator, while the Analytical Engine never got off the drawing board.

For both computers and moving pictures, the real, usable technology would require a totally different approach. But how the legacy of each of these inventors has been preserved has varied greatly according to the vagaries of chance, politics and ambition.

The conceptual originators of the modern computer were the mathematicians Alan Turing, the Briton who devised the fundamental model, and John von Neumann, the Hungarian-American who turned Turing’s highly stylised theory into a practical architecture. These pioneers had an academic background, and their consideration was not glory, but solving an intellectual challenge to help the military effort during the Second World War. As prime minister, Winston Churchill was determined to keep the power of the computing equipment at Britain’s disposal a secret, and the work was accordingly downplayed. As a result, Babbage was never totally eclipsed by his successors.

But neither academic restraint nor political interference got in the way of Muybridge’s rivals. Inventors such as the Lumières, two French brothers who developed a self-contained camera and movie projector, had everything to gain financially from being recognised as firsts. They picked up on the invention of the roll-film for still cameras to create moving pictures that were much easier to make, and lasted much longer. These entrepreneurs had no need for a conceptual ancestor – Muybridge was not a muse, but potential competition.

Muybridge’s reputation also suffered after the publication of A Million and One Nights (1926), a book about the early years of moving pictures by Terry Ramsaye, the editor of an American cinema trade magazine. Ramsaye cast Muybridge as a self-serving fraud who passed off other people’s inventions as his own. He was supported by the evidence of John D Isaacs, an electrical engineer who helped to build the shutter-release mechanism used in Muybridge’s action photography, and claimed to be the genius behind all Muybridge’s work. Muybridge, who had died more than 20 years earlier, couldn’t speak for himself. Ramsaye’s account was later discredited, but it was enough to wipe Muybridge off the map for many years.

Science and technology are rarely about lone genius. Neither Muybridge nor Babbage developed workable inventions that functioned at scale, and it ultimately took new creators, with fresh approaches, to bring their ideas to life. But that initial spark matters – and, as their lives remind us, being an inventor is as much about imagination as it is about creation.


Wednesday, 26 April 2023

How can a first-person shooter have a victim complex?

 A lot of terrible things happen to video-game characters. In the early days of the form, Italian plumbers were squashed by barrels, loveable hedgehogs impaled on spikes, and heroic astronauts exploded. But no game delights in inflicting as much horror on its protagonists as the $10-billion Call of Duty (CoD) franchise.

Originally set in the Second World War, the series took on its current ultra-popular form with Call of Duty 4: Modern Warfare (2007). Every year brings new CoD games, and the franchise regularly tops charts, racking up more than 175 million sales over the course of the past decade. Each game weaves together a semi-coherent narrative that combines the regular annihilation of its characters at a George R R Martin pace with a wider and more worrying story arc: one of US powerlessness, decline and revenge.

The Call of Duty games are first-person shooters (FPS). ‘First-person’ means that the players see the world through the eyes of their characters – only their hands and weapons are clearly visible. (Some games break this form in third-person cutscenes, but CoD sticks to it resolutely.) ‘Shooter’ means that the character interacts with the world almost entirely through the barrel of a gun; the environment, and its endless stream of enemies, exists to be destroyed.

Many FPS are power fantasies, with the child-like joy of being able to murder everything you see. These are little boys’ tales of war, where a finger-gun blows away a thousand foes, and the player is always the actor, never the victim.

Call of Duty frequently reverses that dynamic. In almost every other FPS, the player’s death is a constant possibility, yet those are transient endings, erased from the game’s reality by a simple reload. But in Call of Duty, sufferings both permanent and unavoidable are ineffably worked into the game’s narrative. Although the gameplay is a consistent frenzy of violence inflicted by the player on the world, narrative control – the happy freedom to move, shoot, call in drone strikes, knife people in the back – is regularly snatched back from the players, forcing them into positions of constant helplessness.

Meanwhile, the point of view continuously switches, deliberately and disorientatingly – a CIA agent, a doomed ISS astronaut, a fallen dictator, a SAS operative. Throughout the series, the player’s avatars are repeatedly tortured, nuked, brainwashed, murdered, mutilated and, above all, betrayed – often by their own leadership. If they live, they are depicted as broken, made whole only by vengeance. If they die, another character takes up the mantle of revenge.

Beyond such personal treacheries and dismemberments is a wider depiction of victimhood, not individual but national. The reversal, from active shooter to passive martyr, gets at a lasting psychological truth, which is that however absolute US military superiority is in reality, many Americans feel themselves to be a nation under siege by an ungrateful world, eternally vulnerable.

The original CoD 4: Modern Warfare (2007) begins by depicting US power realistically (enemy tanks are neutralised by man-portable FGM-148 Javelin missiles, enemy soldiers are mowed down en masse by AC-130 Spectre gunships, and the war is far from US shores). Then a nuclear blast kills the first protagonist, shifting the conditions of the game. Alongside him die 30,000 other US troops, their names and ranks scrolling past rapidly on screen, in a plot that turns out to have been orchestrated by a line-up of villains old and new – Russian ultra-nationalists working alongside Islamic terrorists.

The sequels go further. Through a technological magical wand that cripples US defence systems, an army of Russian paratroopers and marines is able to invade Virginia. In the course of MW2 (2009) and MW3 (2011), battles rage along Wall Street, the Brooklyn Bridge and Pennsylvania Avenue, with ‘tens of thousands’ of Americans killed and the day saved only by heroic violence from the US army.

In this timeline, US power is expressly fallen. As the ad copy for Call of Duty: Ghosts (2013) puts it, ‘the balance of global power’ changed forever and a ‘crippled nation’ faces ‘technologically superior’ foes. War, against an array of bogeymen, from insurgent South American powers to the ever-green Russians, has become a constant necessity.

Every indignity inflicted on individual Americans (and the occasional Brit) in the narrative is echoed in the games’ geopolitics: US cities burn in nuclear fire, Mexican border states become a ‘No Man’s Land’, and the US financial and military elite repeatedly betray the nation, and its soldiers.

This might all seem very silly, and it is. I don’t believe that the writers of the game have much more in mind than spectacle when plotting the course of the series, because it certainly does look pretty when great landmarks explode. Token efforts at moral complexity are woven in here and there, from the anti-war quotes that play over every temporary death to the origins of a foe’s plot against the US in the death of his relatives from a US strike.

And yet, Modern Warfare reinforces, consciously or otherwise, the ever-present US myth that the country is an innocent victim in a cruel world. It’s a belief that swelled massively after the attacks of 11 September 2001, but it has always been present. The start of US wars has always been framed by betrayal, real or otherwise, from the actual surprise attacks of Pearl Harbor to the imaginary Spanish super-weapons blamed for the explosion of the USS Maine in 1898 or US President Lyndon B Johnson’s lies around the Gulf of Tonkin. And in Modern Warfare, the killing of US soldiers, especially by illegitimate foes – militias, terrorists, ‘rebel forces’ that make up the mass of Call of Duty’s shooting targets – is taken not just as a consequence of war, but as a crime committed by the enemy.

Coupled with that is the fear of weakness. In a country that outspends all its potential foes put together, nearly half the public believes, according to a recent Gallup poll, that it is ‘just one of several leading military powers’. (Back in the Cold War, the entirely fictitious ‘missile gap’ served the same propagandist purpose, but then, at least, there was the excuse of Soviet opaqueness around what were, in fact, considerably smaller and more backward arsenals than the US possessed.)

Victimhood is only the start. The protagonists of CoD games aren’t stopped by betrayal, injury or even death. Their travails are the necessary prelude to their roaring rampage of vengeance, and their passive suffering doesn’t subvert the fantasy of power but endorses it; every bullet fired is justified by ruined bodies, both politically and personally. CoD’s wars are a hysterically exaggerated reinforcement of national wish-fulfilment, consumed by millions of young US men: they hurt us first, so we get to hurt them back.

An Analysis Of Convolutional Neural Networks For Image Classification

This paper presents an empirical analysis of the performance of popular convolutional neural networks (CNNs) for identifying objects in real-time video feeds. The most popular convolutional neural networks for object detection and object category classification from images are AlexNet, GoogLeNet, and ResNet50. A variety of image datasets are available to test the performance of different types of CNNs. The benchmark datasets commonly used to evaluate the performance of a convolutional neural network are ImageNet, CIFAR10, CIFAR100, and MNIST. This study focuses on analyzing the performance of three popular networks: AlexNet, GoogLeNet, and ResNet50. We use three of the most popular datasets, ImageNet, CIFAR10, and CIFAR100, for our study, since testing the performance of a network on a single dataset does not reveal its true capabilities and limitations. Note that videos are not used as a training dataset; they are used as testing datasets. Our analysis shows that GoogLeNet and ResNet50 recognize objects with better precision than AlexNet. Moreover, the performance of trained CNNs varies substantially across different categories of objects, and we therefore discuss the possible reasons for this.
© 2018 The Authors. Published by Elsevier B.V.
Peer-review under the responsibility of the scientific committee of the International Conference on Computational Intelligence and Data Science (ICCIDS 2018).
Keywords: Deep Learning; CNN; Object detection; Object classification; Neural network
1. Introduction
Nowadays the internet is filled with an abundance of images and videos, which is encouraging the development of search applications and algorithms that can perform semantic analysis [1] of images and videos in order to present the user with better search content and summaries. Researchers across the world have reported major breakthroughs in image labeling, object detection, and scene classification [2] [3]. This makes it possible to formulate approaches to object detection and scene classification problems. Since artificial neural networks, especially convolutional neural networks (CNNs) [4] [5] [6], have shown a performance breakthrough in object detection and scene classification, this work focuses on identifying the best network for this purpose. Feature extraction is a key step of such algorithms. Feature extraction from images involves extracting a minimal set of features that contains a high amount of object or scene information from low-level image pixel values, thereby capturing the differences among the object categories involved. Some of the traditional feature extraction techniques used on images are scale-invariant feature transform (SIFT) [7], histogram of oriented gradients (HOG) [8], local binary patterns (LBP) [10], content-based image retrieval (CBIR) [11], etc. Once features are extracted, classification is done based on the objects present in an image. A few examples of classifiers are support vector machine (SVM), logistic regression, random forest, and decision trees.
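
The traditional pipeline described above (hand-crafted feature extraction followed by a separate classifier) can be sketched in a few lines. This is a minimal illustration, not the method evaluated in the paper: it uses a basic 8-neighbour local binary pattern histogram as the feature, and a nearest-centroid rule as a simple stand-in for the classifiers named in the text (SVM, random forest, and so on). The function names and the toy images are assumptions made for the example.

```python
def lbp_histogram(image):
    """Normalised 256-bin local binary pattern (LBP) histogram.

    image: list of equal-length rows of grey-level integers.
    Each interior pixel gets an 8-bit code: one bit per neighbour,
    set when the neighbour is at least as bright as the centre.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if image[r + dr][c + dc] >= image[r][c]:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [count / total for count in hist] if total else hist

def train_centroids(features, labels):
    """Average the feature vectors of each class (nearest-centroid model)."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict(centroids, vec):
    """Return the label whose centroid is closest in squared distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))

def flat(size, level):
    """Hypothetical toy image: uniform grey level."""
    return [[level] * size for _ in range(size)]

def checkerboard(size):
    """Hypothetical toy image: alternating black and white pixels."""
    return [[255 if (r + c) % 2 == 0 else 0 for c in range(size)]
            for r in range(size)]

train_images = [flat(5, 50), checkerboard(5)]
train_labels = ["smooth", "textured"]
model = train_centroids([lbp_histogram(img) for img in train_images],
                        train_labels)

print(predict(model, lbp_histogram(flat(6, 100))))
print(predict(model, lbp_histogram(checkerboard(6))))
```

A CNN replaces the fixed `lbp_histogram` step with features learned from data, which is the contrast the rest of the paper builds on.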

Neha Sharma, Vibhor Jain, Anju Mishra
Amity University Uttar Pradesh, Noida, India
Corresponding author: neha.sharma5852@gmail.com
International Conference on Computational Intelligence and Data Science (ICCIDS 2018)
Procedia Computer Science (ISSN 1877-0509), 2018, www.elsevier.com/locate/procedia
Available online at www.sciencedirect.com (ScienceDirect)
Neha Sharma et al. / Procedia Computer Science 132 (2018) 377-384
CNNs provide an effective class of models for better understanding the contents of an image, resulting in better image recognition, segmentation, detection, and retrieval. CNNs are used efficiently and effectively in many pattern and image recognition applications, for example gesture recognition [14], face recognition [12], object classification [13], and generating scene descriptions. Similarly, CNNs have achieved detection rates (CDRs) of 99.77% on the MNIST database of handwritten digits [23], 97.47% on the NORB dataset of 3D objects [24], and 97.6% on around 5600 images of more than 10 objects [25]. The success of all these applications is due to advances in learning algorithms for constructing deep networks and, in part, to the large open-source labeled datasets available for experimentation, for example ImageNet, CIFAR-10, CIFAR-100, and MNIST [16]. Well-known trained CNNs use these openly available datasets, and their classification efficacy increases after training over the millions of images contained in CIFAR-100 and ImageNet. The datasets used are composed of millions of tiny images. The networks can therefore generalize well and accurately, and hence successfully categorize out-of-sample examples of the classes. It is important to note that the classification and prediction accuracy and error rates of neural networks are almost comparable to those of humans when such comparisons are made on a large dataset such as ImageNet, CIFAR-10, or CIFAR-100.
This work aims to analyze the capability of convolutional neural networks to categorize the scene in videos on the basis of identified objects. A variety of image categories are included in the CIFAR-100, CIFAR-10, and ImageNet datasets used for training the CNNs. The test datasets are videos of different categories and subjects. Discrepancies arise because of the differing feature extraction capabilities of the CNNs. The primary contribution of our work is to present object detection methods using different types of trained neural networks, where current state-of-the-art models show different performance on test images or videos than on the images they were trained on. By training these networks on different object classes presented as input images, and then testing them on a more specific real-time video feed, we can better understand what is being learned and represented by these models. We can therefore postulate that an image representation based on the objects detected in it would be significantly useful for high-level visual recognition tasks on scenes cluttered with numerous objects, which are difficult for a network to classify. These networks also provide supplementary information about the extraction of low-level features. These networks are trained on datasets containing millions of tiny images [12]. We propose that object detection can be used as an attribute for scene representation. The networks used in our study are constructed from existing neural networks, and each has different layers; their performance therefore varies considerably. The detection accuracy of a network can be checked using complex real-world scenes.
This paper is arranged as follows. We begin by presenting related prior work, followed by the problem statement and our proposed methodology for comparing the networks chosen for the study, including descriptions of the models and datasets. We then present a comprehensive analysis of the results obtained on different datasets. Finally, we conclude the paper and discuss future work.
2. Related Work
Convolutional Neural Networks (CNNs) are used in a number of tasks and perform well across different
applications. Recognition of handwritten digits [17] was one of the first applications where a CNN architecture was
successfully implemented. Since the creation of CNNs, there has been continuous improvement in networks with the
introduction of new layers and the incorporation of different computer vision techniques [18]. Convolutional Neural Networks are
mostly used in the ImageNet Challenge with various combinations of datasets of sketches [19]. A few researchers have
compared the detection abilities of human subjects and a trained network on image datasets. The
comparison showed that humans achieved a 73.1% accuracy rate on the dataset, whereas a
trained network achieved a 64% accuracy rate [21]. Similarly, when a Convolutional Neural Network was applied to the same
dataset, it yielded an accuracy of 74.9%, hence outperforming humans [21]. The methods used mostly
exploit the order of the strokes to attain a much better accuracy rate. Ongoing studies aim at understanding
the behavior of Deep Neural Networks in diverse situations [20]. These studies show how small changes made to an image can
severely change the classification results. The same work also presents images that are completely unrecognizable to human beings but
are classified with high accuracy rates by the trained networks [20].
There has been a lot of development in the area of feature detectors and descriptors, and many algorithms and techniques
have been developed for object and scene classification. We generally draw on the similarity between object
detectors, texture filters, and filter banks. There is an abundance of work in the literature on object detection and scene
classification [3]. Researchers mostly use the current state-of-the-art descriptors of Felzenszwalb and the context classifiers of
Hoiem [4]. The idea of developing various object detectors for basic interpretation of images is similar to work done in the
multimedia community, which uses a large number of “semantic concepts” for image and video annotation and
semantic indexing [22]. In the literature that relates to our work, each semantic concept is trained using either images
or frames of videos. Such an approach is therefore difficult to apply to images with many cluttered objects in the
scene. Previous methods focused on single-object detection and classification based on feature sets defined by humans.
These proposed methods explore the connection of objects in scene classification [3]. Many scene classification techniques
were applied to the object bank to evaluate its utility. Much research has focused
on low-level feature extraction for object recognition and classification, namely Histogram of Oriented Gradients
(HOG), GIST, filter banks, and bag of features (BoF) implemented through a word vocabulary [4].
3. Methodology of Evaluation
The main aim of our work is to understand the performance of the networks on static images as well as live video feeds. The first
step is to perform transfer learning on the networks with image datasets. This is followed by checking the
Neha Sharma et al. / Procedia Computer Science 132 (2018) 377384 379
prediction rate of the same object on static images and real-time video feeds. The different accuracy rates are observed,
noted, and presented in the tables given in later sections. The third important criterion for evaluating performance was to
check whether prediction accuracy varies across all CNNs chosen for the study. It must be noted that videos are not used as
a training dataset; they are used only as testing datasets. Hence, we are looking for the best image classifier where the object is the
main attribute for classification of the scene category. The different layers of the convolutional neural networks used are:
Input Layer: The first layer of each CNN used is the ‘input layer’, which takes images and resizes them before passing them on to
further layers for feature extraction.
Convolution Layer: The next few layers are ‘convolution layers’, which act as filters for images, extracting
features from images; they are also used for calculating the matching feature points during testing.
Pooling Layer: The extracted feature sets are then passed to the ‘pooling layer’. This layer takes large images and
shrinks them down while preserving the most important information in them. By keeping the maximum value from
each window, it preserves the best fit of each feature within the window.
Rectified Linear Unit Layer: The next layer, the ‘Rectified Linear Unit’ or ReLU layer, replaces every negative number from the
pooling layer with 0. This helps the CNN stay mathematically stable by keeping learned values from getting stuck
near 0 or blowing up toward infinity.
Fully Connected Layer: The final layer is the fully connected layer, which takes the high-level filtered images
and translates them into categories with labels.
Fig. 1 Internal Layers of CNNs
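The convolution, pooling, and ReLU layers described above can be illustrated with a minimal, framework-free sketch. The function names, the 5x5 input, and the 2x2 filter are illustrative assumptions and are not taken from the networks studied here; real CNNs learn many filters over multi-channel images.

```python
# Framework-free sketch of three layer types. All names and values below
# are hypothetical and only illustrate the operations.

def conv2d_valid(image, kernel):
    """Slide a small kernel over a 2-D image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def relu(feature_map):
    """Replace every negative value with 0."""
    return [[max(0, v) for v in row] for row in feature_map]

image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 2, 1, 0, 1],
]
kernel = [[1, 0], [0, -1]]  # hypothetical 2x2 filter

features = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool2d(features)           # 2x2 map after pooling
output = relu(pooled)                   # negatives clipped to 0
```

The sketch follows the layer order given in the text (convolution, then pooling, then ReLU); since max pooling and ReLU are both monotonic, applying them in the opposite order gives the same result.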
The steps of the proposed method are as follows:
1. Creating training and testing dataset: The superclass images used for training are resized to [224,244] pixels for
AlexNet and [227,227] pixels for GoogLeNet and ResNet50, and the dataset is divided into two categories, i.e.
training and validation datasets.
2. Modifying the CNNs: Replace the last three layers of the network with a fully connected layer, a softmax
layer, and a classification output layer. Set the final fully connected layer to have the same size as the number of
classes in the training dataset. Increase the learning rate factors of the fully connected layer to train the network
faster.
3. Training the network: Set the training options, including learning rate, mini-batch size, and validation data,
according to the GPU specification of the system. Train the network using the training data.
4. Testing the accuracy of the network: Classify the validation images using the fine-tuned network, and calculate the
classification accuracy. Similarly, test the fine-tuned network on real-time video feeds.
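Steps 1, 2, and 4 above can be sketched without committing to any particular deep learning framework. In the sketch below a network is represented simply as a list of (layer name, settings) pairs; the layer names, the 30% validation fraction, and the learning-rate factor of 10 are assumptions made for illustration and are not values reported in this paper. Step 3 (the actual training) is deliberately left out because it depends on the framework and hardware.

```python
import random

def split_dataset(samples, validation_fraction=0.3, seed=0):
    """Step 1: shuffle the samples and split them into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]

def replace_head(layers, num_classes, lr_factor=10):
    """Step 2: drop the last three layers and append a new classification head.

    The new fully connected layer has one output per class and a raised
    learning-rate factor (hypothetical value) so that it trains faster.
    """
    new_head = [
        ("fully_connected", {"outputs": num_classes, "lr_factor": lr_factor}),
        ("softmax", {}),
        ("classification_output", {}),
    ]
    return layers[:-3] + new_head

def accuracy(true_labels, predicted_labels):
    """Step 4: fraction of validation images whose label was predicted correctly."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Hypothetical pretrained network with a 1000-class head.
pretrained = [
    ("input", {}),
    ("conv_features", {}),
    ("fully_connected", {"outputs": 1000, "lr_factor": 1}),
    ("softmax", {}),
    ("classification_output", {}),
]
fine_tunable = replace_head(pretrained, num_classes=10)
train_set, validation_set = split_dataset(range(100))
```

For example, `accuracy(["bus", "bed", "bus"], ["bus", "bus", "bus"])` evaluates to two correct predictions out of three.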
4. Models
Various pre-trained CNNs are available, and these CNNs are capable of transfer learning. Therefore, they only require the
training and testing datasets at their input layer. The architectures of the networks differ in terms of internal layers and
techniques used. GoogLeNet has Inception modules that perform convolutions of different sizes and concatenate the filters
for the next layer [20]. AlexNet, on the other hand, does not use filter concatenation; instead, it uses the output of the
previous layer as the input. Both networks have been tested independently and use the implementation provided by Caffe, a
deep learning framework [22]. ResNet is short for Residual Network. Many other visual recognition tasks have
also greatly benefited from very deep models. So, over the years, there has been a trend to go deeper in order to solve more complex tasks
and to improve classification and recognition accuracy. However, as we go deeper, training the neural network
becomes difficult, and the accuracy starts saturating and then degrades [3]. Residual learning tries to solve both
of these problems. In general, in a deep convolutional neural network, several layers are stacked and trained for the task at
hand. The network learns several low-, mid-, and high-level features at the end of its layers [15][2]. In residual learning, instead of
trying to learn some features, the network tries to learn some residual. The residual can be simply understood as the
feature learned minus the input of that layer. ResNet does this using shortcut connections (directly connecting the input of the nth
layer to some (n+x)th layer) [15]. It has been shown that training this form of network is easier than training plain deep
convolutional neural networks, and that the problem of degrading accuracy is resolved. The comparison is made among
three existing neural networks, i.e. AlexNet, GoogLeNet, and ResNet50 [21]. This is followed by applying transfer learning
to train these networks and generate new networks for further comparison. The new models have the same number of
layers as the originals, but the performance of these networks and the existing networks varies considerably. The
different accuracy rates obtained on the same images are reported in the tables presented in the following section.
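The shortcut connection used in residual learning can be illustrated with a minimal sketch. It assumes the usual formulation in which the stacked layers model a residual F(x) and the block outputs F(x) + x; the two residual functions below are hypothetical stand-ins for trained layers, not parts of ResNet50.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x: the shortcut adds the block input back to the learned residual."""
    fx = residual_fn(x)
    return [f + xi for f, xi in zip(fx, x)]

def zero_residual(x):
    """Hypothetical layers that have learned nothing: F(x) = 0."""
    return [0.0 for _ in x]

def small_correction(x):
    """Hypothetical layers that learn a small adjustment: F(x) = 0.1 * x."""
    return [0.1 * v for v in x]

x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_residual)     # equals x
adjusted_out = residual_block(x, small_correction)  # x plus a small residual
```

With a zero residual the block reduces to the identity mapping, which is one intuition for why adding more such blocks need not degrade accuracy.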
5. Test Datasets
The first dataset is the CIFAR-100 image dataset, which has numerous superclasses of general object images and a number of subclass
categories within each superclass. CIFAR-100 has 100 classes of images, with each class having 600 images [15]. These
600 images are divided into 500 training images and 100 testing images for each class, making a total of 60,000
different images. These 100 classes are grouped into 20 superclasses. Every image in the dataset comes with a
“fine” label (the class to which it belongs) and a “coarse” label (the superclass to which it belongs). The
selected categories for training and testing are bed, bicycle, bus, chair, couch, motorcycle, streetcar, table, train, and
wardrobe [21][15]. For the proposed work, some broad categories of each superclass need to be used for training the
networks; the superclasses used are household furniture and vehicle. The chosen categories are shown in the table below.
The second dataset used was the ImageNet dataset, which has superclasses of images that are further divided into subclasses.
ImageNet is an image dataset organized according to the WordNet hierarchy. The dataset is organized as meaningful
concepts.
Each concept in WordNet is described by multiple words, called a "synonym set" or "synset". The dataset contains more than
100,000 synsets. All images are human-annotated. Furthermore, ImageNet’s less descriptive labels were grouped into more
meaningful sets matching the superclasses used in our study. For example, “table” was relabelled as
“furniture”; similarly, many other images were grouped into their superclasses to create more descriptive and
meaningful labels. The third dataset chosen for the study was the CIFAR-10 dataset of images. The CIFAR-10 dataset has
32x32 color images divided into 10 classes with 6000 images per class, which makes a total of 60000 images. The dataset
consists of 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch,
each of which has 10000 images. The test images are randomly selected from each class.
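The dataset sizes quoted above, and the grouping of fine labels into superclasses, can be checked with a short sketch. The counts are those stated in the text; the dictionary assigns the ten selected categories to the two superclass names used in this paper ("household furniture" and "vehicle"), and the constant and function names are illustrative assumptions.

```python
# CIFAR-100 figures as stated in the text.
CIFAR100_CLASSES = 100
CIFAR100_TRAIN_PER_CLASS = 500
CIFAR100_TEST_PER_CLASS = 100
cifar100_total = CIFAR100_CLASSES * (CIFAR100_TRAIN_PER_CLASS + CIFAR100_TEST_PER_CLASS)

# CIFAR-10 figures as stated in the text.
CIFAR10_CLASSES = 10
CIFAR10_IMAGES_PER_CLASS = 6000
CIFAR10_BATCH_SIZE = 10000
cifar10_total = CIFAR10_CLASSES * CIFAR10_IMAGES_PER_CLASS
cifar10_train = 5 * CIFAR10_BATCH_SIZE  # five training batches
cifar10_test = 1 * CIFAR10_BATCH_SIZE   # one test batch

# Fine label -> coarse label for the ten selected categories.
FINE_TO_COARSE = {
    "bed": "household furniture",
    "chair": "household furniture",
    "couch": "household furniture",
    "table": "household furniture",
    "wardrobe": "household furniture",
    "bicycle": "vehicle",
    "bus": "vehicle",
    "motorcycle": "vehicle",
    "streetcar": "vehicle",
    "train": "vehicle",
}

def coarse_label(fine_label):
    """Return the superclass assigned to a fine label."""
    return FINE_TO_COARSE[fine_label]
```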
Table 1. Performance of CNNs on the CIFAR-100 test dataset
Image Category   AlexNet   GoogLeNet   ResNet50
Bed              0.00%     70.80%      49.60%
Bicycle          21.00%    74.20%      55.00%
Bus              84.00%    63.20%      36.80%
Chair            90.00%    89.60%      57.60%
Couch            11.00%    14.60%      76.40%
Motorcycle       95.00%    74.60%      99.20%
Streetcar        21.00%    0.84%       63.80%
Table            0.00%     73.60%      33.40%
Train            30.00%    95.60%      34.20%
Wardrobe         89.00%    89.40%      92.20%
Table 2. Performance of CNNs on the CIFAR-10 test dataset
Image Category   AlexNet   GoogLeNet   ResNet50
Airplane         41.80%    51.10%      90.80%
Automobile       21.80%    62.10%      69.10%
Bird             0.02%     56.70%      72.60%
Cat              0.03%     78.80%      61.90%
Deer             87.60%    49.50%      75.40%
Dog              23.00%    57.50%      82.10%
Frog             24.20%    90.20%      76.60%
Horse            34.70%    78.20%      84.70%
Ship             31.70%    95.50%      83.20%
Truck            95.90%    97.10%      84.60%
6. Results
The performance of the CNNs is analyzed by testing each network on the CIFAR-100 and CIFAR-10 datasets. Table 1
shows the accuracy for various image categories of the CIFAR-100 test dataset. For example, out of 100 test images of Bus,
AlexNet predicts the label of 84 images correctly, whereas GoogLeNet detects a bus in around 63 images and ResNet50 classifies
37 images as a bus. Table 1 and Table 2 show the prediction accuracy of the CNNs when tested on various image
categories of the CIFAR-100 and CIFAR-10 test datasets. For 100 images of Horse, AlexNet identifies a horse in 35 images,
GoogLeNet finds a horse in 78 images, and ResNet50 classifies 85 images as a horse. Considering the probability values
of all three CNNs calculated from the confusion matrix after testing, a detailed view of the predictions made by the three CNNs is
as follows.
Fig. 2: A few classes of the CIFAR-10 and CIFAR-100 datasets (panel labels: Bed, Chair, Couch, Table, Wardrobe; Airplane, Automobile, Bird, Cat, Deer, Dog)
Table 3. Performance on Bicycle class of CIFAR-100 dataset
AlexNet’s Prediction GoogLeNet’s Prediction ResNet50 Prediction
Output Accuracy (%) Output Accuracy (%) Output Accuracy (%)
Motorcycle 45 Bicycle 74.2 Bicycle 55
Bus 28 Train 13 Motorcycle 35
Bicycle 21 Table 7.6 Streetcar 4.4
Chair 2 Motorcycle 4.4 Couch 2.6
Train 2 Chair 0.4 Bed 1
Streetcar 1 Wardrobe 0.2 Train 0.8
Wardrobe 1 Bus 0.2 Wardrobe 0.6
Couch 0 Streetcar 0 Table 0.6
Bed 0 Couch 0 Bus 0
Table 0 Bed 0 Chair 0
Table 4. Performance on Chair class of CIFAR-100 dataset
AlexNet’s Prediction GoogLeNet’s Prediction ResNet50 Prediction
Output Accuracy (%) Output Accuracy (%) Output Accuracy (%)
Chair 90 Chair 89.6 Chair 57.6
Wardrobe 5 Bed 7 Couch 21
Bus 3 Table 2.8 Bed 7.4
Motorcycle 1 Wardrobe 0.4 Wardrobe 5.8
Couch 1 Train 0.2 Train 5.4
Bed 0 Bicycle 0 Motorcycle 2
Bicycle 0 Bus 0 Streetcar 0.6
Streetcar 0 Couch 0 Bicycle 0.2
Table 0 Motorcycle 0 Bus 0
Train 0 Streetcar 0 Train 0
Table 3 shows the prediction accuracy of all three networks for the Bicycle class. We can see that AlexNet’s top prediction for
the bicycle class is a motorcycle. GoogLeNet shows the best performance and ResNet50 gives an average result. Similarly, Table 4
shows the output of the CNNs for the chair class.
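Reading a row of these tables amounts to finding the most frequent predicted label and the share assigned to the true class. The sketch below does this for the Bicycle class using the AlexNet and GoogLeNet values from Table 3; the function and variable names are illustrative assumptions.

```python
# Output distributions (%) for images of the Bicycle class, copied from Table 3.
alexnet_bicycle = {
    "Motorcycle": 45, "Bus": 28, "Bicycle": 21, "Chair": 2, "Train": 2,
    "Streetcar": 1, "Wardrobe": 1, "Couch": 0, "Bed": 0, "Table": 0,
}
googlenet_bicycle = {
    "Bicycle": 74.2, "Train": 13, "Table": 7.6, "Motorcycle": 4.4, "Chair": 0.4,
    "Wardrobe": 0.2, "Bus": 0.2, "Streetcar": 0, "Couch": 0, "Bed": 0,
}

def top_prediction(distribution):
    """Label that the network predicted most often."""
    return max(distribution, key=distribution.get)

def class_accuracy(distribution, true_label):
    """Percentage of images assigned to the true class."""
    return distribution[true_label]

alexnet_top = top_prediction(alexnet_bicycle)
googlenet_top = top_prediction(googlenet_bicycle)
```

Here `alexnet_top` is "Motorcycle" while `googlenet_top` is "Bicycle", which matches the observation made in the text.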
Table 5. Performance on Deer class of CIFAR-10 dataset
AlexNet’s Prediction GoogLeNet’s Prediction ResNet50 Prediction
Output Accuracy (%) Output Accuracy (%) Output Accuracy (%)
Deer 87.6 Deer 49.5 Deer 75.4
Horse 3.7 Horse 24.4 Horse 10.7
Ship 3.4 Cat 13.3 Bird 3.5
Frog 2.2 Frog 6 Airplane 3.3
Truck 1.6 Bird 3 Dog 2.6
Airplane 1.2 Ship 2 Cat 2.5
Automobile 0.2 Airplane 1.1 Frog 1.6
Dog 0.1 Truck 0.3 Ship 0.3
Bird 0 Dog 0.4 Truck 0.1
Cat 0 Automobile 0 Automobile 0
Table 6. Performance on Ship class of CIFAR-10 dataset
AlexNet’s Prediction GoogLeNet’s Prediction ResNet50 Prediction
Output Accuracy (%) Output Accuracy (%) Output Accuracy (%)
Truck 50.6 Ship 95.5 Ship 83.2
Ship 31.7 Truck 2.2 Airplane 14.4
Airplane 12.3 Cat 1.2 Truck 0.5
Deer 3.1 Airplane 0.6 Cat 0.5
Automobile 1.5 Automobile 0.3 Horse 0.4
Horse 0.8 Bird 0.2 Dog 0.3
Bird 0 Deer 0 Bird 0.3
Cat 0 Dog 0 Deer 0.2
Dog 0 Frog 0 Automobile 0.1
Frog 0 Horse 0 Frog 0.1
Table 5 compares the output of the three networks for the Deer class. Here, all three networks rank the correct
class first. Across all the tables, the classification accuracy obtained differs between categories and
networks. AlexNet essentially sees a motorcycle as its top prediction, while GoogLeNet and ResNet50 see a bicycle as their top
prediction for the bicycle class. For other less frequent classes, there is still a large overlap across different categories. Similarly,
Table 6 presents results for the ship class. The predicted label along with its score shows how accurately the object is
detected by a particular network. Analyzing each table independently, one can observe that for most of the categories
of the CIFAR-100 dataset, GoogLeNet produces the correct labeling and classification, while ResNet50 identifies an average number of
classes of the CIFAR-100 dataset. For CIFAR-10, however, ResNet50 shows the best classification results and GoogLeNet remains
average. Nonetheless, both networks are quite consistent, having high counts for a small subset of classes. The reason for
this behavior seems to be the fact that most classifiers are trained for object categories that contain simple, thin traces in
their composition, such as safety pins and bowstrings. It is therefore understandable that the networks may confuse
objects with similar appearance and properties.
Table 7. Performance of CNNs on live video feeds (prediction accuracy, %)
Object Category AlexNet GoogLeNet ResNet50 Object Category AlexNet GoogLeNet ResNet50
Bed 12 85 25 Airplane 14 84 96
Bicycle 11 80 55 Automobile 12 59 56
Bus 14 74 25 Bird 11 45 53
Chair 12 47 30 Cat 11 62 49
Couch 12 25 90 Deer 12 45 33
Motorcycle 14 50 35 Dog 12 57 58
Streetcar 11 45 25 Frog 13 60 25
Table 11 63 50 Horse 12 87 65
Train 15 72 45 Ship 15 91 25
Wardrobe 14 84 32 Truck 22 95 52
The real-time analysis of the performance of the convolutional neural networks shows that AlexNet has an overall accuracy of 13%
in detecting the correct objects in the scene. Similarly, GoogLeNet and ResNet50 classifications are 68.95% and 52.55% correct. It
can be observed that the performance of the CNNs on images varies substantially compared with the live testing results. In live testing,
the CNNs confuse a few objects; for example, ResNet50 often has a problem classifying dog and deer. It detects
them as a horse in most of the scenes. The accuracy results indicate that GoogLeNet performs better, with the highest detection
accuracy of all the networks.
7. Evaluation
Each of the CNNs produces a probability distribution over the possible classes. Two different methods were used to
calculate the results. The first method considers only the 10 most probable classes, and the second registers the position of the
correct class in the full probability ranking. In the first method, we sort the outputs of the network according to their
probability and consider only the ten most probable classes. We count how many times each class appears for each image in
each target category. This method allows us to evaluate whether a good and useful probability is assigned to the correct result,
and also to observe qualitatively the consistency of the results for each category, i.e., it is expected that for each category, the
top 10 probabilities do not vary significantly. In the second method, we construct descriptive statistics about the position of
the correct class in the probability ranking. This is achieved by ranking the results obtained by the classifier. The higher the
rank, the better the classification. Ideally, the correct class will be in first place. We calculate the mean and the standard
deviation for each category. A low mean corresponds to a higher position in the rankings, while a low standard deviation
indicates consistent output across the different instances of the same category. It also allows us to capture the
best and worst instances of each category, which we use to analyze the possible reasons for the observed results. Finally, we
can infer from the obtained results that the average performance of the three networks on the CIFAR-100 dataset is
as follows: AlexNet 44.10%, GoogLeNet 64.40%, and ResNet50
59.82% [20]. Similarly, the average performance of the CNNs on the CIFAR-10 dataset is
as follows: AlexNet 36.12%, GoogLeNet 71.67%, and ResNet50 78.10%.
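The two evaluation methods described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the three probability distributions are invented for the example (they are not outputs of the networks studied), k is set to 2 instead of 10 to keep the example small, and the population standard deviation is used, since the text does not say which variant was computed.

```python
from collections import Counter
from statistics import mean, pstdev

def ranked_labels(probabilities):
    """Class labels sorted from most to least probable."""
    return sorted(probabilities, key=probabilities.get, reverse=True)

def top_k_counts(outputs, k=10):
    """Method 1: count how often each class appears among the k most probable classes."""
    counts = Counter()
    for probabilities in outputs:
        counts.update(ranked_labels(probabilities)[:k])
    return counts

def rank_statistics(outputs, correct_label):
    """Method 2: mean and standard deviation of the position of the correct class (1 = first place)."""
    ranks = [ranked_labels(p).index(correct_label) + 1 for p in outputs]
    return mean(ranks), pstdev(ranks)

# Hypothetical network outputs for three images of a ship.
ship_outputs = [
    {"ship": 0.7, "truck": 0.2, "airplane": 0.1},
    {"truck": 0.5, "ship": 0.3, "airplane": 0.2},
    {"ship": 0.6, "airplane": 0.3, "truck": 0.1},
]

counts = top_k_counts(ship_outputs, k=2)
mean_rank, std_rank = rank_statistics(ship_outputs, "ship")
```

In this toy example the correct class is ranked first, second, and first, so the mean rank is 4/3; a mean close to 1 together with a small standard deviation corresponds to the consistent behavior described above.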
Fig. 3: (a) Probability vs. categories graph for the CIFAR-100 dataset (b) Probability vs. categories graph for the CIFAR-10 dataset
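An average performance figure of the kind quoted in the Evaluation section is the mean of a network's per-class accuracies. The sketch below assumes a simple unweighted mean over the ten categories and uses four of the columns from Table 1 and Table 2; the variable and function names are illustrative.

```python
from statistics import mean

# Per-class accuracies (%) copied from Table 1 (CIFAR-100) and Table 2 (CIFAR-10).
alexnet_cifar100 = [0.00, 21.00, 84.00, 90.00, 11.00, 95.00, 21.00, 0.00, 30.00, 89.00]
resnet50_cifar100 = [49.60, 55.00, 36.80, 57.60, 76.40, 99.20, 63.80, 33.40, 34.20, 92.20]
googlenet_cifar10 = [51.10, 62.10, 56.70, 78.80, 49.50, 57.50, 90.20, 78.20, 95.50, 97.10]
resnet50_cifar10 = [90.80, 69.10, 72.60, 61.90, 75.40, 82.10, 76.60, 84.70, 83.20, 84.60]

def average_accuracy(per_class):
    """Unweighted mean of the per-class accuracies, rounded to two decimals."""
    return round(mean(per_class), 2)

averages = {
    "AlexNet / CIFAR-100": average_accuracy(alexnet_cifar100),
    "ResNet50 / CIFAR-100": average_accuracy(resnet50_cifar100),
    "GoogLeNet / CIFAR-10": average_accuracy(googlenet_cifar10),
    "ResNet50 / CIFAR-10": average_accuracy(resnet50_cifar10),
}
```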
8. Conclusion
This work analyzed the prediction accuracy of three different convolutional neural networks (CNNs) on the most popular training
and test datasets, namely CIFAR-10 and CIFAR-100. We focused our study on only 10 classes of each dataset. Our main
purpose was to determine the accuracy of the different networks on the same datasets and to evaluate the consistency of prediction
of each of these CNNs. We have presented a thorough prediction analysis comparing the networks’ performance for
different classes of objects. It is important to note that complex frames often make it hard for a network to detect and
recognize the scene. It was also noted that although beds, couches, and chairs are distinct and easily
recognized objects in the real world, the trained networks confused them and therefore differed in accuracy rates. The results suggested
that networks trained with transfer learning performed better than the existing ones and showed higher rates of accuracy. A few
objects like “chair”, “train” and “wardrobe” were perfectly recognized by 147-layered networks, whereas objects like “cars”
were perfectly recognized by 177-layered networks. From our experiments, we could conclude that the performance
of 27-layered networks was not satisfactory. Hence, the greater the number of layers, the more training there is and
therefore the higher the prediction accuracy achieved. It can further be summed up that neural networks are
a new and promising technique for making a machine intelligent enough to solve many real-life object categorization
problems. Much research is being done on them. They have wide applications and are easy and flexible to
integrate into various platforms. The hardware requirements may not allow the network to be trained on an ordinary desktop
machine, but with only nominal requirements one can train the network and generate the desired model.