Second-order co-occurrence pointwise mutual information

Second-order co-occurrence pointwise mutual information

In computational linguistics, second-order co-occurrence pointwise mutual information (SOC-PMI) is a method used to measure semantic similarity, or how close in meaning two words are. The method does not require the two words to appear together in a text. Instead, it works by analyzing the "neighbor" words that typically appear alongside each of the two target words in a large body of text (corpus). If the two target words frequently share the same neighbors, they are considered semantically similar. For example, the words "cemetery" and "graveyard" may not appear in the same sentence often, but they both frequently appear near words like "buried," "dead," and "funeral." SOC-PMI uses this shared context to determine that they have a similar meaning. The method is called "second-order" because it doesn't look at the direct co-occurrence of the target words (which would be first-order), but at the co-occurrence of their neighbors (a second level of association). The strength of these associations is quantified using pointwise mutual information (PMI). == History == The method builds on earlier work like the PMI-IR algorithm, which used the AltaVista search engine to calculate word association probabilities. The key advantage of a second-order approach like SOC-PMI is its ability to measure similarity between words that do not co-occur often, or at all. The British National Corpus (BNC) has been used as a source for word frequencies and contexts for this method. == Methodology == The SOC-PMI algorithm measures the similarity between two words, w 1 {\displaystyle w_{1}} and w 2 {\displaystyle w_{2}} , in several steps. === Step 1: Score neighboring words with PMI === First, for each target word ( w 1 {\displaystyle w_{1}} and w 2 {\displaystyle w_{2}} ), the algorithm identifies its "neighbor" words within a certain text window (e.g., within 5 words to the left or right) across a large corpus. The strength of the association between a target word t i {\displaystyle t_{i}} and its neighbor w {\displaystyle w} is calculated using pointwise mutual information (PMI). A higher PMI value means the two words appear together more often than would be expected by chance. The PMI between a target word t i {\displaystyle t_{i}} and a neighbor word w {\displaystyle w} is calculated as: f pmi ( t i , w ) = log 2 ⁡ f b ( t i , w ) × m f t ( t i ) f t ( w ) {\displaystyle f^{\text{pmi}}(t_{i},w)=\log _{2}{\frac {f^{b}(t_{i},w)\times m}{f^{t}(t_{i})f^{t}(w)}}} where: f b ( t i , w ) {\displaystyle f^{b}(t_{i},w)} is the number of times t i {\displaystyle t_{i}} and w {\displaystyle w} appear together in the context window. f t ( t i ) {\displaystyle f^{t}(t_{i})} is the total number of times t i {\displaystyle t_{i}} appears in the corpus. f t ( w ) {\displaystyle f^{t}(w)} is the total number of times w {\displaystyle w} appears in the corpus. m {\displaystyle m} is the total number of tokens (words) in the corpus. === Step 2: Create a semantic 'signature' for each word === For each target word ( w 1 {\displaystyle w_{1}} and w 2 {\displaystyle w_{2}} ), the algorithm creates a list of its most significant neighbors. This is done by taking the top β {\displaystyle \beta } neighbor words, sorted in descending order by their PMI score with the target word. This list of top neighbors, X w {\displaystyle X^{w}} , acts as a semantic "signature" for the word w {\displaystyle w} . X w = { X i w } {\displaystyle X^{w}=\{X_{i}^{w}\}} , for i = 1 , 2 , … , β {\displaystyle i=1,2,\ldots ,\beta } The size of this list, β {\displaystyle \beta } , is a parameter of the method. === Step 3: Compare the signatures === The algorithm then compares the signatures of w 1 {\displaystyle w_{1}} and w 2 {\displaystyle w_{2}} . It looks for words that are present in both signatures. The similarity of w 1 {\displaystyle w_{1}} to w 2 {\displaystyle w_{2}} is calculated by summing the PMI scores of w 2 {\displaystyle w_{2}} with every word in w 1 {\displaystyle w_{1}} 's signature list. The β {\displaystyle \beta } -PMI summation function defines this score. The score for w 1 {\displaystyle w_{1}} with respect to w 2 {\displaystyle w_{2}} is: f ( w 1 , w 2 , β ) = ∑ i = 1 β ( f pmi ( X i w 1 , w 2 ) ) γ {\displaystyle f(w_{1},w_{2},\beta )=\sum _{i=1}^{\beta }(f^{\text{pmi}}(X_{i}^{w_{1}},w_{2}))^{\gamma }} This sum only includes terms where the PMI value is positive. The exponent γ {\displaystyle \gamma } (with a value > 1) is used to give more weight to neighbors that are more strongly associated with w 2 {\displaystyle w_{2}} . This calculation is done in both directions: The similarity of w 1 {\displaystyle w_{1}} with respect to w 2 {\displaystyle w_{2}} : f ( w 1 , w 2 , β 1 ) = ∑ i = 1 β 1 ( f pmi ( X i w 1 , w 2 ) ) γ {\displaystyle f(w_{1},w_{2},\beta _{1})=\sum _{i=1}^{\beta _{1}}(f^{\text{pmi}}(X_{i}^{w_{1}},w_{2}))^{\gamma }} The similarity of w 2 {\displaystyle w_{2}} with respect to w 1 {\displaystyle w_{1}} : f ( w 2 , w 1 , β 2 ) = ∑ i = 1 β 2 ( f pmi ( X i w 2 , w 1 ) ) γ {\displaystyle f(w_{2},w_{1},\beta _{2})=\sum _{i=1}^{\beta _{2}}(f^{\text{pmi}}(X_{i}^{w_{2}},w_{1}))^{\gamma }} === Step 4: Calculate final similarity score === Finally, the total semantic similarity is the average of the two scores from the previous step. S i m ( w 1 , w 2 ) = f ( w 1 , w 2 , β 1 ) β 1 + f ( w 2 , w 1 , β 2 ) β 2 {\displaystyle \mathrm {Sim} (w_{1},w_{2})={\frac {f(w_{1},w_{2},\beta _{1})}{\beta _{1}}}+{\frac {f(w_{2},w_{1},\beta _{2})}{\beta _{2}}}} This score can be normalized to fall between 0 and 1. For example, using this method, the words cemetery and graveyard achieve a high similarity score of 0.986 (with specific parameter settings).

System requirements specification

A System Requirements Specification (SysRS) (abbreviated SysRS to be distinct from a software requirements specification (SRS)) is a structured collection of information that embodies the requirements of a system. A business analyst (BA), sometimes titled system analyst, is responsible for analyzing the business needs of their clients and stakeholders to help identify business problems and propose solutions. Within the systems development life cycle domain, the BA typically performs a liaison function between the business side of an enterprise and the information technology department or external service providers.

Digital on-screen graphic

A digital on-screen graphic, digitally originated graphic (DOG, bug, network bug, on-screen bug or screenbug) is a watermark-like station logo that most television broadcasters overlay over a portion of the screen area of their programs to identify the channel. They are thus a form of permanent visual station identification, increasing brand recognition and asserting ownership of the video signal. The graphic identifies the source of programming, even if it has been time-shifted or recorded. Many of these technologies allow viewers to skip or omit traditional between-programming station identification; thus the use of a DOG enables the station or network to enforce brand identification even when standard commercials are skipped. DOG watermarking helps to reduce off-the-air copyright infringement—for example, the distribution of a current series' episodes on DVD: the watermarked content is easily differentiated from "official" DVD releases, and can help identify not only the station from which the broadcast was captured, but usually the actual date of the broadcast as well. Graphics may be used to identify if the correct subscription is being used for a type of venue. For example, showing Sky Sports within a pub in the United Kingdom requires a more expensive subscription; a channel authorized under this subscription adds a pint glass graphic to the bottom of the screen for inspectors to see. The graphic changes at certain times, making it harder to counterfeit. On the other hand, watermarks pollute the picture, distract viewers' attention and may cover an important piece of information presented in the television program. Extremely bright watermarks may cause screen burn-in or image persistence on some types of television sets such as the now mostly discontinued and rarely used plasma and CRT displays, and currently commonly used OLED and LCD displays. Usage of visually perceptible embedded watermarks requires the program author to have a separate clean copy for archival purposes, but this practice was not common decades ago when watermarking became popular among broadcasters. Watermarks present an issue when archival videos are used for a documentary that strives to create a coherent story. In some cases, watermarks are blurred or digitally removed if possible to clean up the picture. In the absence of visually perceptible watermarks, content control can be ensured with visually imperceptible digital watermarks. In some cases, the graphic also shows the name of the current program. Some television networks may place additional logos or text alongside their DOG to advertise significant upcoming programs. For example, broadcasters of the Olympic Games (most notably United States broadcaster NBC) often add the Olympic rings to their DOG for a period of time leading up to and during the Games. == Usage == == Connections with sponsor tags == Another graphic on television usually connected with sports (particularly in North America, though not in Europe) is the sponsor tag. It shows the logos of certain sponsors, accompanied by some background relevant to the game, the network logo, announcement and music of some kind. == Usage in ham radio and television == In most countries, the ham station is required to periodically identify their amateur-television transmission. Such stations frequently overlay their callsign on the signal instead of placing a card in the background. Most hams use homebuilt devices or old consumer character generators to generate such identifications rather than using graphical superimposes of high cost to do so. Only rarely one can see real graphics, as the callsign is usually written in the "OSD font". == Live DOGs by hobbyists == One of the easiest and most sought-after devices used to generate DOGs by hobbyists is the 1980s vintage Sony XV-T500 video superimposer. This device can luma-key a signal, capture a still frame into memory and then overlay the keyed graphic in one of eight colors onto any CVBS signal. Another method commonly used by hobbyists and even low-budgeted television stations was Amiga computers with genlock interfaces.

WeChat

WeChat or Weixin in Chinese (Chinese: 微信; pinyin: Wēixìn ; lit. 'micro-message') is an instant messaging, social media, and mobile payment app developed by Tencent. First released in 2011, it became the world's largest standalone mobile app in 2018 with over 1 billion monthly active users. The Chinese version of WeChat, Weixin, has been described as China's "app for everything" and a super-app because of its wide range of functions. WeChat provides text messaging, hold-to-talk voice messaging, broadcast (one-to-many) messaging, video conferencing, video games, mobile payment, sharing of photographs and videos and location sharing. It has been described as having "an almost indispensable part of life in China". Accounts registered using Chinese phone numbers are managed under the Weixin brand, and their data is stored in mainland China and subject to Weixin's terms of service and privacy policy. Non-Chinese numbers are registered under WeChat, and WeChat users are subject to a more liberal terms of service and better privacy policy, and their data is stored in the Netherlands for users in the European Union, and in Singapore for other users. User activity on Weixin, the Chinese version of the app, is analyzed, tracked and shared with Chinese authorities upon request as part of the mass surveillance network in China. Chinese-registered Weixin accounts censor politically sensitive topics, and the software license agreement for Weixin (but not WeChat) explicitly forbids content which "[en]danger[s] national security, divulge[s] state secrets, subvert[s] state power and undermine[s] national unity", as well as other types of content such as content that "[u]ndermine[s] national religious policies" and content that is "[i]nciting illegal assembly, association, procession, demonstrations and gatherings disrupting the social order". Due to its central part of Chinese life, a Chinese person having their WeChat account banned can cause a significant disruption to their life. Any interactions between Weixin and WeChat users are subject to the terms of service and privacy policies of both services. == History == By 2010, Tencent had already attained a massive user base with their desktop messenger app QQ. Recognizing smart phones were likely to disrupt this status quo, CEO Pony Ma sought to proactively invest in alternatives to their own QQ messenger app. WeChat began as a project at Tencent Guangzhou Research and Project center in October 2010. The original version of the app was created by Allen Zhang, named "Weixin" (微信) by Pony Ma, and launched in 2011. The user adoption of WeChat was initially very slow, with users wondering why key features were missing; however, after the release of the Walkie-talkie-like voice messaging feature in May of that year, growth surged. By 2012, when the number of users reached 100 million, Weixin was re-branded "WeChat" by President Martin Lau for the international market. During a period of government support of e-commerce development—for example in the 12th five-year plan (2011–2015)—WeChat also saw new features enabling payments and commerce in 2013, which saw massive adoption after their virtual Red envelope promotion for Chinese New Year 2014. WeChat had over 889 million monthly active users by 2016, and as of 2019 WeChat's monthly active users had risen to an estimate of one billion. As of January 2022, it was reported that WeChat has more than 1.2 billion users. After the launch of WeChat payment in 2013, its users reached 400 million the next year, 90 percent of whom were in China. By comparison, Facebook Messenger and WhatsApp had about one billion monthly active users in 2016 but did not offer most of the other services available on WeChat. For example, in Q2 2017, WeChat's revenues from social media advertising were about US$0.9 billion (RMB6 billion) compared with Facebook's total revenues of US$9.3 billion, 98% of which were from social media advertising. WeChat's revenues from its value-added services were US$5.5 billion. By 2018, WeChat had been used by 93.5% of Chinese internet users. In that year, it became the world's largest standalone mobile app in 2018 with over 1 billion monthly active users. In response to a border dispute between India and China, WeChat was banned in India in June 2020 along with several other Chinese apps, including TikTok. U.S. president Donald Trump sought to ban U.S. "transactions" with WeChat through an executive order but was blocked by a preliminary injunction issued in the United States District Court for the Northern District of California in September 2020. Joe Biden officially dropped Trump's efforts to ban WeChat in the U.S. in June 2021. == Features == WeChat, has been described as China's "app for everything" and a super-app because of its wide range of functions. WeChat provides text messaging, hold-to-talk voice messaging, broadcast (one-to-many) messaging, video conferencing, video games, mobile payment, sharing of photographs and videos and location sharing. It has been described as having "an almost indispensable part of life in China". Due to its central part of Chinese life, a Chinese person having their WeChat account banned can cause a significant disruption to their life. === Messaging === WeChat provides a variety of features including text messaging, hold-to-talk voice messaging, broadcast (one-to-many) messaging, video calls and conferencing, video games, photograph and video sharing, as well as location sharing. WeChat also allows users to exchange contacts with people nearby via Bluetooth, as well as providing various features for contacting people at random if desired (if people are open to it). It can also integrate with other social networking services such as Facebook and Tencent QQ. Photographs may also be embellished with filters and captions, and automatic translation service is available and could also translate the conversation during messaging. WeChat supports different instant messaging methods, including text messages, voice messages, walkie talkie, and stickers. Users can send previously saved or live pictures and videos, profiles of other users, coupons, lucky money packages, or current GPS locations with friends either individually or in a group chat. WeChat also provides a message recall feature to allow users to recall and withdraw information (e.g. images, documents) that are sent within 2 minutes in a conversation. WeChat also provides a voice-to-text feature that brings convenience when it is not convenient to listen to voice messages, as well as the basic ability to recognize emojis based on different tones of voice. A distance sensing feature is implemented in WeChat. It has the ability to activate the receivers' hold-to-talk function when the phone was brought in close proximity to the ear. After the receiver was held at a certain distance from the ear, the sensor would then proceed to automatically disable the phone speakers. This feature eliminates the risk of the user's voice messages being inadvertently broadcast to the general public. === Public accounts === WeChat users can register as a public account (公众号), which enables them to push feeds to subscribers, interact with subscribers, and provide subscribers with services. Users can also create an official account, which fall under service, subscription, or enterprise accounts. Once users as individuals or organizations set up a type of account, they cannot change it to another type. By the end of 2014, the number of WeChat official accounts had reached 8 million. Official accounts of organizations can apply to be verified (cost 300 RMB or about US$45). Official accounts can be used as a platform for services such as hospital pre-registrations, or credit card service. To create an official account, the applicant must register with Chinese authorities, which discourages "foreign companies". In April 2022, WeChat announced that it will start displaying the location of users in China every time they post on a public account. Meanwhile, overseas users on public accounts will also display the country based on their IP address. === Moments === "Moments" (朋友圈) is WeChat's brand name for its social feed of friends' updates. "Moments" is an interactive platform that allows users to post images, text, and short videos taken by users. It also allows users to share articles and music (associated with QQ Music or other web-based music services). Friends in the contact list can like the content and leave comments, functioning similarly to a private social network. In 2017 WeChat had a policy of a maximum of two advertisements per day per Moments user. Privacy in WeChat works by groups of friends: only the friends from the user's contact are able to view their Moments' contents and comments. The friends of the user will only be able to see the likes and comments from other users only if they are in a mutual friend group. For example, friends from high school are not able to

Plane Finder

Plane Finder is a United Kingdom-based real-time flight tracking service launched in 2009, that is able to show flight data globally. The data available includes flight numbers, how fast an aircraft is moving, its elevation and destination of travel. Several variants of the service are available as mobile apps including free, premium 3D and augmented reality versions. The flight tracking map and database can be accessed by web browsers. Plane Finder allows registered users to share their ADS-B and MLAT data via the Plane Finder ADS-B Client, available for macOS, Windows and Linux. Plane Finder supports VFR charts from NATS, and was the first major flight tracking app to introduce a replay feature, allowing users to replay flights dating back to 2011. == Flight tracking == Plane Finder collects data from its own global network of receivers, using the following sources. === Automatic dependent surveillance-broadcast (ADS-B) === A network of automatic dependent surveillance-broadcast (ADS-B) receivers gathers aircraft data such as callsign, position and speed. Plane Finder serves to supplement this data with additional information, including aircraft registration/tail number, departure airport, destination, artwork, and photographs. Plane Finder users can apply for an ADS-B receiver in exchange for their flight data. === Multilateration (MLAT) === To deliver aircraft position data where ADS-B is unavailable, Plane Finder uses multilateration (MLAT). Using three or more receivers running Plane Finder client software, monitoring the aircraft simultaneously, the aircraft’s position is calculated using receiver location and accurate timestamps. While European airspace is widely covered, only some parts of North American airspace are covered. === Federal Aviation Administration (FAA) feed === ADS-B is prevalent across Europe and Australia, but not in North America. Where MLAT or ADS-B data is unavailable, a feed from the Federal Aviation Administration provides flight information. The FAA feed covers United States and Canadian airspace, including bordering areas of the Atlantic and Pacific Oceans. === FLARM feed === Plane Finder collects data from a centralised FLARM feed, for monitoring small aircraft and gliders. == Flight data source == The Plane Finder website and database is widely used as an information source to support articles in the media. The Independent used Plane Finder flight tracking to demonstrate to readers the flight path of flight MT2706, which turned back as a result of last minute Egyptian government flight restrictions on 6 November 2015. The Independent also used Plane Finder information to demonstrate a timeline of the speed/altitude of flight 7K 9268, a Russian plane which crashed on 31 October 2015. The BBC cited Plane Finder in regard to the point at which at British Airways flight turned back to Heathrow Airport to make an emergency landing after smoke was seen coming from its engines. Plane Finder data has also been used to create original imagery for the media, such as the Washington Post, which used Plane Finder as a source to show flight patterns immediately after the Brussels bombings in March 2016.

IMazing

iMazing is mobile device management software that allows users to transfer files and data between iOS devices (iPhone, iPad and iPod Touch) and macOS or Windows computers, in addition to many other features beyond the scope of what Apple's own tools enable. == History == Developed by DigiDNA, iMazing was initially released in 2008 as DiskAid, enabling users to transfer data and files from the iPhone or iPod Touch to Mac or Windows computers. DiskAid was renamed iMazing in 2014. Version 2.0 was released on September 13, 2016. In August 2021, version 2.14 of iMazing added a spyware detection feature. The feature is based on Amnesty International’s Mobile Verification Toolkit to detect Pegasus Spyware following the publication of Pegasus Project. == Description == With iMazing, an iPhone or iPad can be used similarly to an external hard drive. It performs tasks that iTunes doesn’t offer, including incremental backups of iOS devices, browsing and exporting text and voicemail messages, managing apps, encryption, and migrating data from an old phone to a new one. The menu bar app iMazing Mini enables automatic, wireless and encrypted backups of iPhones. The iMazing HEIC Converter is a free desktop app for Mac and PC that lets users convert photos from HEIC format to JPG or PNG.

CrocBITE

CrocBITE (currently CrocAttack) was an online database of wild crocodilian attacks reported on humans in the world. The non-profit online research tool helped to scientifically analyze crocodilian behavior via complex models. Users were encouraged to feed information in a crowdsourcing manner. This website excludes captive crocodilian attacks, as well as non-fatal bites on professional handlers, rangers, staff, or researchers, and crocodilian attacks on pets and livestock, because its primary goal is to analyze natural human-crocodilian conflict in the wild for conservation and management purposes, and that these incidents do are not considered indicative of natural species behavior or typical human-wildlife conflict, as well as not providing enough useful data and helping researchers understand wild population behavior or typical human-wildlife conflict dynamics and helps create safety strategies for people living or working near wild crocodilians, rather than tracking workplace accidents in zoos or farms. While fatal incidents involving handlers are sometimes included on the website, typical captive incidents (such as handlers being bitten by them in zoos) are excluded because they are considered manageable professional risks rather than general public safety threats. == About == The online database was established in 2013 (2013) by Dr Adam Britton, a researcher at Charles Darwin University, his student Brandon Sideleau and Erin Britton. It was a compilation of government records, individual reports, registered contributors and historical data. Dr Simon Pooley, Junior Research fellow, Imperial College London joined hands to further the studies. The collaboration culminated when Dr Pooley met Dr Britton at the IUCN Crocodile Specialist Group, in Louisiana in 2014. The program received funds from Economic and Social Research Council, United Kingdom to the tune of A$30,000 and unspecified resourced plus amount from Big Gecko Crocodilian Research, Crocodillian.com and Charles Darwin University. The research yielded pertinent observations that provide inside into crocodile attacks. It was observed that most attacks on humans occur from bites of Saltwater crocodile as against the popular understanding of Nile crocodiles taking the top spot. This is not, however, believed to be the actual case, as most attacks by the Nile crocodile are believed to go unreported or only reported on a local level. The broad category of Nile crocodile attacks were segmented into West African crocodile and Crocodylus niloticus (the Nile Crocodile) species to get a clear understanding of their respective attack zones. The objective was that the information would be used by communities and conservation managers to help inform and educate people about how to keep safe. The information was vital for Australia and Africa where such attacks are more likely than in other parts of the world. This was the only database of its kind with such comprehensive collection of information made available online. The database is no longer online, and its founder Adam Britton is in custody having pleaded guilty to charges of bestiality on September 25, 2023. It has been rebranded and renamed CrocAttack, and serves as a updated database focusing on human-crocodilian conflict and records over 8,500 incidents from the past decades.