
Results 1 - 10 of 250,210 for Web scraping. Search took 7.248 seconds.  
Experiences in the Use of Big Data for Official Statistics (presentation slides), Antonino Virgillito, Istat. Think Big - Data innovation in Latin America, Santiago, Chile, 6 March 2017.

Introduction: The use of Big Data sources for the production of official statistics has been the subject of a lively discussion within the statistical community in recent years, producing a significant body of study and work. Istat in particular has developed extensive experience in this area, with several ongoing initiatives. The presentation covers some of these experiences and highlights results and lessons learned.

[Slide figure: result of searching "Big Data" in Google Trends.]

The Path to Big Data in Official Statistics: Everyone started from ground zero, and many lessons were learned together as the official statistics community got involved; several initiatives were organized at national and international level.
• UNECE Big Data Project (2014-2015): demonstrate the feasibility of production based on Big Data sources.
• ESSnet Big Data project (2016-2018): integration of Big Data into the regular production of official statistics.

Experiences in Istat:
• Web scraping: scraping of enterprise web sites to determine information about enterprises (business statistics); extraction of product prices from e-commerce web sites.
• Scanner data: production of CPI indexes.

Web scraping, two approaches:
• Scrape the textual content of a large number of web sites and analyse it offline to determine information through text-mining techniques.
• Extract specific information from semi-structured web sites through custom software or automation tools (robots).

Web Scraping Enterprise Web Sites. General objective: to investigate whether web scraping, text mining and inference techniques can be used to collect, process and improve general information about enterprises. Sources: enterprise web sites, the National Business Register and business statistics surveys, processed through crawling, scraping, indexing and searching.

Using Scraped Data, four use cases: (1) URLs inventory; (2) web sales / e-commerce; (3) social media presence; (4) job advertisements.

Use Case 1: URLs Inventory.
• Identified population (ICT survey): enterprises with at least 10 employees. Not all of them have a web site, and for those that do, the URLs are not fully available.
• The URL-retrieval problem: given a set of identifiers for an enterprise X (denomination, fiscal code, economic activity, etc.), search the web to retrieve a set of associated URLs and estimate which URL, if any, corresponds to the web site of X.

URLs Inventory, technique and results:
• Steps: query a search engine for the enterprise name; crawl the returned pages and score them according to content; classify the results with a machine-learning approach.
• Machine-learning step: a logistic model is fitted on a training set and then applied to the set of unlabelled enterprises (i.e. those not belonging to the training set).
• About 105,000 URLs identified out of the 130,000 websites pertaining to the enterprise population: 81% coverage.

Use Case 2: Web Sales / E-Commerce (ICT survey): predict whether or not an enterprise offers web-sales facilities on its website.

Use Case 3: Social Media Presence: detect whether particular enterprises are present on social media (mainly Twitter and Facebook).

Use Case 4: Job Advertisements: investigate how enterprises use their websites to handle job advertisements, and in particular whether or not they publish job advertisements.

Technique and results for the use cases:
• Prediction realized through different classification algorithms: logistic model, classification trees, random forests, boosting, bagging, neural nets, Naive Bayes, SVM.
• Algorithm performance evaluated according to different indicators.
• Quality of results still to be improved; the social media use case is almost ready for production.

Web Scraping Prices:
• Collecting data from the Internet by extracting structured content from web pages is an established technique for statistical data collection: it replaces repetitive centralized tasks and opens the possibility of getting more data.
• Price data are particularly attractive: there are a lot of prices on the Internet, and scraping them is common practice in European NSIs.

Web Scraping Prices at Istat:
• Consumer electronic products: collection of prices from 4 different e-commerce web sites, including Amazon. (...)
• Transport sector: cost of tickets for trains and flights (experimental).
• 17 types of products are currently collected through scraping in production.

Problems:
• Sustainability: the more scraping solutions we develop, the more maintenance is required, and maintenance requires dedicated IT resources.
• Scale: scraping for prices is substantially a replacement for manual collection activity; it is difficult to collect large data volumes, and data must be selected manually before collection.

Results and next steps: significant improvements in efficiency have been achieved so far through tools and techniques that are now mature and familiar. The risk is that we fail to reach the next level of scale and exploit the full potential of web data. Are we ready to try new approaches?
Language:English
Score: 1891862.7 - https://www.cepal.org/sites/de.../files/antonino_virgillito.pdf
Data Source: un
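The URL-retrieval pipeline in the Istat slides above (query a search engine, crawl and score the returned pages, classify with a logistic model) can be sketched roughly as below. This is a minimal illustration, not Istat's actual code: the feature set, the tiny training sample and the candidate URLs are all hypothetical, and a real pipeline would add the search-engine query step and far richer content features.

```python
# Minimal sketch of the URL-retrieval step described in the Istat slides:
# given candidate URLs for an enterprise, score each page on simple
# content features and classify it with a logistic model.
# All names (candidate lists, the tiny training set) are hypothetical.
import requests
from bs4 import BeautifulSoup
from sklearn.linear_model import LogisticRegression

def page_features(url, name, fiscal_code):
    """Fetch a candidate page and compute simple match features."""
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException:
        return [0, 0, 0]
    text = BeautifulSoup(html, "html.parser").get_text(" ").lower()
    return [
        int(name.lower() in text),            # enterprise name appears
        int(fiscal_code in text),             # fiscal code appears
        text.count(name.lower().split()[0]),  # crude frequency score
    ]

# Hypothetical labelled training data: (features, is_official_site)
X_train = [[1, 1, 5], [1, 0, 2], [0, 0, 0], [0, 0, 1]]
y_train = [1, 1, 0, 0]
model = LogisticRegression().fit(X_train, y_train)

# Score candidate URLs returned by a search engine for one enterprise
candidates = ["https://example.com", "https://example.org"]  # placeholders
feats = [page_features(u, "Acme S.p.A.", "01234567890") for u in candidates]
probs = model.predict_proba(feats)[:, 1]
best = max(zip(candidates, probs), key=lambda c: c[1])
print(f"Most likely official site: {best[0]} (p={best[1]:.2f})")
```

As the slides describe, the model would in practice be fitted on a manually labelled training set of enterprises and then applied to the full unlabelled population.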
Scraping data from the web | FAO DataLab
The internet offers a wide range of facts and data sources, consisting of an enormous assortment of heterogeneous and poorly organized data. Web scraping involves fetching and extracting those data from web pages to create properly organized information. Web scraping is usually associated with the Big Data paradigm, given the variety of data sources.
Language:English
Score: 1850279.5 - https://www.fao.org/datalab/website/web/scraping-data-web
Data Source: un
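As a concrete illustration of the fetch-and-extract cycle the FAO page describes, here is a minimal sketch that turns a poorly organized HTML page into properly organized tabular records. The URL and the CSS selectors are placeholders, not a real FAO endpoint.

```python
# A minimal illustration of the fetch-and-extract cycle: download a
# page, parse it, and emit structured records as a CSV file.
# The URL and the CSS selectors are placeholders.
import csv
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/products", timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

rows = []
for item in soup.select("div.product"):      # hypothetical markup
    name = item.select_one("h2")
    price = item.select_one("span.price")
    if name and price:
        rows.append({"name": name.get_text(strip=True),
                     "price": price.get_text(strip=True)})

# Poorly organized HTML becomes properly organized, tabular data
with open("products.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(rows)
```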
REPORT OF THE MEETING OF THE GROUP OF EXPERTS ON CONSUMER PRICE INDICES, 13TH SESSION
These experiments are starting to produce results in terms of compiling price indices based exclusively on scraped data. (b) Experiments with different techniques for web scraping demonstrate two main approaches: (i) using tools (robots) that reproduce and automate the manual steps of collecting data from the Internet; (ii) implementing a specific parser (a software programme) for each retailer web site that extracts structured price data from an unstructured web page. (...) (d) Recording and classifying price observations correctly is a key common challenge identified by NSOs when using web scraping. Supervised machine-learning methods are being tested for identifying the correct COICOP category for each scraped item from its generic text description found on the web site. (...) Data seem to support the so-called 50% rule, by which half of a price increase is attributed to an increase in quality. (f) Web scraping and scanner data are two aspects of Big Data that can be combined within the regular production process of compiling the CPI.
Language:English
Score: 1751907.8 - https://daccess-ods.un.org/acc...DS=ECE/CES/GE.22/2016/2&Lang=E
Data Source: ods
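The supervised machine-learning step mentioned in point (d), assigning a COICOP category to each scraped item from its text description, might look roughly like the sketch below. The training descriptions and category codes are illustrative only; the report does not specify which algorithms NSOs are experimenting with, so a simple TF-IDF plus Naive Bayes pipeline stands in here.

```python
# Sketch of assigning a COICOP category to each scraped item from its
# text description. Training examples and codes are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "whole milk 1l bottle", "semi skimmed milk carton",
    "mens running shoes", "leather ankle boots",
    "smartphone 128gb black", "cordless desk phone",
]
train_labels = [  # illustrative COICOP-like codes
    "01.1.4", "01.1.4", "03.2.1", "03.2.1", "08.2.0", "08.2.0",
]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
clf.fit(train_texts, train_labels)

# Classify newly scraped item descriptions
scraped = ["organic milk 2 litre", "trail running shoe size 42"]
print(dict(zip(scraped, clf.predict(scraped))))
```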
Filling gaps of statistical data | FAO DataLab
The Data Lab collects data at the national level (filling potential gaps in the National Statistical Systems where needed) and at the sub-national level (usually not collected by the FAO) to meet the need for more granular and more timely data in contexts where very little information is available, such as least developed countries, landlocked countries, small island developing states, countries currently facing a food crisis, and highly populated countries. The strategy for filling the data gaps consists mainly in the use of non-traditional sources, such as datasets, data catalogues on the web, and textual resources containing data. The methodology is characterised by a blend of big data solutions (such as web scraping and crowdsourcing) and text-mining techniques (extracting data from documents). (...) Food loss and waste data from non-conventional sources: the Data Lab scrapes from the world wide web all the publications containing data and information on food losses and waste (reports, studies, articles from various sources), and then analyses the results and models the data with specific statistical methods.
Language:English
Score: 1725937.1 - https://www.fao.org/datalab/we.../filling-gaps-statistical-data
Data Source: un
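A toy version of the text-mining step the Data Lab snippet describes (pulling food loss and waste figures out of scraped publications) could be as simple as a pattern scan over the extracted text. The pattern below is deliberately crude and purely illustrative.

```python
# Toy text-mining pass: scan free text for quantity statements about
# food loss or waste. The regex is illustrative, not production-grade.
import re

pattern = re.compile(
    r"(\d[\d,.]*)\s*(tonnes|per\s*cent|%)\s+of\s+food\s+(loss|waste)",
    re.IGNORECASE,
)

text = ("The study estimates 1,300 tonnes of food waste annually, "
        "while earlier reports cited 14 per cent of food loss.")

for qty, unit, kind in pattern.findall(text):
    print({"quantity": qty, "unit": unit, "type": kind})
```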
GE.14-08582 (E)
Big data should be dealt with, including web scraping. In favour of a CPI-TEG. (l) Canada: Supports the update. (...) Price collection by web scraping or Internet robots offers a new possibility to NSOs. (...) How can NSOs extend the use of web scraping? Some cautions were raised: changes in websites may cause problems for the compilation of the regular CPI.
Language:English
Score: 1720911.5 - daccess-ods.un.org/acce...DS=ECE/CES/GE.22/2014/2&Lang=E
Data Source: ods
REPORT OF THE INTERSECRETARIAT WORKING GROUP ON PRICE STATISTICS : NOTE / BY THE SECRETARY-GENERAL
At the meeting, which attracted 500 participants, new data sources (scanner data and web scraping), quality changes and quality adjustment methods, and meeting user needs were discussed. (...) Workshop on scanner data and web scraping: Eurostat, together with the scanner data task team of the Committee of Experts on Big Data and Data Science for Official Statistics, organized a virtual workshop on scanner data and web scraping from 12 to 14 October 2021. (...) The participants had the opportunity to present and discuss their latest work in the fields of scanner data, web scraping, classification and validation, as well as to participate in tutorials and demonstrations.
Language:English
Score: 1720829 - https://daccess-ods.un.org/acc...?open&DS=E/CN.3/2022/36&Lang=E
Data Source: ods
IMPLEMENTATION OF THE UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE STATISTICAL PROGRAMME 2020 - ADDENDUM - REPORT OF THE REGIONAL WORKSHOP ON CONSUMER PRICE INDICES
On data collection methods, Norway presented internet purchases and web scraping. Using web scraping as a tool to collect price information from the internet has many benefits, such as collecting more prices in less time, and it can serve as an alternative to scanner data. However, there are many challenges in working with web scraping: for example, websites can change frequently, and it requires resources. Norway also provided an introduction to, as well as training on, how to do web scraping using R software. B. Session 2: Seasonal items and missing items. The session included presentations by Georgia, Ukraine, Kazakhstan and Norway.
Language:English
Score: 1676815 - https://daccess-ods.un.org/acc...S=ECE/CES/2020/14/ADD.3&Lang=E
Data Source: ods
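The Norwegian training used R, but the collection loop it teaches (fetch each monitored product page, record the price with a timestamp, tolerate failures when sites change) looks much the same in any language. Below is a hedged Python sketch; the shop URLs and the price selector are placeholders, not real endpoints.

```python
# One collection run over a hypothetical list of monitored products.
# Shop URLs and the CSS selector are placeholders.
import datetime
import requests
from bs4 import BeautifulSoup

PRODUCTS = {  # hypothetical items to follow over time
    "milk_1l": "https://example-shop.test/milk-1l",
    "bread_750g": "https://example-shop.test/bread-750g",
}

def collect_prices():
    """Fetch each product page and record its price with today's date."""
    today = datetime.date.today().isoformat()
    observations = []
    for item, url in PRODUCTS.items():
        try:
            soup = BeautifulSoup(requests.get(url, timeout=10).text,
                                 "html.parser")
            tag = soup.select_one("span.price")  # site-specific selector
            if tag:
                observations.append((today, item, tag.get_text(strip=True)))
        except requests.RequestException:
            # Sites change and break often, one of the challenges noted
            observations.append((today, item, None))
    return observations

print(collect_prices())
```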
REPORT
It was found promising to see more countries doing research on scanner data and web scraping methods and applying these in practice. (...) Obtaining expenditure weights for web-scraped prices continues to be a challenge, and there is no obvious way of obtaining this information. (...) Countries may develop in-house software for web scraping or buy it from a software provider. Both approaches have advantages and disadvantages that countries must consider.
Language:English
Score: 1670544.5 - https://daccess-ods.un.org/acc...DS=ECE/CES/GE.22/2018/2&Lang=E
Data Source: ods
REPORT OF THE OTTAWA GROUP ON PRICE INDICES : NOTE / BY THE SECRETARY-GENERAL
Alternative data such as web-scraped data, transaction data, big data and administrative data pose challenges to traditional index compilation procedures and methodologies. (...) New and innovative ideas discussed included compiling indices using big data, transaction data and web-scraped data. The full report of the meeting provides a summary of the key points that emerged from each session and feedback from the participants. (...) There will be a call for papers and discussions on: (a) new data sources for the compilation of price indices (scanner and web-scraped data; quality adjustment); (b) compiling house price indices (residential and commercial); (c) challenging areas of measurement (such as services); (d) conceptual frameworks (index number formulae; multipurpose price statistics); (e) treatment of special cases (strongly seasonal products; zero prices).
Language:English
Score: 1665922.3 - https://daccess-ods.un.org/acc...?open&DS=E/CN.3/2020/31&Lang=E
Data Source: ods
EFFECT OF COVID-19 ON PRICE AND EXPENDITURE STATISTICS : COVID-19 COULD AFFECT THE REAL SIZE OF ARAB ECONOMIES
Directly before the pandemic, ESCWA conducted comprehensive training sessions in two member States, namely Bahrain and Kuwait, on the use and application of web scraping for price data collection, allowing them to start direct application in their offices. (...) The training aimed to assess the feasibility of price data collection through web scraping in Arab countries, with a focus on certain categories of household consumption goods, such as fast-evolving technology items whose prices were to be scraped for the purposes of both the CPI and the ICP. (...) ESCWA prepared two different web scraping templates for Qatar, corresponding to two different online outlets that cover a wide range of household goods and services (not only items related to fast-evolving technology), and provided detailed electronic instructions in a manual to the Qatari statistical office to help conduct price web scraping, given that no formal training had yet been conducted.
Language:English
Score: 1659839.8 - https://daccess-ods.un.org/acc...CWA/CL4. SIT/2020/INF.1&Lang=E
Data Source: ods