OpenRefine
What is OpenRefine?
- openrefine.org
- part of the data-handling pipeline
- open source (originally a Google project)
- a database-spreadsheet hybrid: as interactive as a spreadsheet, as programmable as a database
- excellent data cleaner
- runs on your own machine and works in your browser: http://127.0.0.1:3333/
- Other functionality, besides cleaning and repair:
- web scraping
- extend data: link your dataset with web services
- convert & export data
Documentation
- Quick Introduction: Bradshaw, Paul. 2011. “Cleaning Data Using Google Refine: A Quick Guide.” Online Journalism Blog. July 5, 2011. https://onlinejournalismblog.com/2011/07/05/cleaning-data-using-google-refine-a-quick-guide-2/.
- OpenRefine Official Documentation
Installation
1 data import
- structured data: Create a project by importing data
- unstructured data: web scraping
- parseHtml function
- URI structure (Bradshaw, chap. 6)
- GREL
- grab and extract HTML, wiki - stripping HTML
- CSS selectors
- scraping-data-with-google-refine
- example: HTML tag parsing
- Web Scraping - John Little
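The "stripping HTML" step above can be illustrated outside OpenRefine. Inside OpenRefine this is usually done in GREL, for example by chaining parseHtml(), select() with a CSS selector, and htmlText(). The Python sketch below only mimics the idea with the standard library; the names TextExtractor and strip_html are hypothetical and are not part of OpenRefine.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment):
    """Return the visible text of an HTML fragment as one clean string."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    # join the text nodes, then collapse runs of white space
    return " ".join("".join(parser.parts).split())


print(strip_html("<p>Hello <b>data</b> &amp; world</p>"))
```

This keeps text and discards markup; it does not select specific elements the way a CSS selector does, so treat it as an illustration of the concept rather than a replacement for the GREL functions.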
2 dataset analysis
- interface: Interface tour
- filter: Toledo - Using OpenRefine p51
- faceting:
- User Manual
- OpenRefine course: Data Mining and Discovery Text Based Facet + two other videos on the same page
- Toledo - Using OpenRefine p39
3 edit / analyze & fix
- data profiling: Toledo - Using OpenRefine p21
What is 'cleaning'? (aka data normalization)
- remove (consecutive) white space
- convert special html characters (entities)
- transform upper and lower case letters: Toledo - Using OpenRefine p53
- normalize figures and other formats
- clustering
- split or merge cells (e.g. address)
- sorting (makes data easier to manipulate): OpenRefine course, Toledo - Using OpenRefine p37
- faceting (only show rows that meet certain criteria)
- find and remove duplicates: Toledo - Using OpenRefine p49
- filtering (with regex - GREL)
- ...
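A few of the cleaning steps listed above (white space, HTML entities, letter case, duplicates) can be sketched in a few lines of Python. This is only an illustration of what the operations do to a cell value, using the standard library; it is not how OpenRefine implements them, and the function names are hypothetical.

```python
import html
import re


def normalize_cell(value):
    """Apply basic cleaning steps to a single cell value."""
    text = html.unescape(value)        # convert HTML entities such as &amp;
    text = re.sub(r"\s+", " ", text)   # collapse consecutive white space
    return text.strip()                # drop leading and trailing white space


def drop_duplicates(values):
    """Keep the first occurrence of each value, preserving the original order."""
    seen = set()
    unique = []
    for v in values:
        if v not in seen:
            seen.add(v)
            unique.append(v)
    return unique


cells = ["  Antwerp   &amp;  Ghent ", "Antwerp & Ghent", "ANTWERP & GHENT"]
cleaned = [normalize_cell(c).lower() for c in cells]   # also fold letter case
print(drop_duplicates(cleaned))
```

Normalizing first and removing duplicates afterwards matters: the three example cells only become identical once white space, entities, and case have been made uniform.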
Screencast: 20180430-clean_data_mdg18, by Hans Coppens (Vimeo).
- cell editing
- through facets
- by transforming
- Toledo - Using OpenRefine p38 simple cell transformations, Toledo - Using OpenRefine p55 transforming cell values
- search and replace
- clustering and Clustering in depth (methodologies)
- OpenRefine course
- Toledo - Using OpenRefine p52 clustering similar cells
- Toledo - Using OpenRefine p46 handling multi-valued cells
- Toledo - Using OpenRefine p34 Detecting duplicates
- column editing
- Column Editing
- RefinePro youtube channel
- Toledo - Using OpenRefine p12 manipulating columns
- Toledo - Using OpenRefine p58-78 adding derived columns, splitting data across columns
- row editing
- Toledo - Using OpenRefine p64 alternating between rows and records mode
- GREL and regex
- Understanding Expressions
- Understanding Regular Expressions
- OpenRefine course: 5 videos
- Toledo - Using OpenRefine p96
- Hands-on: GREL
- variables
- stripping html
- recipes
- wiki - Recipes
- youtube openrefine channel
- Cleaning data using Google Refine: a quick guide
- Programming Historian
- mdg18 screencast
- OpenRefine course and two other videos on the same page
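Clustering, listed under cell editing above, groups cell values that are probably different spellings of the same thing. One of the key-collision methods described in the OpenRefine clustering documentation is the "fingerprint" key. The Python sketch below is a simplified approximation of that idea, not OpenRefine's actual code; the function names and the sample values are made up for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a simplified fingerprint key for one cell value."""
    text = value.strip().lower()
    # fold accented characters to their closest ASCII equivalents
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^\w\s]", "", text)   # remove punctuation
    tokens = sorted(set(text.split()))    # split, de-duplicate, and sort tokens
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; keep groups with several variants."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(set(g)) > 1]


names = ["Université de Liège", "universite de liege", "Liège, Université de", "Toledo"]
print(cluster(names))
```

Because the tokens are sorted, word order does not matter, which is why "Liège, Université de" ends up in the same group as the other two spellings. In OpenRefine the user then reviews each cluster and chooses the value to merge into.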
4 extend data
- wiki/Reconciliation, wiki/Reconcilable-Data-Sources, from-excel-file-to-rdf
- Toledo - Using OpenRefine p71 adding a reconciliation service
- wikidata
- wiki/Reconciliation: https://docs.openrefine.org/manual/wikidata
- youtube -
- Wikidata:Tools/OpenRefine
- youtube -
- OpenRefineを用いてWikidataの項目と照合する (Japanese: "Reconciling against Wikidata items using OpenRefine")
- web services
5 data export
- User Manual Exporters
- Toledo - Using OpenRefine p32
6 History
- User Manual - History
- OpenRefine Course
- Toledo - Using OpenRefine p29
7 Other Sources
- “Converting Spreadsheet Rows to Text Based Summary Reports Using OpenRefine.” n.d. OUseful.Info, the Blog... (blog). Accessed September 27, 2015. http://blog.ouseful.info/2015/09/04/converting-spreadsheet-rows-to-text-based-summary-reports-using-openrefine/.
- “Diving into OpenRefine -- Analyzing and Fixing Data -- Advanced Data Operations -- Linking Datasets.” n.d.
- Fitzpatrick, Scott. 2021. “Automating Data Preparation with Modern Tooling like Snorkel and OpenRefine.” ActiveState (blog). January 7, 2021. https://www.activestate.com/blog/automating-data-preparation-with-modern-tooling-like-snorkel-and-openrefine/.
- Hooland, Seth van, Ruben Verborgh, and Max De Wilde. 2013a. “Cleaning Data with OpenRefine.” Programming Historian, August. https://programminghistorian.org/en/lessons/cleaning-data-with-openrefine.
- ———. 2013b. “Cleaning Data with OpenRefine.” Programming Historian. August 5, 2013. http://programminghistorian.org/lessons/cleaning-data-with-openrefine.html.
- “Open Source Includes Index.” n.d.
- “Parse and Remove HTML Tags Using Google Refine/OpenRefine & Jsoup/BeautifulSoup.” n.d. Stack Overflow. Accessed February 19, 2020. https://stackoverflow.com/questions/28402299/parse-and-remove-html-tags-using-google-refine-openrefine-jsoup-beautifulsoup.
- Reconcilliation in OpenRefine Part 2. n.d. Accessed January 13, 2020. https://www.youtube.com/watch?v=0tQPmfb6IFk&list=PL_0jeq3PjvtADzbovAgHNzOFvOlyF6uL1&index=3&t=0s.
- RefinePro. n.d. “Parsing Apache Log Using OpenRefine.” Accessed January 8, 2020a. http://kb.refinepro.com/2014/10/parse-apache-log-using-openrefine.html.
- ———. n.d. “(Part 1) Collecting Data from NationalBuilder API with OpenRefine.” Accessed January 8, 2020b. http://kb.refinepro.com/2018/09/collecting-data-from-nationalbuilder.html.
- ———. n.d. “(Part 2) Update Records in NationBuild API Using OpenRefine.” Accessed January 8, 2020c. http://kb.refinepro.com/2018/09/part-2-update-records-in-nationbuild.html.
- ———. n.d. “Prepare SQL SELECT, INSERT INTO, DELETE Query Using OpenRefine.” Accessed January 8, 2020d. http://kb.refinepro.com/2014/04/prepare-sql-query-using-openrefine.html.
- “Using OpenRefine | PACKT Books.” n.d. Accessed February 24, 2018a. https://www.packtpub.com/big-data-and-business-intelligence/using-openrefine.
- Verborgh, Ruben, and Max De Wilde. 2013. Using OpenRefine: The Essential OpenRefine Guide That Takes You from Data Analysis and Error Fixing to Linking Your Dataset to the Web. Community Experience Distilled. Birmingham: Packt Publ.
- Williamson, Evan Peter. 2017. “Fetching and Parsing Data from the Web with OpenRefine.” Programming Historian, August. https://programminghistorian.org/en/lessons/fetch-and-parse-data-with-openrefine.