From 771a32a870859376d013f343d3d74cf4d4d3fa09 Mon Sep 17 00:00:00 2001
From: Lucky <66523959+l-ucky@users.noreply.github.com>
Date: Tue, 29 Aug 2023 17:37:26 -0300
Subject: [PATCH] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 92190af..6c18908 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,7 @@ Consider doing your own data analysis. If you save your CSV, and make a pull req
 **Limitations**
 - 4chan will recycle poster IDs, so they are not unique identifiers. Therefore, the data mining on n-pbtid isn't fully accurate for the upper-bound, but it should be closer to representative at the lower-bound.
 - Time of day scrapes based on time-zone of interest (e.g. New York posting hours) hasn't been implemented, but this can be easily solved by scraping the threads from 0900h - 1700h by intervals of 3 hours to allow old threads to die, for your target local time.
+- Some synonyms will not be counted under a single word (e.g. glowie, glow, glows, glower), so manual input will need to be implemented in the v2 script. The same idea goes for plural words, but there is another package that will take root words, and remove word modifiers (e.g. -ing, -ed, -s, etc)
 # html_text vs html_text2 from rvest