Commons talk:Library back up project

From Wikimedia Commons, the free media repository

How to upload 1927-1949 books?

I have obtained a large database of Chinese books. These files are no longer available online, so there is an urgent need to upload them. Some have entered the public domain (PD) and others have not. I have already uploaded the files published more than 95 years ago (1209-1899, 1900-1910, 1911-1920, 1921-1926), because they have entered the PD in the US and their authors very probably died more than 50 years ago. How should I upload the books published in 1927-1949? There are too many for me to verify one by one. There are two ways:

  1. Publish the book list and ask users to identify PD books and copy them to a second list. I will upload only the identified PD books. Pro: good copyright protection. Con: author information is hard to find for old authors, so only a small proportion is expected to be uploaded.
  2. Publish the book list and ask users to delete non-PD books from it. I will then upload every book remaining on the list. If a non-PD book is missed and uploaded, any user can report it for deletion once it is found. Pro: a large proportion will be uploaded, which is good for the preservation of old books. Con: it will increase the admins' deletion workload after the upload.

Which way should I upload them? Please comment.

--Upload for Freedom (talk) 13:03, 5 November 2022 (UTC)

Scheme 1

Scheme 2

  • Support This is a good balance between copyright protection and old book preservation. The prospect of a global nuclear war is looming, so there is an urgent need to upload. --Upload for Freedom (talk) 03:00, 8 November 2022 (UTC)

Comments

Just upload everything directly; nobody cares about Chinese copyright anyway. --RZuo (talk) 07:42, 8 November 2022 (UTC)

You mean "just upload everything directly, nobody cares about Chinese copyright anyway"? That's great. However, I might get blocked or lose my bot status if I push through an upload of a large number of non-PD books. I want to let other users remove non-PD books both before and after my upload; I think that would be a good balance between copyright protection and old book preservation. I would like support for uploading in this way, so please support Scheme 2. Upload for Freedom (talk) 11:46, 8 November 2022 (UTC)
now i have an idea. if you have the authors' names, you could match them against wikidata, then query their date of death (P570). in any case this should be some automatic job instead of manually checking, when your list is too large. RZuo (talk) 14:48, 8 November 2022 (UTC)
Most authors won't be identified in this way. Upload for Freedom (talk) 04:44, 9 November 2022 (UTC)
Write a script that takes a title or an author's name and checks wikidata, worldcat.org and loc.gov for the author's death date. --Happyseeu (talk) 16:41, 15 November 2022 (UTC)
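The automated check suggested in this thread could look roughly like the sketch below. It covers Wikidata only, assumes the public SPARQL endpoint at https://query.wikidata.org/sparql, and assumes an exact match on the author's Chinese label, which (as noted above) will miss most obscure authors. The function names and the sample entity ID are hypothetical; only P570 (date of death) comes from the discussion.

```python
import json
import re
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"  # assumed endpoint


def build_query(author_name, lang="zh"):
    """Build a SPARQL query for entities whose label equals author_name."""
    # Escape backslashes and double quotes so the name is a valid string literal.
    safe = author_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?person ?death WHERE { "
        f'?person rdfs:label "{safe}"@{lang} ; '
        "wdt:P570 ?death . }"
    )


def parse_death_years(result):
    """Extract {entity URI: year of death} from a SPARQL JSON result."""
    years = {}
    for row in result["results"]["bindings"]:
        # Dates look like "1936-10-19T00:00:00Z"; keep only the (signed) year.
        match = re.match(r"(-?\d+)-", row["death"]["value"])
        if match:
            years[row["person"]["value"]] = int(match.group(1))
    return years


def fetch_death_years(author_name):
    """Query Wikidata over the network (not exercised in the example below)."""
    params = urllib.parse.urlencode(
        {"query": build_query(author_name), "format": "json"}
    )
    request = urllib.request.Request(
        SPARQL_ENDPOINT + "?" + params,
        headers={"User-Agent": "pd-check-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_death_years(json.load(response))


# Canned response in the SPARQL JSON results layout; the values are illustrative.
sample = {"results": {"bindings": [
    {"person": {"value": "http://www.wikidata.org/entity/Q0"},
     "death": {"value": "1936-10-19T00:00:00Z"}},
]}}
print(parse_death_years(sample))
```

An author name that returns several entities, or none, would still need a human decision, so a script like this can only shrink the manual work, not remove it.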

Proposal: Tolerate users uploading Chinese books published in or before 1949

Wikipedia is the world's most visited not-for-profit website. Wikimedia Commons is its companion website for hosting free media files. One aim is to keep the files free. But from a historical point of view, another important aim is to preserve the world's civilization, including old books.

One category of Chinese books is rare and needs preservation: those published during Kuomintang rule (1911-1949). The new China generally regards their ideas as unsound, so they are seldom reprinted, are often rare, and need preservation, especially considering the prospect of a Mainland-Taiwan war. However, some of these books have not yet entered the public domain, and identifying them among such a vast number of books is too difficult. If no one can identify and upload them, these books could disappear one day. What a loss! On the other hand, copyright laws must be obeyed. Is there a way to pursue both?

I propose tolerating uploads of Chinese books published in or before 1949 under the following conditions:

  1. After the upload, the uploader will positively identify non-PD books using author information from Wikidata and put them on a list for deletion. These deletions should be done by an admin bot. The deleted files will be tagged with their year of restoration and batch-restored by admins when they enter the PD.
  2. The uploader must publish a list of the uploaded books, including file names and author names. Other users are welcome to identify non-PD books and nominate them for deletion.

This is NOT intended to change any Commons policy. It only means that uploaders won't be punished for uploading these books.
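The "year of restoration" tag in condition 1 is simple arithmetic once a publication year and a death year are known. Here is a minimal sketch, assuming the two terms mentioned in the earlier thread (50 years after the author's death and 95 years after publication for the US), and assuming that both terms run to the end of the calendar year, so that a work becomes free on 1 January of the following year. The function names are hypothetical, and special cases such as anonymous works or revised editions are not handled.

```python
def restoration_year(pub_year, death_year, life_term=50, us_pub_term=95):
    """Return the first year in which both assumed terms have expired."""
    # Terms are assumed to expire at the end of the calendar year,
    # so the work is free from 1 January of the year after.
    free_at_home = death_year + life_term + 1
    free_in_us = pub_year + us_pub_term + 1
    return max(free_at_home, free_in_us)


def undelete_category(pub_year, death_year):
    """Name of the category that would schedule the undeletion."""
    return f"Category:Undelete in {restoration_year(pub_year, death_year)}"


# A book published in 1935 by an author who died in 1960.
print(undelete_category(1935, 1960))
```

Under these assumptions a book published in 1926 by a long-dead author comes out as 2022, which matches the statement above that the 1921-1926 batch was already uploadable.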




--Upload for Freedom (talk) 12:37, 12 November 2022 (UTC)

Support

  • Support This is a good balance between copyright protection and old book preservation. The prospect of a Mainland-Taiwan war is looming, so there is an urgent need to upload.--Upload for Freedom (talk) 12:52, 12 November 2022 (UTC)
  • Support I also frequently do that for books which may not be available elsewhere in the future. Yann (talk) 12:53, 13 November 2022 (UTC)
  • Support I see no reason to object to this proposal, which, according to the proponent, is not a violation of copyright.--源義信 (talk) 11:12, 15 November 2022 (UTC)
  • Support. Even though there is IA, I think it is valuable to keep a backup on Commons and possibly in other places as well. I support this proposal as long as it does not contradict the ToS.--虹易 (talk) 12:52, 15 November 2022 (UTC)
  • Support Per 虹易's comment and as hinted by this zh wiki discussion, it would be good if we could put some content into space rideshare missions as a time capsule. The current world situation has never been worse. 2600:6C40:59F0:85F0:390F:3E3A:E100:669C 03:57, 17 November 2022 (UTC)
    Thank you very much for your support. But our opinions are so close that you look like my IP sock puppet... I don't know what to do.--Upload for Freedom (talk) 12:13, 17 November 2022 (UTC)
  • Support, per my comments below. I simply don't see how using the "Undelete in 2XXX" feature is negative here, especially since Wikimedia Commons isn't built just for people today but also for the people of the future. See it as planting a tree whose fruit you won't get to eat. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 12:41, 17 November 2022 (UTC)
  • Support I am generally in favor of users doing what they want on Commons, as long as it is legal, for educational purposes, and doesn't force the Commons community to do massive amounts of work. This clearly fits the first two criteria, and as an entirely user-driven process it also fits the third. Zoozaz1 (talk) 23:23, 26 November 2022 (UTC)
  • Support --Yinyue200 (talk) 07:16, 28 November 2022 (UTC)

Oppose

  • Weak oppose, see my comments below. - Jmabel ! talk 17:46, 12 November 2022 (UTC)
  • Oppose, per Jmabel--shizhao (talk) 11:43, 13 November 2022 (UTC)
  • Oppose. Certain other non-commercial archives already strike a good balance between copyright protection and old book preservation.--Jusjih (talk) 22:24, 13 November 2022 (UTC)
  • Oppose, no reason to make any copyright exceptions when there are a number of archival websites. The author data on Wikidata can be completed independently of any policy change on Commons. -Mys_721tx (talk) 16:06, 14 November 2022 (UTC)
    This is not a copyright exception. Wikidata's "death year" will be used. It is not related to policies here. Upload for Freedom (talk) 10:41, 15 November 2022 (UTC)
  • Weak oppose, this requires balancing the manpower needed (to discern copyright status) against the success rate. Some content may come from copyrighted revisions. --YFdyh000 (talk) 16:11, 14 November 2022 (UTC)
  • Oppose Generally, there may still be copyright issues; see notes from Zhang Zhongxin. Certain cases may deserve support, though. --Liuxinyu970226 (talk) 03:17, 23 November 2022 (UTC)
  • Oppose Only anonymous books published before 1946 (in China) or 1952 (in Taiwan), or books by authors who died before 1946 (in China) or 1952 (in Taiwan), whose copyright expired before the URAA date, should be allowed.--Billytanghh (talk) 20:54, 27 November 2022 (UTC)
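For readers who want to apply the rule in the last vote mechanically, it can be written as a small predicate. This is only a sketch of that comment as stated: the cut-off years are copied from it and not independently verified, the function and parameter names are hypothetical, and it is not legal advice.

```python
# Cut-off years exactly as stated in the vote above; not independently verified.
CUTOFF = {"China": 1946, "Taiwan": 1952}


def allowed_under_rule(jurisdiction, pub_year, death_year=None, anonymous=False):
    """Anonymous works are judged by publication year, others by death year."""
    cutoff = CUTOFF[jurisdiction]
    if anonymous:
        return pub_year < cutoff
    # With an unknown death year the rule cannot be satisfied, so refuse.
    return death_year is not None and death_year < cutoff


print(allowed_under_rule("China", 1930, death_year=1936))   # author died before 1946
print(allowed_under_rule("Taiwan", 1948, anonymous=True))   # anonymous, before 1952
print(allowed_under_rule("China", 1940, death_year=1966))   # author died too late
```

Treating an unknown death year as "not allowed" is the conservative reading; it is also the case the proposer says covers most of the books.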

Comments

  • Why should Commons be the vehicle for this? It is going to take a lot of work, and not just by the uploader(s). This would seem like a more natural project for the Internet Archive.
  • If some subset of these works can be identified as now being in the public domain, I'm completely in favor of that being uploaded. The further in the future that the material will come into the public domain, the less I see Commons as the appropriate vehicle. - Jmabel ! talk 17:46, 12 November 2022 (UTC)
    There is no better project than Wikimedia Commons. It is linked to Wikisource, where people can transcribe the books into text, and to Wikipedia, where people can insert them as illustrations. Therefore, when a dead old book is uploaded here, it comes alive. Freedom gives the book life. --Upload for Freedom (talk) 12:09, 15 November 2022 (UTC)
This is an excellent point. Wikisource is a valuable repository of content as text can be easily cut and pasted for quotation w/o fear of copyright infringement. I'm in favor of a solution that would help expand the content of Wikisource. --Happyseeu (talk) 16:20, 15 November 2022 (UTC)
  • I proposed this entirely in the public interest. I won't gain anything personally from it. --Upload for Freedom (talk) 12:16, 15 November 2022 (UTC)
  • I wonder whether there is any guarantee that deleted files can be restored in the long term, or that they are kept at the same storage/backup level as published files, considering that the proposal looks to the far future. And if the proposal is accepted and such actions are allowed, such files could be listed in Category:Undeletion_requests. --虹易 (talk) 12:52, 15 November 2022 (UTC)
  • I think there are some problems:
    • First, Commons is not a bill-free storage space. I don't know how the Foundation would comment on using deletion to hide stored files, but it was very dissatisfied when some people uploaded pirated movies to its servers and then used the Wikipedia Zero project to play them back free of data charges;
    • Second, you'll need to enlist the assistance of an administrator to delete and then restore the files. Judging by your record of public communications, that doesn't appear to have been successful.

--Cwek (talk) 00:57, 16 November 2022 (UTC)

Thank you for your comment. I am not proposing to upload any movies, just books. They are tiny in file size compared with the vast amount of pictures and videos uploaded to the site every day. I have asked some admins to speedily delete some recently published books that were uploaded by mistake. It only takes adding a template to the file page; once deleted, the files cannot be seen. Upload for Freedom (talk) 06:14, 16 November 2022 (UTC)
"You asked", but did they promise? The plan seems complex and relies on a trick. All uploaders need to know this plan, upload files according to it and mark them; all administrators need to know this plan, and after deleting a file they need to add a mark to a page whose file has been deleted and no longer exists; and these files must be restored periodically. The plan looks neat, but I am not optimistic about it. At least some admins and power users have been advised not to upload files to Commons like this, or that there should be a better place to save them. However, I think that if the Foundation took up this problem, it could set up a main data center outside the United States to store these documents, or documents that do not meet US copyright requirements, and so avoid the problem. These issues are better raised with the Foundation and are unlikely to lead to actual action here. --Cwek (talk) 03:33, 17 November 2022 (UTC)
  • Comment: as of writing, "Category:Undelete in 2023" contains 239 pages and a sub-category, and we have "Undelete in 2XXX" categories stretching over more than a century. Honestly, I think that if user "Upload for Freedom" had just contacted user "Yann" directly and said "Hey, I want to upload these books, can you speedy delete them for me and tag them for undeletion later?", this entire project would have succeeded without so many nay-sayers. I assume that, because we're all volunteers here, some volunteers feel they would be wasting their free time deleting and then undeleting files, despite the fact that there are users more than willing to do that. In fact, admins generally do what they want because of vague policies, for example deleting a page of a "non-contributor" with the justification "deleted page File:Profile pic for self.png (Personal photo by non-contributors (F10))" despite this "non-contributor" having 20,405 global edits as of writing; no wonder lots of users disengage from the system without appealing. Actual abusers won't stop their abuses, but good-faith users who bear the brunt and have a lot to offer end up leaving. Regarding the comment that "Commons is not a bill-free storage space" and the pirated movies played back through the Wikipedia Zero project: that was actually done by a ring of users seeking to use the Wikimedia Foundation's servers to stream films illegally. But every deleted file still stays on the Wikimedia Foundation's servers, which is why the "Undelete" option is possible at all. Neither server space nor potential server abuse is a problem here.
Uploading a file to be undeleted later isn't a bad thing, because Wikimedia Commons isn't a short-term project designed exclusively for people who need it right now; the future-undeletion system recognises the educational value of storing in-copyright files indefinitely but "invisibly" for future generations. The idea that "These files are better hosted at the Internet Archive" is flawed; in fact user "" started an entire import campaign to import books from the Internet Archive because their Library was in jeopardy over legal issues. I simply don't see why this project is controversial, as we host thousands of files that are currently planned to be undeleted later and probably tens of thousands of files that will be undeleted at some point in the future. If volunteers are willing to invest their time into making this project work, then I don't see why people who don't want to "waste their time" on this are wasting their time arguing against it. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 12:39, 17 November 2022 (UTC)
It appears that Commons has always had this practice of deleting files that are pending entry into the public domain, with a category mark, until it is time to reinstate them. The practice is not complicated: apply for deletion immediately after uploading and explain that the file should be restored when the term expires; administrators should know this practice and deal with it accordingly. If this practice remains feasible, I don't think it is necessary to find another way; just follow the existing practice. --Cwek (talk) 00:41, 18 November 2022 (UTC)
  • There may be a technical issue with undeletion as proposed. Once file A is deleted, the file name can be usurped by file B. Fast forward to the future undeletion of file A: how is file A going to be restored if its name has already been taken? You will need to make sure that the file name is and will remain unique. --HyperGaruda (talk) 06:11, 20 November 2022 (UTC)
    Good question. The file title should be source abbreviation + source ID + book name, as suggested by the project page. If a file with an identical name is uploaded, it can only be the book itself, except for vandalism. Upload for Freedom (talk) 03:20, 21 November 2022 (UTC)
    Once undeleted, one of the files will need to be renamed. It requires some work if there are many of them, but that's not really a problem. Yann (talk) 17:15, 21 November 2022 (UTC)
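The naming scheme described above (source abbreviation + source ID + book name) makes a name clash easy to detect mechanically. The sketch below is only an illustration: the regular expression is an assumption inferred from the two file names quoted on this page, and the helper names are hypothetical.

```python
import re

# Assumed pattern: source abbreviation, optional hyphen, numeric ID,
# a space, the book name, and a file extension.
TITLE_RE = re.compile(r"^(?P<source>[A-Z]+)-?(?P<id>\d+) (?P<name>.+)\.(?P<ext>\w+)$")


def parse_title(title):
    """Split a file title into its parts, or return None if it does not fit."""
    match = TITLE_RE.match(title)
    return match.groupdict() if match else None


def find_collisions(existing_titles, new_title):
    """Return existing titles that share source and ID with new_title."""
    new = parse_title(new_title)
    if new is None:
        return []
    hits = []
    for title in existing_titles:
        old = parse_title(title)
        if old and (old["source"], old["id"]) == (new["source"], new["id"]):
            hits.append(title)
    return hits


deleted = ["SSID-10000424 使藏紀程.pdf", "CADAL02079034 明史(一).djvu"]
print(parse_title(deleted[0]))
print(find_collisions(deleted, "SSID-10000424 使藏紀程.djvu"))
```

A check like this only flags a clash on source and ID; deciding which of the two files gets renamed at undeletion time stays a manual step, as noted in the reply above.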
  • Would you give me a book list? I am more interested in what books you have. A rough draft is fine. Ghrenghren (talk) 15:38, 21 November 2022 (UTC)

Books uploaded! Do you want more?

@Yann, 源義信, 虹易, 2600:6C40:59F0:85F0:390F:3E3A:E100:669C, Donald Trung, Zoozaz1, and Yinyue200: Thank you for your support! I have uploaded 0.2 million files with an SSID. I selected the files for upload using these three criteria:

  • All books published before 1950.
  • Books without a date in the metadata: I selected those likely to be old books.
  • Books published in 1950 and later: I selected those likely to be reprints of old books, plus old news from Xinhua News Agency.
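The three criteria above amount to a triage on the publication year. A minimal sketch, assuming each record is a dictionary with a hypothetical "year" field that is missing or None when the metadata has no date; the "likely to be old" judgement itself was made by hand and is not reproduced here.

```python
def triage(book):
    """Sort one metadata record into a bucket mirroring the three criteria."""
    year = book.get("year")
    if year is None:
        return "review: undated"           # criterion 2: needs a manual look
    if year < 1950:
        return "upload"                    # criterion 1: all pre-1950 books
    return "review: possible reprint"      # criterion 3: needs a manual look


records = [
    {"ssid": "A", "year": 1936},
    {"ssid": "B", "year": None},
    {"ssid": "C", "year": 1982},
]
for record in records:
    print(record["ssid"], triage(record))
```

Only the first bucket can be uploaded without human judgement; the other two are the batches that had to be checked manually in chunks.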

There are many more books that did not match the above criteria but are in the public domain. I have uploaded the entire book list here. The table contains the upload status (已上传, "uploaded", and 未上传, "not uploaded").

https://easyupload.io/cvyppu

If you want to see more books uploaded, please put the SSID of the book (the first column in the table) here: User:Upload for Freedom/SSID. I will upload the books. NOTE: THE BOOKS COULD DISAPPEAR AT ANY TIME. I WILL ALSO REMOVE MY UPLOADING ENVIRONMENT SOON. I CAN ONLY RESPOND TO REQUESTS FOR 20 DAYS FROM NOW.--Upload for Freedom (talk) 02:25, 10 January 2023 (UTC)

After that, I will use Wikidata to identify authors who died within the last 50 years, list their books for deletion, and put the list on Wikimedia Commons.--Upload for Freedom (talk) 02:19, 10 January 2023 (UTC)

@Upload for Freedom: Hi, thanks for the update. IMO the date should be in the description, e.g. for File:CADAL02079034 明史(一).djvu. Do you plan to upload all books listed in Commons:Library back up project/file list/NLC/民國圖書/01 (and other pages)? And in File:SSID-10000424 使藏紀程.pdf, we need either the original publication date or the date(s) for the creator (best is to use the Creator templates). Regards, Yann (talk) 10:01, 10 January 2023 (UTC)
Hi, Yann. The NLC books are uploaded by the user 虹易. I hope he/she will upload those books as well. Metadata for the year of creation is unavailable for some books. Some of them are ancient and some are new. I have manually identified those likely to be old (in chunks) and uploaded them. It was dull! I will make a list containing author information from Wikidata.--Upload for Freedom (talk) 13:12, 10 January 2023 (UTC)
@虹易 and Upload for Freedom: For File:CADAL02079034 明史(一).djvu, the date is in Category:明史, but it should be in the description as well. Yann (talk) 15:10, 10 January 2023 (UTC)