Sitemap Domain Links internet-howto

Petitions: WhiteHouse.gov Petition Search


Posted on Wednesday – 11/30/2016 – 15:59:45










1.)  Title: We The People Demand Martial Law To Secure Our Homeland, Secure Our U.S. Borders From Mafia/Gangland Domestic Terrorism.

1a.) Petition: Please conscript as mandatory all law-enforcement into privatized and federal martial law indefinite to restore public safety and order, to secure our homeland, and U.S. borders, to combat ODD/PTSD civil disobedience criminal organized elements, and manhunt international terrorist in U.S. and abroad, to manhunt mafia/gangland domestic terrorists as armed and dangerous enemy combatants, and upon capture arrest to temporary revoke all U.S. citizenship after judicial conviction, to detain as mil.spec U.C.M.J. prisoners of war property, to indoctrinate into mil.spec academic educational detention-center labor work force nationwide, To make mandatory all tattoos as gang-related or defined otherwise shall be 100% removed during penalization time served unconditional or death sentence mandatory.



2.)  Title: We The People Demand Declarations To Purge Terrorists, Domestic Terrorists Into Economic Independent Detention Centers.

—- —- —-
2a.) Petition: “Please conscript as mandatory all law-enforcement into privatized and federal martial law indefinite to restore public safety and order, to secure our homeland, and U.S. borders, to combat ODD/PTSD civil disobedience criminal organized elements, and manhunt international terrorist in U.S. and abroad, to manhunt mafia/gangland domestic terrorists as armed and dangerous enemy combatants, and upon capture arrest to temporary revoke all U.S. citizenship after judicial conviction, to detain as mil.spec U.C.M.J. prisoners of war property, to indoctrinate into mil.spec academic educational detention-center labor work force nationwide, To make mandatory all tattoos as gang-related or defined otherwise shall be 100% removed during penalization time served unconditional or death sentence mandatory.”



3.)  Title: We The People Demand Declaration To Indefinitely Remove Children From All Abusive Legal Guardians Nationwide.
—- —- —-
3a.) Petition: Please impose declaration into law the indefinite removal of children from all legal guardians nationwide receiving subsidized income of any social services, (regardless), and or ingesting substance as abusers by blood, hair-follicle, bone marrow testing, will be defined all non-compliant as accused sexual predators. Included all above, whereby, all children will be remove from guardian indefinite until gainful economic employment and or removed from subsidized income and or moves abroad, and or revoked U.S. citizenship indefinite and only upon these documents finalized by legal requirements said children will be returned immediately or upon child reaches 18 y/o adulthood of own volition. As above defined restrictions is same pre-requirements for all citizens living abroad internationally.



4.)  Title: Mandatory Law of a nationwide academia focused on basics, civil-service objective, U.C.M.J. mil.spec ranking Structures.
—- —- —-
4a.)  Petition: Please make declaration the separation of all non-adults children from legal guardians into private civil service based mil.spec ranking academia structure boarding schools and charter-schools for all professional/trade levels to higher academia degrees, to make mandatory basic education first and technical and or trade next. To Indoctrinate all levels of academic to practice at the age of six years old learning independence, and definitions self-reliance as citizens, in work forces, service to society labor workshops based on mil.spec mental/physical performance ranking and social engineering skills. Starting at 6-y/o each Kindergarten, Freshman, Sophomore, Junior, and Senior levels students must restart their gained ranking structure based on their academic achievements levels per-phase.



5.)  Title: We The People Demand Reform all U.S. Law Enforcement Into Federalize-to-Privatized Mil.Spec Academic-Ranking Structures.
—- —- —-
5a.) Petition: We The People Demand the conscript all levels of U.S.-Law-Enforcement into international Federalization-Privatization U.C.M.J. Mil.Spec ranking and discipline structured with one year-rotation per duty-station not within 2k miles place-of-birth, 2-4 year commitment for all U.S.-civilians, all civil service and military is housing military on base only, and working along-side with all levels law-enforcement, to guard public, corporate, academic, government, military facilities. Federal and civil-service policing is with same basic and or professional academia training for all citizens, service includes correctional detention centers and all non-compliant delinquent to rule of this law will serve as 1st., prison sentences as mandatory 2yrs or more and 2nd., 2yrs correctional guard duty.



With our free online XML Sitemap Domain generator you can very easily create your Sitemap Domain. First type in your URL and then select the parameters you wish to change (change frequency, last modification date and page priority). You may also alter the default settings for exclude extensions, do not parse extensions and session IDs. In the next fields you can declare which URLs you want to exclude from the Sitemap Domain (see example below). Finally, you may select the maximum number of pages and the depth level. Optionally, you can choose to create additional Sitemap Domain formats, such as a ROR Sitemap Domain, HTML Sitemap Domain or TXT Sitemap Domain.



Sitemap Domain

http://www.yo-mama-is-a-geek.com/ – 2016-08-01 04:53:30
http://www.venus-vs-mars.com/ – 2016-08-01 04:53:46
http://www.internet-how-to.com/ – 2016-08-01 04:57:27
http://www.fbi-search.com/ – 2016-08-01 05:00:04
http://www.anomalous.internet-how-to.com/ – 2016-08-01 05:02:44
http://www.canada.yo-mama-is-a-geek.com/ – 2016-10-11 22:51:58
http://www.usa.yo-mama-is-a-geek.com/ – 2016-10-11 22:52:13





 



 


What is “Page changing frequency”?
Change frequency affects when and how often search engine spiders visit your site’s pages. It may have one of seven values: always, hourly, daily, weekly, monthly, yearly, never. This tells the search engines how often each page is updated. An update refers to actual changes to the HTML code or text of the page.

What is “Last modification date”?
This parameter can take one of the following three values:
Server’s response. Sets the last modification date of each file from the server response headers. This value tells crawlers that they do not need to recrawl documents that have not changed. We recommend keeping this setting.
Current time. Sets the last modification date of each file to the current date and time.
None. Does not set any last modification value for the files.

What is “Page priority”?
The priority is a number between zero and one. If no number is assigned, the priority defaults to 0.5. This number determines the priority of a particular URL relative to other pages on the same site. A high priority page may be indexed more often and appear above other pages from the same site in search results. Automatic priority reduces the priority of a page depending on its depth level.
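Taken together, these three settings appear as optional child elements of each <url> entry in the generated sitemap. As a rough sketch, a single entry might look like the following (the URL and values here are placeholders, not output of this generator):

<url>
  <loc>http://www.example.com/page.html</loc>
  <lastmod>2016-08-01</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.5</priority>
</url>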

What is “Depth Level”?
The depth level of a page is the number of clicks it takes to reach that page from the homepage.

What is “Exclude extensions”?
Files with these extensions found on your website’s pages are not included in the Sitemap Domain (they are not crawled). Separate input values with spaces.

What is “Do not parse extensions”?
Files with these extensions will not be fetched, in order to save bandwidth, because they are not HTML files and contain no embedded links; they will still be included in the Sitemap Domain. Separate input values with spaces.

What is “Session IDs”?
If URLs on your site have session IDs in them, you must remove them. Including session IDs in URLs may result in incomplete and redundant crawling of your site. Common session IDs: PHPSESSID, sid, osCsid. Separate with spaces.

What is “Exclude URLs”?
URLs that contain these strings will not be included in the Sitemap Domain. Input values one per line.
e.g. 1: Use the string component/ in order to exclude all pages in www.yoursite.com/component/
e.g. 2: If you have any of the following websites, you may exclude these strings (copy and paste them into the Exclude URLs box):



XML Sitemap Domain via wget & shell script


 Posted on: Wednesday – 11/30/2016 – 15:59:45





A Sitemap Domain is a file that lists the pages of a website that are accessible to users and search engines. It can be in any format, as long as whoever reads it can understand that format. There are mainly two formats used when creating a site map: XML and HTML.

All websites should ideally have at least one form of Sitemap Domain, especially for search engine optimization (SEO). An XML Sitemap Domain is usually preferred for SEO as it contains relevant metadata for each of the URLs. Almost all search engines can read a properly formatted sitemap, which is then used to index the pages on the website.

If you build websites using a website building framework such as WordPress or Drupal, then there is already built-in functionality (or a plugin) that can automatically generate the relevant Sitemap Domains. But if you are developing websites with plain HTML, CSS, JavaScript or PHP, without the aid of a website building platform, then you will need to create the Sitemap Domain manually.

When you do have to create a Sitemap Domain manually, it is not that bad for a small website with just a few pages. But once a website has even 30 or 40 pages, creating the XML Sitemap Domain by hand becomes a nightmare. It is also an ongoing maintenance issue if the website is constantly updated, with new pages regularly added and old ones deleted. It is quite easy to make silly errors, such as spelling mistakes, or to miss pages entirely.


We will develop a simple shell script that crawls the website and generates a simple, workable XML Sitemap Domain. That makes it very easy to regenerate Sitemap Domains regularly. We will use the wget utility in Linux to crawl the website.

I will take a step-by-step approach so that you can better understand how the script is put together. If the details are not of much interest to you, just scroll to the bottom for the complete script.

We will assume that your website is running locally on localhost, which is usually the case if you are in the process of developing the site (though not always). We will first crawl the website using the simplest possible wget command.

$ wget http://localhost/mywebsite/


Now, we do not really care about saving the content of the webpages locally. Also, we need to recurse into the website hierarchy rather than fetching just the home page. Let’s add the recursive and spider options to wget.

$ wget --spider --recursive http://localhost/mywebsite/


Now, wget by default only crawls to a depth of 5. We want to crawl the entire website no matter what the depth. We will set the depth to infinity. You can modify this to the depth level you want.

$ wget --spider --recursive --level=inf http://localhost/mywebsite/


We will store the output in a local file, which we will be able to manipulate later. Also, we will use the --no-verbose option to reduce the logging. We just need the URL of the page that is being downloaded, and nothing else. Keeping the log small will make it easier to parse. So, the command now looks like this:

$ wget --spider --recursive --level=inf --no-verbose --output-file=/home/tom/temp/linklist.txt http://localhost/mywebsite/





Now, this file linklist.txt contains all the URLs on the website, but not in the exact format we want. We will strip out only the URL part of this log file using grep and awk.

The log messages in the output file should look something like the example below. The URL we are interested in is the text just after the string URL:, up to the next whitespace character. So, we will extract that text using awk. We will pipe several awk commands together, in steps, to strip out exactly what we want from the lines.

2015-08-07 15:21:59 URL:http://localhost/mywebsite/keynotes/feed/ [769] -> "localhost/mywebsite/keynotes/feed/" [1]


First, we get all the lines that we want from the file.

$ grep -i URL /home/tom/temp/linklist.txt


Now, we will split the line and strip out the part after the string URL: from the line using awk. That should be simple enough:

awk -F 'URL:' '{print $2}'


Now, we can trim out the spaces from the line, as it might contain some leading spaces.

awk '{$1=$1};1'


The next step is to strip out just the URL, which is the first part of the string, up to the first whitespace.

awk '{print $1}'


You can probably combine all of the above into a single awk command (a sketch of that follows the full pipeline below), but this keeps it easy enough to understand. Now, we can sort the URLs and remove the duplicates using the sort utility. We will also remove any blank lines using sed.

sort -u | sed '/^$/d'


Putting it all together, the entire command will look something like this:

grep -i URL /home/tom/temp/linklist.txt | awk -F 'URL:' '{print $2}' | awk '{$1=$1};1' | awk '{print $1}' | sort -u | sed '/^$/d' > sortedlinks.txt
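As mentioned above, the grep stage and the three awk stages could probably be collapsed into a single awk command. A rough, untested sketch of such an equivalent is shown here; it matches lines containing URL:, prints the first token after it, and skips empty results, so the separate sed step is no longer needed:

awk -F 'URL:' '/URL:/ {split($2, a, " "); if (a[1] != "") print a[1]}' /home/tom/temp/linklist.txt | sort -u > sortedlinks.txt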


There are several other ways to do the exact same thing using sed or even tr. But I think the above set of commands is simple and modular enough that you can customize it further to match your requirements. You could, for example, replace the domain name in the URLs with a simple sed command, as shown below.
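As a sketch, assuming the crawl was done against http://localhost/mywebsite/ and the live site is at http://www.mywebsitedomain.com/ (the placeholder domain used in the script below), with the rewritten list written to a new file (sitemap-links.txt is just an arbitrary name), something along these lines should work:

$ sed 's|http://localhost/mywebsite/|http://www.mywebsitedomain.com/|g' sortedlinks.txt > sitemap-links.txt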

The next step is to generate the sitemap XML file. We will generate the sitemap from a boilerplate template with preset values. We will create a very simple sitemap, suitable for simple static websites. We are not going to add any extra fields, as more sophisticated systems do, or deal with images.

First we will loop through the links in the file and insert a url tag for each of the URLs we want. We will only keep URLs that end either with a slash (/) or with the extensions html or htm. We will add just one XML location tag for each of these URLs.

Here is the entire bash shell script for the process.

#!/bin/bash
# Domain to crawl; change this to your own website
sitedomain=http://www.mywebsitedomain.com/
# Crawl the site and log every URL that wget finds
wget --spider --recursive --level=inf --no-verbose --output-file=/home/tom/temp/linklist.txt $sitedomain
# Extract the URLs from the log, then sort them and drop duplicates and blank lines
grep -i URL /home/tom/temp/linklist.txt | awk -F 'URL:' '{print $2}' | awk '{$1=$1};1' | awk '{print $1}' | sort -u | sed '/^$/d' > /home/tom/temp/sortedurls.txt
# Standard sitemap XML header
header='<?xml version="1.0" encoding="UTF-8"?><urlset
      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
            https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">'
echo "$header" > sitemap.xml
# Add a <url> entry for every URL ending in /, .html or .htm; skip everything else
while read p; do
  case "$p" in
  */ | *.html | *.htm)
    echo '<url><loc>'"$p"'</loc></url>' >> sitemap.xml
    ;;
  *)
    ;;
  esac
done < /home/tom/temp/sortedurls.txt
echo "</urlset>" >> sitemap.xml


You can add additional fields such as lastmod, changefreq or priority as needed; I have kept it simple for the most part. Your requirements will vary, and you can adapt and develop this script further to add more fields. One possible extension is sketched below.
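For example, if you are happy to stamp every entry with the date the sitemap was generated (a simplification; ideally you would look up the real modification time of each page) and to use fixed placeholder values for changefreq and priority, the echo line inside the while loop could be extended along these lines:

echo '<url><loc>'"$p"'</loc><lastmod>'"$(date +%Y-%m-%d)"'</lastmod><changefreq>weekly</changefreq><priority>0.5</priority></url>' >> sitemap.xml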

While this is a good method to generate a sitemap for local websites, there are certainly other ways to generate an XML sitemap, such as online sitemap generators.

A note of caution: it is not really a good idea to create XML files directly from shell scripts, especially if they are complex and large. You might be better off using a Perl or Python library to create more sophisticated XML sitemap files. You can still use the intermediate files generated by this script as your input.






